Finding the Right Data Points
Research by Benjamin R. LaFreniere, CCIM
The research I do today would not have been possible a decade ago. New demographic mapping tools, data sources, financial transparency, and recession-wrought investors have created a new frontier in today’s commercial real estate market. The frontier is located at the intersection of technology, commercial real estate fundamentals, and mathematics.
As we move forward, please remember I am not a writer, nor pretending to be a writer. I am a commercial real estate agent with basic writing skills. The MLA Handbook on writing research papers I leave to people much smarter than me.
Millennials across the world are finding new ways of doing business and innovating. I am a millennial. I remember a time before cell phones/internet and having to check out books at the library; however, I graduated college in 2008. So I enjoy picking up the phone as much as I enjoy a succinct text message.
I’ve outlined my research in the sections below. Before diving into the data, let me explain the setting. I am acting as a tenant representative for a client whose business is a niche sports membership service. He has four successful locations spread along the East Coast of the United States. My client is on the brink of expanding aggressively throughout the nation, and we wanted a strategy for selecting locations empirically and confidently over a vast region. So we asked the basic question, “How can we be confident a new location will be successful?”
In years past, the aforementioned question was answered somewhat qualitatively with comments such as, “I know Atlanta has some great sales” or “a lot of businesses are opening in Phoenix lately”. However, successful businesses know quantitative answers are now possible. Math shows no bias (unless you are an artful statistician) and the quantitative answer I mentioned is where the new frontier of commercial real estate begins.
Section 1: Data Overload - Garbage In/Garbage Out
To begin, ask your client for two data points from each of his locations: actual sales and physical address. You will be doing the rest of the heavy lifting. Once you have the sales figures and addresses, put them into the spreadsheet program of your preference and collect as much data as you can for each location.
Incorporate not only demographic data, but also weather, business-related current event data (think Twitter), Google Trends, and other data points. For example, if you wanted to open an outdoor ice skating business, make sure your location’s average temperature isn’t above freezing. Take those average temperatures and enter them into your spreadsheet program. For this particular client, I studied over 500 variables. Below is a picture of my spreadsheet. It doesn’t show all the variables because I could not fit them all on one screen!
Now, with the help of technology (may heaven help you if you do it by hand), run correlation reports on the hundreds of variables you’ve selected. Many programs can do this for you, but make sure you understand one thing about correlation: IT DOES NOT MEAN CAUSATION. The idea here is to critically think about these variables and how they affect the site’s success or lack thereof. Also, keep your thoughts on the macro-level goals (In this example, “How can we be confident a new location will be successful”). Correlation is only a helpful suggestion. Don’t get lost in the forest of data.
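For readers who would rather script this step than run it in a spreadsheet, here is a minimal sketch of a correlation report in Python. It computes the Pearson correlation between sales and each candidate variable, then ranks the variables by absolute correlation so strong negative relationships are not overlooked. The variable names and every number below are made up for illustration; they are not the client's data, and this is not necessarily how the figures in this article were produced.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)

def rank_by_correlation(sales, variables):
    """Return (name, r) pairs sorted by absolute correlation, strongest first."""
    scored = [(name, pearson(values, sales)) for name, values in variables.items()]
    scored = [(name, r) for name, r in scored if not math.isnan(r)]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)

# Made-up numbers for four hypothetical locations (NOT the client's data).
sales = [110.0, 95.0, 140.0, 120.0]
variables = {
    "households": [10.0, 8.0, 14.0, 11.0],
    "avg_temp_f": [70.0, 75.0, 60.0, 68.0],
    "competitors": [3.0, 3.0, 3.0, 3.0],  # constant column, so it is dropped
}

for name, r in rank_by_correlation(sales, variables):
    print(f"{name}: {r:+.4f}")
```

Sorting by absolute value is a deliberate choice: a variable that is strongly tied to weaker sales is just as informative as one tied to stronger sales.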
After I ran the correlation studies, the figures revealed that humans are capable of error! We (my client and I) were looking at the wrong variables to determine successful sites. Below are the actual correlation figures:
(NOTE: I’ve replaced the names of Variables 1-4 to protect my client’s competitive advantage, but I have kept the real correlation numbers for demonstration purposes. Variable 1 could be anything from household (HH) income to number of Twitter mentions.)
To explain the numbers above, Variable 1 has a correlation of -46.45% with sales. Should I remove this variable? NO! We keep it to help fine-tune our findings later. As an aside, how great is it to know that more of Variable 1 in an area is associated with weaker sales!?
Variable 2 has a 92.38% correlation with sales. In other words, the more of Variable 2 an area has, the higher sales tend to be. Variable 2 was particularly surprising to my client and me because we had been looking at the wrong demographic data points. After this study, we realigned our way of thinking about a demographic area.
I kept Median Disposable Income (MDI) even though its correlation is not incredibly strong. Common sense will tell you that if an area has higher MDI, a good business concept will most likely do well. DUH. Including MDI gave me a more accurate prediction of what a new location’s sales will be.
Variable 4 was not highly correlated to sales; however, it was a fundamental metric to include. As my data set grows (see the end of the article for my thoughts on a sample size of four locations), I hypothesize that Variable 4 will become more valuable. Remember what I said above: keep your thoughts on macro-level goals. These numbers are your tools. Use them as tools.
Section 2: Polishing the Numbers
Now that we know which variables correlate to higher and lower sales, let’s polish the numbers and start making sense of the variables. Below is a snapshot of my spreadsheet after distilling it down to the four variables I mentioned above.
And below is a graphical representation of the above figures. Remember, Variable 1 has a negative correlation.
Above you see a graphical representation of my spreadsheet figures. The figures are real and actual, but slightly disguised for my client’s privacy. A description of each is below.
Sales: represented in $100s. So Location A has ~$1.1M in gross sales. Fairly straightforward, and these are actual sales, not estimates.
Variable 1-4: These are metrics I described in greater length above.
Median disposable income (MDI): The MDI value is from 2014, and this variable is available from many different sources (all SHOULD report the same figure, but they do not always). Strictly for the benefit of this article, I decided not to obfuscate this variable, to show a real-life example of a variable I used for my client’s study.
The radius/drivetime constraint will also be kept confidential for my client’s protection and will vary greatly from client to client. For example, a distribution warehouse needs different demographic and business data boundaries than a frozen yogurt location.
If you want to be more confident in your findings, use multiple radius/drivetime boundary constraints; they will provide great insight into the real estate. For example, a site may fit your criteria at a 3-mile radius but not at a 5-mile radius. Or the relationship between the 3-mile and 5-mile radii can give you insight you would not have seen otherwise.
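To make the multiple-radius idea concrete, here is a minimal sketch that totals a point-level value (households, for example) inside several radii around a candidate site. It uses straight-line (haversine) distance only; drivetime boundaries would need a routing service and are not modeled here. The site coordinates, the data points, and the function names are all hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def totals_by_radius(site, points, radii_miles):
    """Sum each point's value inside every radius (in miles) around the site."""
    totals = {r: 0.0 for r in radii_miles}
    for lat, lon, value in points:
        distance = haversine_miles(site[0], site[1], lat, lon)
        for r in radii_miles:
            if distance <= r:
                totals[r] += value
    return totals

# Hypothetical candidate site and hypothetical (lat, lon, households) points.
site = (33.7490, -84.3880)
points = [
    (33.7690, -84.3880, 1000.0),  # roughly 1.4 miles north of the site
    (33.8090, -84.3880, 2500.0),  # roughly 4.1 miles north
    (33.8490, -84.3880, 4000.0),  # roughly 6.9 miles north
]

totals = totals_by_radius(site, points, [3, 5])
share_within_3 = totals[3] / totals[5] if totals[5] else float("nan")
print(totals, f"share of 5-mile total inside 3 miles: {share_within_3:.2f}")
```

The ratio of the 3-mile total to the 5-mile total is one simple way to express the relationship between the two rings: a low ratio means most of the value sits in the outer ring rather than next to the site.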
Section 3: The Future and Beyond
Before we go any further, I must disclose that my sample size is not large enough to be statistically significant. However, failure is the mother of all success. Will we open a location that shines above the rest? Most likely. Will we open a location that drives the business into the ground? Highly unlikely. The data is on our side at the moment, but in this industry we take risks, and it is my fiduciary commitment to my client to manage, understand, and mitigate those risks. Until I have a sample size that is statistically significant, I cannot say I am extremely confident in my current variables. But are our approaches better than our past approaches? Yes. Progress!
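The sample-size caveat can be checked with a standard significance test for a correlation coefficient. The sketch below assumes the reported percentages are Pearson correlation coefficients (so 92.38% is read as r = 0.9238) and that the usual t-test applies; both are assumptions, not something stated in the study. The statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, compared against the two-sided 5% critical value from a Student's t table.

```python
import math

# Two-sided 5% critical values of Student's t, keyed by degrees of freedom.
T_CRITICAL_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def correlation_t_stat(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def is_significant(r, n):
    """True if |t| exceeds the two-sided 5% critical value for n - 2 d.o.f."""
    df = n - 2
    if df not in T_CRITICAL_05:
        raise ValueError("critical value not tabulated for this sample size")
    return abs(correlation_t_stat(r, n)) > T_CRITICAL_05[df]

# Correlations reported in Section 1, with n = 4 locations.
for label, r in [("Variable 1", -0.4645), ("Variable 2", 0.9238)]:
    t = correlation_t_stat(r, 4)
    print(f"{label}: t = {t:.2f}, significant at 5%: {is_significant(r, 4)}")
```

Under these assumptions, even the 92.38% correlation gives a t of about 3.41 with four locations, short of the 4.303 needed at two degrees of freedom. With only four data points, a correlation has to reach roughly 0.95 in absolute value before this test calls it significant, which is why more locations matter so much.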
So now we have variables we can show are correlated to sales. Now what? I leave that answer up to you. For me, at this point in my research I begin to predict sales (using custom algorithms I design) at a potential site, which can then help determine what permissible real estate costs should be. You can also determine which stores are underperforming and need relocation. Or you can relay information about these variables to the marketing teams to help them target the right demographics. The applications of finding the right data points are truly a brave new world in commercial real estate.
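The custom algorithms mentioned above are not described in this article, so the following is only a stand-in to show the shape of the idea: a one-variable least-squares line fitted to existing locations, used to estimate sales at a new site, and then converted into a rent ceiling with an occupancy cost ratio (rent as a share of sales). The MDI and sales figures, the 10% ratio, and the function names are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_sales(slope, intercept, x):
    """Estimated sales for a site whose predictor value is x."""
    return intercept + slope * x

def max_annual_rent(predicted_sales, occupancy_cost_ratio):
    """Rent ceiling implied by a target share of sales spent on occupancy."""
    return predicted_sales * occupancy_cost_ratio

# Hypothetical existing locations: median disposable income vs. annual sales.
mdi = [40000.0, 45000.0, 50000.0, 55000.0]
annual_sales = [900000.0, 1000000.0, 1100000.0, 1200000.0]

slope, intercept = fit_line(mdi, annual_sales)
estimate = predict_sales(slope, intercept, 48000.0)  # hypothetical new site
rent_cap = max_annual_rent(estimate, 0.10)           # assumed 10% ratio
print(f"estimated sales: {estimate:,.0f}; rent ceiling: {rent_cap:,.0f}")
```

A real model would use several variables and far more than four locations, but the flow is the same: fit on known sites, predict for the candidate site, then back into what the real estate can cost.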
Nota bene: our role as commercial real estate practitioners is evolving as you read this article. One of our (in the proverbial commercial real estate practitioner sense) values is finding space availability to fit our client’s needs. That process is now being done more efficiently by computers. But do not despair. Our (proverbial sense again) value is still the same: knowledge. Knowledge to better equip our clients to make not only good, but great business decisions.
Most importantly, we are moving away from gut feelings (I do not wish or think ‘gut’ feelings will ever be completely replaced) and toward data-driven decisions. These data-driven decisions were not possible a decade ago. If practitioners start using these tools to equip their clients, we will create a more confident, smarter, and more efficient business community for everyone involved.
Benjamin R. LaFreniere, CCIM