A Different Approach to Revenue Estimates Leveraging Alternative Data
Explore a fresh approach to revenue forecasting using alternative data, by focusing on the customer journey rather than short-term trends
Welcome to the Data Score newsletter, your go-to source for insights into the world of data-driven decision-making. Whether you're an insight seeker, a unique data company, a software-as-a-service provider, or an investor, this newsletter is for you. I'm Jason DeRise, a seasoned expert in the field of alternative data insights. As one of the first 10 members of UBS Evidence Lab, I was at the forefront of pioneering new ways to generate actionable insights from data. Before that, I successfully built a sell-side equity research franchise based on proprietary data and non-consensus insights. Through my extensive experience as a purchaser and creator of data, I have gained a unique perspective that allows me to collaborate with end-users to generate meaningful insights.
In the realm of investment, successful alpha generation [1] can often hinge on the effective use of alternative data to generate an edge. This article offers an approach to leveraging alternative data, transcending the common usage of transaction data for predicting quarterly revenue. Instead, focus on a wider array of datasets that offer insights into customer acquisition and retention, viewed through the prism of a customer acquisition funnel.
Traditional use of credit card data is seeing diminishing returns
Transaction data has historically been the dataset type that investors have relied on to understand the trends in revenue before the companies report. There are multiple credit and debit card panels [2] available in the US, as well as receipt panels (e-receipt and physical), which generate a lot of use. The feeling from the majority of investors on the buyside [3] is that it’s getting harder to generate alpha from individual US-focused transaction data providers without more creativity.
One approach to generating more alpha would be to combine multiple transaction datasets whose panels complement each other in coverage (different geographic exposure, different consumer cohort exposure), providing more accuracy in forecasting revenue compared to the financial market potentially relying on only one dataset. There is value in this approach of creating a bigger mouse trap, but it is still the same data-use strategy focused on estimating the upcoming quarter’s revenue. Since many investors follow this strategy in terms of investment duration and generating signals using various datasets in isolation, the ability to generate alpha may still be pressured by competing firms using similar strategies from similar transaction data.
Instead of following the traditional strategy around the quarter, I would propose a different approach. While monitoring the point of purchase is important because it’s the critical culmination of any business's ability to monetize their goods or services, there are digital signs throughout the customer journey to understand the underlying trends that are on the path to purchase. Collecting, cleansing, and enriching these digital markers makes it easier to catch inflection points early to get ahead of multiple quarters of results beyond the upcoming quarterly results.
Apply the customer acquisition funnel framework to alt data
Understanding the customer journey from awareness to loyalty is a universal paradigm employed by corporations worldwide to guide decision-making and influence consumer behavior.
Customer acquisition funnel: The customer acquisition funnel is a model that represents the journey a customer goes through from the initial stage of awareness about a product or service to becoming a loyal customer. There are various approaches to the funnel, with different labels used. However, it typically consists of stages like awareness, engagement, trial, and loyalty.
Corporations around the world use this framework to make decisions about how to influence consumer behavior by growing awareness, stimulating trial, generating repeat usage, and ultimately generating loyal customers. Aligning this paradigm with multiple alternative data types enables us to identify early trend inflection points or weaker conversion areas that could hinder future growth if not addressed by company management. In addition to being able to monitor critical KPIs that corporations care about and improve forecasting ability, the framework provides the ability to ask more targeted questions of management to gauge their future decisions to grow the top of the funnel and improve conversion to loyal customers.
Do you think about it the same way or have a different approach?
Awareness:
The top of the funnel is awareness. Two types of awareness can be measured: unprompted (or top-of-mind) awareness and prompted awareness.
Top-of-mind awareness: Top-of-mind awareness (or unprompted awareness) refers to the extent to which a brand or product comes to consumers' minds when they think about a particular category or need. It represents the highest level of awareness and reflects the brand's strong positioning in consumers' minds. Unprompted awareness is the more powerful signal of awareness compared to prompted awareness below. Google search data is a great proxy for top-of-mind awareness. Entering text into the blank Google search bar is the moment of truth when a consumer leaves the digital breadcrumb of what they are thinking about without any prior prompt.
Prompted Awareness: A great way to see prompted awareness is through market research, especially via syndicated market research [4] providers with highly frequent questions and a long history of asking “Which of these have you heard of before?” All else being equal, the share of top-of-mind awareness is more important than the share of prompted awareness.
As a side note, the availability of LLMs to the public may be transforming the web search industry. Perhaps some day we will be monitoring other search engines or LLM searches, so it’s worth watching the overall usage of Google to make sure it's capturing enough of the market. For now, Google still generates the vast majority of information searches. The Google search data insights will remain valid even if overall Google search use is declining compared to competitors (so long as it doesn’t shrink to a biased user base). The reason is that Google Trends reports each search term relative to all search activity in the selected geography, and then indexes the value to the maximum value returned by the query. However, it’s very important that the user of Google Trends constructs the search terms appropriately, extracts the data with the understanding that it is a sample of the full data, and then cross-relates the values across relevant search terms. I’ll save for another newsletter article a deep dive on how analysts completely misuse Google Trends and get to dangerously wrong conclusions.
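To make the indexing point concrete, here is a minimal sketch of that kind of normalization. It is an illustration only: Google Trends does not expose raw counts, so the numbers and the `index_to_max` function are hypothetical, and the real methodology also involves sampling that this sketch ignores.

```python
def index_to_max(term_counts, total_counts):
    """Share of all searches per period, rescaled so the peak period equals 100."""
    shares = [term / total for term, total in zip(term_counts, total_counts)]
    peak = max(shares)
    return [round(100 * share / peak) for share in shares]

# Hypothetical data: searches for the term fall each period,
# but total search volume falls faster, so the term's share rises.
term = [500, 450, 400]
total = [100_000, 80_000, 60_000]

index = index_to_max(term, total)
print(index)  # [75, 84, 100]
```

In this made-up case the indexed series rises even though raw searches for the term decline, because the measure is a share of all search activity rather than a count. That is why a shrinking search engine does not, on its own, invalidate the signal.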
Engagement/Trial:
Monitoring the rate of customer product or service trials can be accomplished in several ways, depending on the nature of the customer-product interaction. Whether the product is a digital service, a physical good, or a service, various data sources such as app downloads, clickstream visits [5], transaction data, web-mined best-seller shares, and social media interactions can provide valuable insights.
Selecting the right approach depends on the way customers interact with the product in question:
Is the product or service more likely to be digitally connected to the consumer via an app? → App downloads
Is it more likely to be interacted with via desktop? → Clickstream visits
Is it a physical product driven by purchases at physical locations? → Transaction data focused on the number of unique purchasers. Also, foot traffic/geolocation data can be used
Is the product's popularity visible across the web when sold by multi-brand retailers? → Web-mined best-seller share
Are consumers interacting with the brand on social media? → Social media followers, likes, interactions data
Is it a B2B service or product? → In the B2B world, trial and engagement are harder to track, but there are digital clues left by employers seeking experience in managing relationships with specific suppliers or seeking experience using a tech platform.
Aggregated interactions shown in a time series are often enough to gather high-level trends. In some datasets, it's possible to isolate the data to remove the frequency of interactions and purchases to understand the share of the customer population engaged for the first time. However, it’s not always necessary to take this step, as the high-level trends of these metrics are more closely aligned with engagement and trial. Further down the funnel, we will focus on repeat purchases and repeat usage.
Where investors go wrong is by directly relating trial and engagement metrics to revenues. They are indirectly related, and when incorporated with other metrics, they can be predictive; however, it's a step too far to expect these metrics to correlate highly with revenue on their own. The likely low correlation with revenue does not mean we should ignore the data. We know the data is predictive when corporations provide relevant KPIs and commentary that directly relate to these trial and engagement metrics. Unfortunately, not all companies provide the necessary level of detail to backtest the specifically relevant KPIs. Nevertheless, trial and engagement trends are an important factor in understanding the long-term sustainability of revenue trends. A breakdown at this level of the funnel compared to awareness would be a sign of long-term problems in the conversion of customers into sustainable revenue.
Regular user:
Delving deeper into the data, we can discern how initial trials convert into regular usage. Metrics such as daily, weekly, and monthly active users, repeat user percentages, and time spent with digital services provide valuable insights into the customer's transition from trial to becoming a regular user.
Not all datasets have the ability to drill down to the level needed. Where possible, we are looking for daily active users, weekly active users, and monthly active users. If the data allows for the regrouping of anonymized panelists into cohorts based on date of first purchase or use, we can track what percentage are repeat users within the next few weeks and months (depending on the purchase cycle for the product or service). Repeat users can be calculated for credit card data, e-receipt data, app usage data, and clickstream data.
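As a rough sketch of the cohort calculation described above, the snippet below groups panelists by the month of their first purchase and measures the share who purchase again within a window. The data layout, the 90-day default window, and the function name `repeat_rate_by_cohort` are assumptions for illustration, not a vendor's schema.

```python
from collections import defaultdict
from datetime import date

def repeat_rate_by_cohort(transactions, window_days=90):
    """Share of each first-purchase-month cohort that buys again within the window.

    transactions: iterable of (panelist_id, purchase_date) pairs.
    """
    by_panelist = defaultdict(list)
    for panelist_id, purchase_date in transactions:
        by_panelist[panelist_id].append(purchase_date)

    cohort_size = defaultdict(int)
    cohort_repeat = defaultdict(int)
    for dates in by_panelist.values():
        dates.sort()
        first = dates[0]
        cohort = (first.year, first.month)
        cohort_size[cohort] += 1
        # A repeat is any later purchase on a different day, inside the window.
        if any(0 < (d - first).days <= window_days for d in dates[1:]):
            cohort_repeat[cohort] += 1

    return {c: cohort_repeat[c] / cohort_size[c] for c in cohort_size}

# Hypothetical anonymized panel.
panel = [
    ("a", date(2023, 1, 5)), ("a", date(2023, 2, 10)),   # repeats after 36 days
    ("b", date(2023, 1, 20)),                            # never repeats
    ("c", date(2023, 1, 8)), ("c", date(2023, 6, 1)),    # repeats outside the window
    ("d", date(2023, 2, 3)), ("d", date(2023, 2, 17)),   # repeats after 14 days
]
rates = repeat_rate_by_cohort(panel)
print(rates)
```

In practice the window should match the purchase cycle of the product, and the most recent cohorts need care: a cohort whose window has not fully elapsed will understate its eventual repeat rate.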
Another way to measure highly engaged usage is through the time spent with digital services via app and clickstream data.
The flip side of repeat usage is churn, which is important to look at as well, especially to understand which competitors are benefiting from the lost customers.
Loyal:
By examining regular users further, we can identify the core customers: those who are most committed to the product or service. Understanding this group is critical, as it forms the primary source of sustainable revenue growth.
Most companies see the same 80/20 rule for demand, where 20% of the customers generate 80% of the revenue. Datasets that reveal this level of devotion require panel-level data, which can be recut and grouped into cohorts by quintiles of spend or usage per period.
The whales (top 20%) and minnows (bottom 20%) analysis [6] can reveal the underlying health of the business, as whales are a source of stability when happy with the product but can be a source of pain if they churn. Of the whale cohort, are their purchases concentrated in the company’s products, or are they whales because they spend a lot on the category but switch between brands often? Are purchases or usage trending higher in the whale cohort or declining? Is the cutoff level for a consumer being in the top 20% and top 40% rising or shrinking?
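A minimal sketch of the quintile cut, assuming panel-level spend per customer for one period is available. The customer IDs, spend values, and the `quintile_summary` function are hypothetical and chosen only to show the mechanics.

```python
def quintile_summary(spend_by_customer):
    """Split customers into five equal-sized groups by spend, highest first.

    Returns each quintile's share of total spend and the minimum spend
    needed to qualify for that quintile (the cutoff).
    Assumes at least five customers so that no group is empty.
    """
    ranked = sorted(spend_by_customer.values(), reverse=True)
    n = len(ranked)
    total = sum(ranked)
    summary = []
    for q in range(5):
        group = ranked[q * n // 5:(q + 1) * n // 5]
        summary.append({
            "quintile": q + 1,
            "share_of_spend": sum(group) / total,
            "cutoff": group[-1],
        })
    return summary

# Hypothetical spend for ten panelists in one period.
spend = {
    "c01": 400, "c02": 300, "c03": 90, "c04": 60, "c05": 50,
    "c06": 40, "c07": 30, "c08": 15, "c09": 10, "c10": 5,
}
summary = quintile_summary(spend)
whales, minnows = summary[0], summary[-1]
print(whales)   # top 20%: 70% of spend, cutoff of 300
print(minnows)  # bottom 20%: 1.5% of spend, cutoff of 5
```

Running the same cut period after period gives the time series behind the questions above: whether the whale share of spend and the top-quintile cutoff are rising or shrinking.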
Turning this whales and minnows analysis on its side and looking at the cohort defined by the category instead of the company, how are the market share trends for the company's brands, products, and services within the top 20% of consumers within the category level? Movement in this segment would disproportionately explain the trends not only in the immediate quarter but for the next four quarters until the consumer shift is annualized into the base comparison.
A made up example to show how the data could be interpreted
To illustrate how this alternative data can be interpreted, let's consider a hypothetical brand's competitive set. We can convert relevant metrics into rankings at each level of the funnel, monitoring for shifts in ranks further down the funnel or significant changes in rank order. Each brand's conversion of awareness to loyalty, as well as changes in ranking, adjusts the probabilities of sustainable or deteriorating revenue growth for both the current quarter and future periods.
The section above discussed specific metrics, but here, for the sake of simplicity, I’m reducing each funnel component to a rank and a change in rank.
Three examples to call out from the made-up data:
The top-ranked brand on loyalty is Brand E, which moved up 3 spots to number 1, supported by improvements in trial and repeat usage to the top rank. In this made-up example, this would be a positive signal for the long-term performance of the brand beyond the current quarter. All else equal, this should lead to the recalibration of forecasts higher for multiple quarters ahead and not just the current unreported quarter.
Another made-up brand to call out is Brand D, which made big moves up the rankings on all metrics, including moving into the top 10 from outside the top 10 on multiple metrics. This would be a positive signal, leading to higher revised estimates for multiple quarters ahead.
Another made-up scenario to look at is Brand G, which is ranked mid-tier on awareness but ranks worse on trial, regular use, and loyalty, with its rank also deteriorating. This would be a trouble sign and would lead to lower estimates for future quarters. It would also be a great prompt for direct questions to the management team to understand how they would turn around this problem.
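For readers who want to build this kind of rank table themselves, here is a small sketch of the mechanics. The brands X, Y, and Z, the scores, and the function names are hypothetical and separate from the made-up brands discussed above. It assumes a higher score is better on every metric.

```python
def rank_brands(scores):
    """Rank brands from 1 (best) on one funnel metric; higher score is better."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {brand: position for position, brand in enumerate(ordered, start=1)}

def funnel_ranks(previous, current):
    """For each stage, return {brand: (current rank, places gained since previous)}."""
    table = {}
    for stage in current:
        prev_rank = rank_brands(previous[stage])
        curr_rank = rank_brands(current[stage])
        table[stage] = {
            brand: (curr_rank[brand], prev_rank[brand] - curr_rank[brand])
            for brand in curr_rank
        }
    return table

# Hypothetical metric values per funnel stage for two periods.
previous = {
    "awareness": {"X": 60, "Y": 55, "Z": 40},
    "trial":     {"X": 30, "Y": 25, "Z": 20},
    "regular":   {"X": 20, "Y": 12, "Z": 15},
    "loyalty":   {"X": 10, "Y": 4,  "Z": 6},
}
current = {
    "awareness": {"X": 58, "Y": 57, "Z": 41},
    "trial":     {"X": 27, "Y": 29, "Z": 21},
    "regular":   {"X": 17, "Y": 18, "Z": 16},
    "loyalty":   {"X": 8,  "Y": 9,  "Z": 6},
}
table = funnel_ranks(previous, current)
for stage, ranks in table.items():
    print(stage, ranks)
```

In this toy data, Brand Y holds second place on awareness but climbs to first on trial, regular use, and loyalty. That is the pattern of improving conversion down the funnel that would argue for raising estimates beyond the current quarter.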
Informed by a Bayesian approach [7], we can utilize these data points to adjust the probabilities of sustainable or deteriorating revenue growth, not just for the current quarter, but also looking beyond. Changes in each brand's conversion of awareness to loyalty and changes in ranking are instrumental in this adjustment process. With new data continuously becoming available, our forecasts constantly recalibrate right up until the moment results are released. In this paradigm, the approach overlaps, in the near term, with the traditional use case of predicting the upcoming quarter as an essential factor in positioning the investment. The goal of our longer-term framework is to enter trades earlier than our competitors. We achieve this by setting future quarter estimates and adjusting our exposure to the trade as more information comes to light over time. While our approach differs from the traditional use of alternative data due to its extended time horizon, it still incorporates the traditional use case within a diverse array of data points to predict the upcoming quarter's results.
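As a toy illustration of the updating idea, the sketch below applies Bayes' rule to a single hypothesis, "revenue growth is sustainable," as funnel signals arrive. The prior and the likelihoods are invented numbers; in practice they would have to be estimated from history. The sketch also assumes the signals are conditionally independent, which real funnel metrics may not be.

```python
def bayes_update(prior, p_signal_if_true, p_signal_if_false):
    """Posterior probability of the hypothesis after observing one signal."""
    numerator = p_signal_if_true * prior
    evidence = numerator + p_signal_if_false * (1 - prior)
    return numerator / evidence

# Hypothesis: the brand's revenue growth is sustainable.
belief = 0.50  # hypothetical starting probability

# Each pair is (P(signal | sustainable), P(signal | not sustainable)).
signals = [
    (0.7, 0.4),  # e.g. loyalty rank improved
    (0.6, 0.3),  # e.g. repeat-usage rank improved
]
for p_true, p_false in signals:
    belief = bayes_update(belief, p_true, p_false)
    print(round(belief, 3))  # 0.636, then 0.778
```

Each new data point moves the probability rather than replacing the forecast, which matches the idea of adjusting exposure gradually as information comes to light.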
External factors
In addition to understanding the funnel from awareness to the preferred product, one needs to consider the direct and indirect factors that could affect the conversion. External factors could cyclically influence the customer acquisition funnel in the near term, which would need to be monitored and considered when thinking about underlying trends. As examples, consider datasets monitoring pricing decisions, data on product or service expansions, data on changes in advertising spend, data on changing consumer and press sentiment, and data that monitors the economic health of the company’s core consumer. We can review these types of datasets in a future Data Score newsletter in detail.
Closing thoughts
As data accessibility continues to improve, it's important for investors to innovate in their data usage, transcending conventional methods and exploring untapped avenues. By applying alternative data to proven corporate world paradigms, new actionable insights can be generated that can provide more alpha than following the consensus approach of using credit card data for predictions. Combining alternative data sets into a customer acquisition funnel is a great example of this approach.
I invite your feedback. Are there other data types or strategies that have been particularly effective in your experience? Let's foster a dialogue around the innovative use of alternative data in investment strategies.
Jason DeRise, CFA | DataChorus LLC | jason.derise@datachorus.net
[1] Alpha generation: A term used in finance to describe an investment strategy's ability to beat the market or generate excess returns. A simple way to think about alpha is that it's a measure of the outperformance of a portfolio compared to a pre-defined benchmark for performance. Investopedia has a lot more detail: https://www.investopedia.com/terms/a/alpha.asp
[2] Panel: typically refers to a group of individuals, households, or businesses whose behavior is tracked over time for research purposes. In the case of credit and debit card panels, it consists of a representative sample of consumers whose transaction data is tracked and analyzed over a period of time. The data collected from these panels can then be used to understand purchasing behaviors, track spending trends, and make predictions about future consumer behavior. Similarly, receipt panels would involve tracking and analyzing data from consumer receipts. It's important to note that the quality and representativeness of these panels can significantly impact the accuracy of the insights derived from them. For instance, if a panel disproportionately represents a certain demographic or geographic area, the data derived from it might not accurately reflect broader market trends.
[3] Buyside vs Sellside: Buyside typically refers to institutional investors (hedge funds, mutual funds, etc.) who invest large amounts of capital, and Sellside typically refers to investment banking and research firms that provide execution and advisory services to institutional investors.
[4] Syndicated market research: Syndicated market research involves collecting data and conducting research studies on specific industries or markets by a third-party organization. The research findings are then made available to multiple subscribers or clients who are interested in understanding market trends and consumer behavior.
[5] Clickstream: Clickstream, or web traffic data, refers to the record of the web pages a user visits and the actions they take while navigating a website. Clickstream data can provide insights into user behavior, preferences, and interactions on a website or app.
[6] Whales and minnows analysis: This analysis categorizes customers into two groups: "whales" (top spenders or users) and "minnows" (low spenders or users). It helps understand the distribution of revenue or usage among customers and assess the impact of these groups on business performance.
[7] Bayesian approach: The Bayesian approach is a statistical method that uses prior knowledge or beliefs to update and revise probabilities based on new evidence or data. It allows for the incorporation of prior information and the updating of probabilities as new information becomes available.