An 8-point approach to evaluating data partners
The first pitch and demo look great. But then we look under the hood…
Welcome to the Data Score newsletter, your go-to source for insights into the world of data-driven decision-making. Whether you're an insight seeker, a unique data company, a software-as-a-service provider, or an investor, this newsletter is for you. I'm Jason DeRise, a seasoned expert in the field of alternative data insights. As one of the first 10 members of UBS Evidence Lab, I was at the forefront of pioneering new ways to generate actionable insights from data. Before that, I successfully built a sell-side equity research franchise based on proprietary data and non-consensus insights. Through my extensive experience as a purchaser and creator of data, I have gained a unique perspective that allows me to collaborate with end-users to generate meaningful insights.
In the early years of Evidence Lab, it was rare for a week to go by without multiple potential data vendors presenting to us. Each of the original team members would find relevant data vendors to address pressing investment debates we were working on with our analyst colleagues.
Prior to 2019, the Alternative Data Council at FISD1 did not exist. The Due Diligence Questionnaire (DDQ) is now commonplace (https://fisd.net/alternative-data-council/). However, UBS and Evidence Lab maintained the highest standards for risk management, which provided the guidelines for the approach we adopted. It seems we weren’t the only ones who felt this way about managing the risks associated with data procurement; eventually, the approach became the industry standard.
I’d like to share some of the guidelines that are important to me when assessing data partners.
Use a completed Due Diligence Questionnaire (DDQ) to understand the compliance and risk associated with a dataset.
Assess the Return on Investment (ROI) by considering how many decisions can be influenced and the potential limitations of the data.
Conduct common sense, first principles tests to ensure the data behaves as expected and reflects known events and expected seasonality. It’s surprising how often these types of tests are failed.
Perform back testing against benchmarks to measure the dataset's correlation with a known KPI, while avoiding common statistical mistakes that lead to incorrect conclusions.
Assess the transparency of the methodology used for harvesting, cleansing, and enriching the data, while respecting proprietary trade secrets.
Evaluate how the data vendor handles feedback and whether they have the capacity for custom work, understanding the potential implications on competitive advantage.
Understand the vendor's competitive set by asking about their closest competitors and their target customer base.
Examine the Service Level Agreement (SLA) for post-delivery service, including response times for errors, communication of code-breaking changes, and availability of sales engineering support.
1. Use the Due Diligence Questionnaire (DDQ)
Data companies: you should have the answers pre-written and ready to go
The DDQ is an absolute must for understanding the compliance and risk of any dataset. Make sure the DDQ is filled in appropriately by the data vendor and the answers are fully understood (data companies: you should have the answers pre-written and ready to go). As a data buyer, make sure you are comfortable with the answers around Material Non-Public Information (MNPI)2 and Personally Identifiable Information (PII)3, as well as the vendor’s rights to sell the data for commercial purposes and your rights to use the data for your own commercial purposes.
2. ROI assessment
The more appropriate insight use cases addressable by the data vendor, the more valuable the data becomes.
Can the data provide insights that change the decisions of the business for the better? Know how you will measure whether it’s adding value in the future, and estimate those metrics in advance assuming the data is included.
There are a few things to think about when forecasting the ROI of a dataset. How many decisions can be influenced, and what is the magnitude of getting those decisions right? What outcomes do the insights need to drive, how important are those outcomes, and is the data a good fit for addressing them?
On the buyside4, this could be alpha generation5 from correctly selecting long vs. short6 positions.
On the sellside, this could be recommendation and estimate accuracy, or the uplift in client time spent discussing proprietary insights.
At corporates, it could be the accuracy of forecasting appropriate inventory levels to meet demand or improving the uplift of a marketing campaign.
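To make that framing concrete, here is a minimal back-of-envelope sketch in Python. Every number in it is hypothetical and only illustrates the structure of the estimate: decisions influenced, the value of getting them right, the expected improvement in hit rate, and the all-in cost of the data.

```python
# A minimal, illustrative ROI sketch. All figures below are hypothetical placeholders,
# not benchmarks; swap in your own estimates for your decisions and your cost of data.

decisions_per_year = 40                 # hypothetical: decisions the data could influence
value_per_correct_decision = 250_000    # hypothetical: average value of getting one right
baseline_hit_rate = 0.55                # hypothetical: decision accuracy without the data
expected_hit_rate = 0.60                # hypothetical: decision accuracy with the data
annual_data_cost = 150_000              # hypothetical: license + integration + analyst time

expected_uplift = (expected_hit_rate - baseline_hit_rate) * decisions_per_year * value_per_correct_decision
roi_multiple = (expected_uplift - annual_data_cost) / annual_data_cost

print(f"Expected annual uplift: ${expected_uplift:,.0f}")   # $500,000 with these inputs
print(f"Return on data spend:   {roi_multiple:.1f}x")       # ~2.3x with these inputs
```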
What are the limitations of the data, and which caveats reduce its potential scope? Which use cases would be dangerous to apply the data to? False positive signals are dangerous, and avoiding them depends heavily on assessing the appropriate caveats of the data.
A well-understood example of a caveat: credit card data panels may have biases toward certain demographics and regions of the US, which may not overlap well with the demographics of the retail business the end user is trying to assess. Furthermore, credit card data is not very helpful for businesses where other means of payment are used at an inconsistent rate (e.g., cash at dollar stores). Smart buyers and users of this data type understand the limitations and work within them to generate accurate, trustworthy insights.
Another example: web-mined data sourcing needs to have the end analysis in mind while designing the collection. For example, when collecting prices of consumer products, the robot should capture the full website on a regular, high-frequency schedule. However, this is still just a sample of the website’s product prices. The absolute values are not useful in my opinion; the rate of change, however, is a high-impact insight for understanding corporate pricing strategies. Furthermore, websites typically do not provide demand data to calibrate the weight of each price observation. Therefore, the end analysis needs to consider that limitation and focus on answering questions about price movements that are not dependent on demand weightings. Once again, smart buyers of this data understand the limitations and find valuable insights within the data, but don’t over-extrapolate to use cases that could generate false signals.
A shorthand summary: The more appropriate insight use cases addressable by the data vendor, the more valuable the data becomes.
3. Common sense, first principles tests
It’s surprising how often these types of tests are failed.
Does the data behave the way a non-technical user would expect? For example, if a dataset measures the popularity of soft drinks in the US across all distribution channels, does it show the common-sense outcome that Coca-Cola is more popular than Pepsi? Rank changes are an important signal, but if the data ranks the brands in the wrong order to begin with, it will be hard to get value from it going forward.
Does the data show logical seasonality? For example, if the dataset tracks app usage, do we see a spike in tax app usage each April (as expected around the typical US tax filing season)?
Do known major events have the expected impact on the data? The early stages of the COVID-19 pandemic are a great example: the world dramatically changed, so we should see correspondingly big changes in the data. This confirms the data is capturing real-life changes.
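A minimal sketch of how these first-principles checks might be automated, assuming a hypothetical pandas DataFrame of monthly metrics on a DatetimeIndex; the column names, thresholds, and event dates are illustrative rather than tied to any real vendor feed.

```python
import pandas as pd

def rank_check(df: pd.DataFrame, expected_leader: str, challenger: str) -> bool:
    """Does the presumed market leader rank above the challenger on average?"""
    return df[expected_leader].mean() > df[challenger].mean()

def seasonality_check(series: pd.Series, peak_month: int = 4) -> bool:
    """Is the expected peak month (e.g., April for US tax apps) the seasonal high?"""
    monthly_avg = series.groupby(series.index.month).mean()
    return monthly_avg.idxmax() == peak_month

def event_check(series: pd.Series, event_start: str = "2020-03-01", window: int = 3) -> bool:
    """Did a known shock (e.g., early COVID-19) move the data by more than two standard deviations?"""
    before = series[series.index < event_start]
    shock_level = series[series.index >= event_start].iloc[:window].mean()
    return abs(shock_level - before.mean()) > 2 * before.std()
```

Failing any of these cheap checks is usually a sign to pause before spending time on formal back testing.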
4. Back testing vs benchmarks
This is the most common approach to testing datasets: correlate the historical data with a known KPI that’s valuable for understanding business fundamentals. I also appreciate that many data companies will provide this work, and that is helpful. But data buyers will seek to replicate the outcome on their own. Unfortunately, there are common mistakes I’ve seen in the process of back testing that I’d like to highlight:
Correlating absolute numbers, especially when the scale of the figures is not aligned. This typically causes correlations to be higher than when the data is properly scaled. An example would be a data point in the 0 to 100 range correlated with target data points in the billions. Often, the relationship is incorrectly shown as a time series line chart with one of the metrics rescaled on a second axis. Visually compelling, but the derivatives of the absolute number (e.g., the rate of change and the acceleration of the change) reveal they are not correlated at all. Especially in the world of alternative data, where the data is a potentially biased and noisy proxy for the target variable, we need to move away from absolutes and test whether the signal is strong enough to generate insights relative to what the market expects, which is driven by relative metrics such as growth rates and financial ratios. Instead, rescale both datasets to a metric that is logical and on the same basis as the target variable. For example, rescale the data to have similar highs and lows on a relative basis, take the log of the data, or convert to rates of growth and acceleration.
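As a minimal illustration of this point, the sketch below simulates two series that both trend upward: the correlation of their levels looks impressive, while the correlation of their year-over-year growth rates is close to zero. The data is simulated purely for illustration; in practice the inputs would be the vendor metric and the target KPI.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
quarters = pd.period_range("2015Q1", periods=36, freq="Q")

# Two independent growth processes: a "KPI" in billions and a vendor "signal" on a 0-100 scale
kpi_growth = rng.normal(0.02, 0.03, len(quarters))
signal_growth = rng.normal(0.02, 0.03, len(quarters))
kpi = pd.Series(1e9 * np.cumprod(1 + kpi_growth), index=quarters)
signal = pd.Series(100 * np.cumprod(1 + signal_growth), index=quarters)

print("Correlation of levels:    ", round(kpi.corr(signal), 2))   # inflated by the shared upward trend
print("Correlation of YoY growth:", round(kpi.pct_change(4).corr(signal.pct_change(4)), 2))  # near zero
```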
Not properly adjusting for seasonality. Correlation is typically higher with seasonality included, but end users care about underlying trends. For example, it’s not surprising that ice cream sales are higher in the summer than in the winter. Correlating daily time series data without adjusting for seasonality will give a misleading picture of the data’s fitness.
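Here is a minimal sketch of how shared seasonality can inflate correlation, using simulated monthly data: two series share the same seasonal pattern but have unrelated underlying trends, and a simple year-over-year change strips the seasonal component back out. This is only one of several possible adjustments; seasonal decomposition or comparing like-for-like periods work too.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
months = pd.date_range("2018-01-01", periods=60, freq="MS")
season = 10 * np.sin(2 * np.pi * (months.month - 1) / 12)   # shared seasonal pattern

ice_cream = pd.Series(season + np.cumsum(rng.normal(0, 0.5, 60)), index=months)  # trend A + seasonality
unrelated = pd.Series(season + np.cumsum(rng.normal(0, 0.5, 60)), index=months)  # trend B + same seasonality

print("Raw correlation:       ", round(ice_cream.corr(unrelated), 2))                    # inflated by shared seasonality
print("YoY-change correlation:", round(ice_cream.diff(12).corr(unrelated.diff(12)), 2))  # closer to the true (no) relationship
```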
Not removing idiosyncratic outliers that cause high correlation. For example, many datasets saw a rapid drop (or spike) in 2020 compared to 2019 and then a recovery toward 2019 levels in 2021. Many datasets show high correlation with those years included and no correlation before and after 2020-2021. As mentioned above, it is a positive that the data does pick up extreme movements around events as expected, but that doesn’t mean the data is predictive on an underlying, ongoing basis when the expected outcomes are less volatile and less uncertain.
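A quick way to make this check explicit is to compute the correlation twice: once on the full history and once with the 2020-2021 shock window excluded. The sketch below assumes two hypothetical pandas Series (the vendor signal and the KPI) aligned on the same DatetimeIndex.

```python
import pandas as pd

def corr_with_and_without_window(signal: pd.Series, kpi: pd.Series,
                                 start: str = "2020-01-01", end: str = "2021-12-31") -> dict:
    """Compare correlation on the full sample vs. with a shock window excluded."""
    full = signal.corr(kpi)
    keep = (signal.index < start) | (signal.index > end)
    ex_window = signal[keep].corr(kpi[keep])
    return {"full_sample": round(full, 2), "ex_2020_2021": round(ex_window, 2)}

# If the relationship only shows up with the shock included, the "correlation" is mostly
# the event itself rather than an underlying, ongoing signal.
```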
Not testing with three groups of data: 1) in sample, 2) test/training sample, and 3) out of sample. Unfortunately, many testers have a habit of stopping when correlation is found, without setting aside enough data to make sure the correlations hold up over different time periods. The in-sample, test/training-sample, and out-of-sample approach helps confirm the fitness test works over multiple periods.
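A minimal sketch of a chronological three-way split, assuming a hypothetical DataFrame of aligned signal and KPI history sorted by date; the split fractions are illustrative.

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, train_frac: float = 0.6, test_frac: float = 0.2):
    """Split a date-sorted DataFrame into in-sample, test/training, and out-of-sample slices."""
    n = len(df)
    train_end = int(n * train_frac)
    test_end = int(n * (train_frac + test_frac))
    return df.iloc[:train_end], df.iloc[train_end:test_end], df.iloc[test_end:]

# in_sample, test_sample, out_of_sample = chronological_split(history_df)
# Iterate on the first two slices; touch the out-of-sample slice once, at the end.
# Repeatedly peeking at the holdout while searching for correlation defeats its purpose.
```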
Please don’t do basic regressions of a single metric vs. share price. I often see vendor presentations with a metric regressed against share prices or a stock index. There are so many factors that explain share price changes on a short-term and long-term basis that it just isn’t realistic to present a single metric as an explanatory factor. Also, this isn’t how quants7 assess alpha prediction power. If you really do believe the metric can predict share price, please consider using an added-factor quant test. The data point needs to be measured as an added factor. At a very high level, to do this measurement more appropriately, you need a base quant model that explains market movements based on proven traditional data points, including transaction costs and the market impact of transactions. Then you add the new factor based on the new data source. Measure the change in alpha between the two versions to test whether the metric is able to generate alpha. (Data companies: if you are able to accurately show this, there is pricing power relative to the alpha generation of the fund.)
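The sketch below is a heavily simplified version of the added-factor idea, not a production quant test: it ignores transaction costs, market impact, and portfolio construction, and simply compares the out-of-sample explanatory power of a base factor model against the same model with the new data-driven factor added. The panel layout and column names are assumptions for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def added_factor_lift(panel: pd.DataFrame, base_factors: list,
                      new_factor: str, target: str = "fwd_return") -> float:
    """Out-of-sample R^2 lift from adding the new factor to a base model (date-sorted panel assumed)."""
    cutoff = int(len(panel) * 0.7)                       # chronological split, not random
    train, test = panel.iloc[:cutoff], panel.iloc[cutoff:]

    base = LinearRegression().fit(train[base_factors], train[target])
    augmented = LinearRegression().fit(train[base_factors + [new_factor]], train[target])

    base_r2 = base.score(test[base_factors], test[target])
    augmented_r2 = augmented.score(test[base_factors + [new_factor]], test[target])
    return augmented_r2 - base_r2    # positive lift suggests incremental explanatory power

# A positive out-of-sample lift is necessary but not sufficient evidence of added alpha;
# a real test would net out costs and measure the change in portfolio-level alpha.
```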
I’m sure there are many more back testing errors finance and data science professionals have come across, and I would love to hear from you on what I missed, especially if it’s more common than these errors.
5. How transparent is the methodology?
Data integrity is such an important piece of a data buyer’s criteria.
Data integrity is such an important piece of a data buyer’s criteria. With the size of data being processed, it’s not realistic to expect every single data point and every potential anomaly to be reviewed. It’s important that the process for harvesting, cleansing, and enriching the data be revealed in enough detail to let the user know they can trust the process. I do note that some elements of the methodology will be proprietary trade secrets of the data company, and it would be a bad idea to reveal them for the sake of transparency. However, do as much as possible to make it easy for your clients to trust you.
6. How does the data vendor handle feedback?
As the questions to answer change in importance over time, the data vendor may not be able to adapt with you and that’s important to know upfront.
Product/market fit is not easy to achieve. So it’s important to assess whether the data vendor is set up to work in an agile structure and adapt the product to feedback quickly. As the questions to answer change in importance over time, the data vendor may not be able to adapt with you, and that’s important to know upfront.
Do they do custom work? The above question about feedback also leads to this one. On one hand, custom work could help meet your specific needs by recutting the data in a way that makes it easier for you to get value from it. However, it’s a double-edged sword: a data partner who takes on custom work for others could crowd out capacity for your requested work. Understanding how custom work is prioritized is important.
At a deeper level, it’s possible the custom work becomes standard for all clients, which may mean the loss of a competitive advantage. It’s important to know how the vendor operates around custom work and whether it becomes part of the pool of capabilities others can use or is kept just for your use. Regardless of the answers, it’s important to know how the data company operates so you can plan ahead with your requests and feedback.
7. How does the vendor define their competitive set?
Who does the vendor see as their closest competitors, and how are they similar or different? I like this question because it gives a better sense of their data market knowledge and allows for an easier assessment of what is special or valuable about the data partner vs. peers.
One example of this is in the enterprise cloud database sector, where the CEO of Databricks commented on a recent independent blog post comparing Snowflake and Databricks, acknowledging where they are strong and weak. That’s a refreshing approach to becoming a trusted data partner and one that allows long-term partnerships to form to the benefit of both the buyer and the seller.
Rajaniesh Kaushikk's Beyond the Horizon Blog post on Snowflake vs Databricks
Ali Ghodsi's repost of the independent assessment of Databricks vs Snowflake on LinkedIn
Who else do they sell to? For me, I’m watching for unauthorized name dropping as a negative sign, especially if you signed an MNDA before getting into the details of the dataset. I’m comfortable with descriptions such as “large companies in the sector covered by the data,” “large sell-side research,” “long/short equity hedge funds,” or “quant funds” as a high-level summary of who is sold to. I don’t need to hear specific names to understand the selling model of the data company. Most data buyers would not want to be cited as users of the data unless the terms of the agreement state it is OK. Data companies should make sure they have permission to add logos of data-buying companies to marketing materials and the website.

Does it matter if others are already using the data? I think it’s important to know while assessing whether the data can be a differentiated data point or if it is already being used elsewhere and is part of the consensus view. I believe the buyside not only needs to pay for unique datasets, but also needs to know if the market is moving because new data has been released that everyone uses and the consensus8 has moved.
8. What is the Service Level Agreement (SLA) for post-delivery service?
Don’t assume one data company’s service level standards are the same as another’s. This needs to be discussed upfront. Some questions under this point include:
“How fast will the data partner respond if an error is identified?”
“How are code breaking changes communicated and how much lead time is provided?”
“How soon are unexpected breaks responded to?”
“How are data restatements and data gaps communicated?”
“Are sales engineering services provided to help get up and running with the data?”
The newsletter is the start of the conversation:
Data Buyers - How does this approach benchmark vs your approach?
Data, SaaS and Insight Sellers - Is anything surprising about the approach?
Some fun with generative AI, because why not…
- Jason DeRise, CFA
Hey, do you love the jargon translation? Should I create and share a live document with all the jargon I’ve footnoted in each Data Score Newsletter post?
Alternative Data Council at FISD: “Founded in January 2019, the Alternative Data Council is series of working groups and information-sharing forums within FISD. It was created as part of the FISD Executive Committee’s strategic initiative to engage the alternative data community. We establish best practices for the delivery of alternative data to the investment industry and provide opportunities for education, information sharing and networking.” https://fisd.net/alternative-data-council/
“FISD is the global forum of choice for industry participants to discuss, understand and facilitate the evolution of financial information for the key players in the value chain including consumer firms, third party groups and data providers. It is a dynamic environment in which members identify the trends that will shape the industry and create education opportunities and industry initiatives to address them.” https://fisd.net/about-us/#fisd-about-mission
“SIIA is the voice for the specialized information industry. Our members provide data, content and information that drives the global economy, informs financial networks and connects learners and educators. SIIA unites, defends and promotes our diverse membership. Learn more about our educational and networking opportunities, events and benefits helping you grow your business, your career and the industry at large.” https://www.siia.net/about-us/
Material, Non-Public Information (MNPI): Information about a company that is not publicly available and could have a significant impact on the company's stock price if it were made public.
Personally Identifiable Information (PII): Any information that can be used to identify an individual, such as a name, social security number, address, or phone number.
Buyside vs Sellside: Buyside typically refers to institutional investors (hedge funds, mutual funds, etc.) who invest large amounts of capital, and Sellside typically refers to investment banking and research, which provide execution and advisory services to institutional investors.
Alpha generation: A term used in finance to describe an investment strategy’s ability to beat the market or generate excess returns. A simple way to think about alpha is that it’s a measure of the outperformance of a portfolio compared to a pre-defined benchmark. Investopedia has a lot more detail: https://www.investopedia.com/terms/a/alpha.asp
Long/Short: Long/short equity funds buy positions (long) in stocks they believe will go up in value and sell short stocks (short) that they believe will go down in value. Typically, there is a risk management overlay that pairs the long and short positions to be “market neutral,” meaning it doesn’t matter if the market goes up or down; what matters is that the long positions outperform the short positions. Short selling, as a simplistic definition, is when an investor borrows stock from an investor who owns it and then sells the stock. The short seller will eventually need to buy back the stock at a later date to return it to the owner (and will profit if they buy back the stock at a lower price than they sold it for).
Quant funds: Short for “quantitative funds,” also referred to as systematic funds. Systematic refers to a quantitative (quant) approach to portfolio allocation based on advanced statistical models and machine learning (with varying degrees of human involvement “in the loop” or “on the loop” managing the programmatic decision making).
“The Consensus” is the average view of the sell-side for a specific financial measure. Typically it refers to revenue or earnings per share (EPS), but it can be any financial measure. It is used as a benchmark for what is currently factored into the share price and for assessing whether new results or news are better or worse than expected. However, it is important to know that sometimes there’s an unstated buyside consensus that is the better benchmark for expectations.