Data Companies: At the Table or On the Menu?
High-quality data becomes more valuable as AI changes distribution. But only for companies that enable integration.
Some fifteen years ago, my colleague and I hosted a lunch meeting in Belgium with AB InBev’s CEO. It was a routine post-deal roadshow until someone asked a casual question about distributor consolidation in the US. AB InBev’s legendary CEO, Carlos Brito, revealed the company was exploring whether it could acquire distributors and vertically integrate (despite the prevailing three-tier system1), potentially moving from 7% company-owned distribution to as much as 50%2. Why would this be the strategy? Among the many reasons he gave, Brito added, “You either have a seat at the table, or you’re on the menu.”
In 2009, craft beer had not yet exploded. But it was already having an impact because distributors wanted to sell these faster-growing, higher-margin products. At the same time, AB InBev and MillerCoors wanted to streamline their distribution to manage revenue and costs. Ultimately it was about the control of the path to value for their products.
Last week, when publicly traded software and data companies saw their share prices fall materially on concerns that GenAI3 would disrupt Software as a Service (SaaS) and data businesses, that line came back into focus. GenAI platforms are now entering the supply chain as a new distribution layer between data product companies and insights consumed by end users. Data companies face Brito’s choice: adapt to the new distribution reality or risk being bypassed.
Data-Provider Stocks Tumble on AI Competition Fears
This article focuses on data companies selling to financial market end users, though the insights apply broadly to data product companies in other verticals.
This is about the path to value for data companies. But in this case, data companies have already given away control of the last mile in the supply chain. AI is making it more apparent. But the disruption creates an opening.
What’s really happening here
Data companies think they sell directly to end users. They don’t. They sell to vertically integrated teams and systems that convert raw data into something actually usable to make decisions. They sell to analysts who prepare the insights for decision-makers. The “last mile” from raw data to actionable insight has always been outsourced to the data product’s customers.
Now GenAI, through agentic frameworks4 and Model Context Protocol (MCP)5 connections, is stepping into that last mile. Companies enabling these frameworks are becoming the new distribution layer between data companies and end users.
The question isn’t whether this shift happens. It’s whether data companies participate in it or get displaced by it.
High-quality, AI-ready data will become more valuable, but only for companies that enable integration into agentic frameworks through MCP connections, semantic layers6, and context-ready (skills) documentation. Companies that write restrictive contracts, extract margin on AI integration, or maintain poor data governance7 will be displaced by substitutes.
In this article:
Why the last mile of data product distribution has always been the hardest part
How GenAI changes the economics of data distribution
There’s no AI strategy without a solid data strategy
Skills-based training brings data analytics to life
Data companies need to take action to be at the table
Data companies are more likely to be “on the menu” if they take these actions
Welcome to the Data Score newsletter, composed by DataChorus LLC. This newsletter is your source for insights into data-driven decision-making. Whether you’re an insight seeker, a unique data company, a software-as-a-service provider, or an investor, this newsletter is for you. I’m Jason DeRise, a seasoned expert in the field of data-driven insights. I was at the forefront of pioneering new ways to generate actionable insights from alternative data. Before that, I successfully built a sell-side equity research franchise based on proprietary data and non-consensus insights. I remain active in the intersection of data, technology, and financial insights. Through my extensive experience as a purchaser and creator of data, I have a unique perspective, which I am sharing through the newsletter.
Why the last mile of data product distribution has always been the hardest part
Data companies have always struggled to connect their data to how institutional investors actually generate value. Here’s what happens after data is delivered:
Buyers are cleansing, wrangling,8 enriching, tagging, and transforming the data into insight-ready metrics aligned with their investment process. Even web-based data products face this: every dashboard where users export to Excel signals the data product didn’t actually deliver the outcome.
Alpha9 is created in that tough work, outside the control of the data product.
Eventually alpha decays as more investors replicate the strategy or market debates shift. But once alpha ends, asset managers still need the data because it’s now explaining why the market is moving as a consensus indicator. For a data company, becoming the beta signal10 of the market is a maturity goal. Being a market beta signal means the data is have-to-have: the market moves because of the data points, and anyone not paying for the data can’t explain the move.
Some data companies enable the workflow of analysts, traders, and portfolio managers more efficiently. If data is ingrained in the workflow, renewal likelihood increases at contract end.
Data companies rarely see this last mile of distribution because they’ve outsourced it to customers, so they wouldn’t even know if their data is alpha-generating, a market beta indicator, or part of the workflow.
Despite a lack of visibility into what’s actually happening and why their data is valuable, data companies that reach this state of workflow dependency will (incorrectly) perceive that they now have pricing power.
They don’t realize it, but this is a classic network effect. If invested in properly, more demand in the network means more supply can be added as bundled services and new data products, making the network stronger.
These network effects are delicate. Some data companies seek to maximize economics in the network, perhaps because they don’t understand why their product is popular. Raising prices because usage is high creates a margin umbrella and enables new solutions to disrupt.
This margin umbrella enabled the creation of a multi-tier distribution system from raw data through to actual insights for economically sound decisions.
The typical data company sees none of the work that converts its data into something useful because it doesn’t own the full supply chain to insight. So how could it know the right value of its data?
That last mile is the hardest part of generating alpha, monitoring market trends, and efficiently executing investment workflows.
The 80/20 rule that never flips
The common adage that getting to the answer takes 80% prep time and 20% analysis time has held true throughout the evolution of data distribution.
Each generation of technology, such as terminals, Excel plug-ins, and data warehouses, promised to flip this ratio. None did. Automation simply freed analysts to ask the next, harder question.
Once the answer to a critical investment question becomes known to all, it’s baked into the share price because of market efficiency. My former colleague Sam Arie recently wrote an article explaining how market efficiency actually works for practitioners.
Analysts then use their newly created free time to go deeper by combining additional data points to answer the next question that would reveal the next share price movement. Because this is moving beyond the automated process to answer the prior question, it undoubtedly takes 80% of the time to prepare before getting to the last 20% of analytics.
The 80% prep time and 20% analysis time never flip. There’s always a next question. Each deeper question becomes more nuanced and niche to how the asset manager views the market and their specific investment process.
The pain to process data and the uncertainty of how to map data to relevant answers is where the alpha is.
GenAI doesn’t eliminate this dynamic. But it does make it easier for end users to get to valuable insights from data. In the process, it disrupts the supply chain by formally shifting the last mile of insight delivery to a third party beyond both the data company and the end consumer.
How GenAI changes the economics of data distribution
GenAI fundamentally alters the distribution layer through a clear cause-and-effect chain:
AI reduces switching costs. Coding copilots make data ingestion easier. LLMs11 solve tagging and semantic layer challenges. Agentic frameworks handle deterministic calculation routing. What used to take months of custom integration now takes weeks or days.
Lower switching costs weaken inertia renewals. Past renewals were often a sign that switching to a better data product was too costly, not that the current product was optimal. As integration friction drops, inertia disappears.
Datasets embedded in AI workflows succeed. When analysts and portfolio managers interact with data through AI interfaces rather than direct feeds, datasets that aren’t integrated may become invisible, regardless of quality.
Workflow embedding strengthens network effects. The more AI workflows a dataset powers, the more essential it becomes. Each new use case increases switching costs (but this time, in the data company’s favor) and creates compounding value.
Network effects increase durability, not just margins. The game is durability through network effects achieved by becoming so deeply embedded in AI-powered workflows that substitution becomes unthinkable.
Workflow embedding directly impacts renewal economics. Each additional integration point increases switching costs in the data company’s favor.
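To make the first two links of that chain concrete, here is a toy sketch in Python. Everything in it is a hypothetical illustration: the `Dataset` class, the `should_switch` rule, the three-year horizon, and all of the numbers are assumptions chosen for the example, not measurements of any real vendor or buyer.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    annual_value: float  # hypothetical value the buyer gets per year
    annual_price: float  # hypothetical subscription price per year


def should_switch(incumbent: Dataset, challenger: Dataset,
                  switching_cost: float, horizon_years: int = 3) -> bool:
    """Switch only if the challenger's extra net value over the horizon
    exceeds the one-off cost of re-integrating the data."""
    incumbent_net = incumbent.annual_value - incumbent.annual_price
    challenger_net = challenger.annual_value - challenger.annual_price
    gain = (challenger_net - incumbent_net) * horizon_years
    return gain > switching_cost


# Illustrative numbers only: the challenger is slightly better and cheaper.
incumbent = Dataset("incumbent", annual_value=100.0, annual_price=60.0)
challenger = Dataset("challenger", annual_value=110.0, annual_price=50.0)

# High integration cost (months of custom work): the incumbent renews on inertia.
print(should_switch(incumbent, challenger, switching_cost=90.0))
# Low integration cost (AI-assisted, days of work): the same challenger now wins.
print(should_switch(incumbent, challenger, switching_cost=10.0))
```

Nothing about either product changes between the two calls. Only the integration cost moves, which is the sense in which a past renewal may have reflected inertia rather than product quality.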
This diagnosis of where data companies stand points to clear actions they can take to grow market share from the opportunities AI creates.
There’s no AI strategy without a solid data strategy
Generative AI needs clean and trusted structured and unstructured data to work. Garbage-in equals garbage-out. Not all data is equally useful. Free data on the web, especially for financial market use, is not useful enough to generate actionable decisions.
I showed about a year ago that an LLM with deep research could meet the standard of the average sell-side12 analyst using just free information on the web. It showed that if given better quality data, directed by a human with discerning taste and judgment, it could generate even better results. The first pass was not good enough, using stale data and missing newer important debates. But it improved with additional prompts.
The problem is that generative AI is non-deterministic. That makes data quality non-negotiable. High-quality data is accurate, well-tagged, highly governed, efficiently distributed, and actually useful for making decisions.
Here is what will not work: Agentic workflows applied to poorly governed data where AI is expected to magically infer a missing data dictionary, a nonexistent semantic layer, and uncleansed inputs. It’s unlikely that a foundational model13 coming in 2026 can reinterpret “Bad Data” into “Good Data” on the fly.
The good news is that investments in AI-ready data will start to pay off for teams that made the initial investment and commitment to the outcome. You cannot have an AI strategy without a data strategy.
We’re likely to see those with a well-established data strategy separate from those without, because unlocking AI’s potential will be too difficult without high-quality inputs.
High-quality data companies are valuable to generative AI for the same reasons they’re valuable to data buyers. High-quality data is highly aligned with the key questions that need answers. The data is already cleansed, tagged, and enriched with relevant metadata to make understanding the data easy for users.
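As a rough illustration of what “tagged and enriched with relevant metadata” can look like in practice, here is a minimal sketch of a data dictionary acting as a tiny semantic layer. It is an assumption-laden toy: the `Metric` class, the `CATALOG` entries, the column names, and the `resolve` and `context_doc` helpers are all hypothetical names invented for this example, not any vendor’s schema or a standard API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str            # canonical metric name
    description: str     # plain-language definition
    unit: str
    source_column: str   # hypothetical column in the vendor's feed
    synonyms: tuple = () # phrases an analyst or agent might use instead


# Hypothetical catalog entries, for illustration only.
CATALOG = [
    Metric("weekly_active_users", "Unique users active in the trailing 7 days",
           "users", "wau_7d", ("wau", "weekly actives")),
    Metric("avg_order_value", "Total sales divided by number of orders",
           "USD", "aov_usd", ("aov", "basket size")),
]


def resolve(term, catalog=CATALOG):
    """Map a loosely worded term to its canonical metric, or None if unknown."""
    key = term.strip().lower()
    for metric in catalog:
        if key == metric.name or key in metric.synonyms:
            return metric
    return None


def context_doc(catalog=CATALOG):
    """Render the catalog as plain text that a human or an agent could read."""
    return "\n".join(
        f"{m.name} ({m.unit}): {m.description}. Column: {m.source_column}."
        for m in catalog
    )


print(resolve("AOV"))
print(context_doc())
```

The point of the sketch is that the mapping from a question to a column is written down explicitly. Without it, a human analyst or an agent has to guess what `aov_usd` means.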
If data is hard for a human to understand and use, it’s going to be even harder for GenAI because it lacks good taste and judgment on its own.
High-quality data is just as valuable for human or agent consumers. Low-quality data will degrade the quality of outputs from human processes or GenAI processes.
Can GenAI displace data vendors with public data as is? I’d be happy to be proved wrong, but I don’t see GenAI able to fill the gaps of public data on its own. Data vendors who have done the work to make their data insight-ready are likely to benefit from the shifting AI landscape.
Skills-based training brings data analytics to life
Michael Watson (https://www.linkedin.com/in/michaeldavidwatson/), founder of Hedgineer.io, is a thought leader in this space. He’s actually using AI in production to scale his business and his clients’ businesses. His insights have helped shape my thinking on this part of the puzzle.




