The Questions That Defined Data-Driven Investing in 2025
What 248 conference questions reveal about how the industry moved from experimentation to execution.
Over the course of 2025, I asked 248 questions across 651 speakers within 16 data conference previews, each designed to understand how investors and data practitioners were framing their work in real time.
When those questions are viewed individually, they reflect the tactical concerns of specific panels, speakers, and data products. When viewed together, they form a dataset that captures how the industry evolved over the course of 2025.
When I used ChatGPT to tag and analyze the full set, a clear shift toward execution emerged. Governance, workflow integration, data quality, and measurable impact rose in importance as data and AI were increasingly judged by how well they performed inside real investment processes.
The questions themselves trace where the industry is today and hint at where it is headed next.
My friends at BattleFin are looking for new data providers to participate in their upcoming “New Data Provider Showcase” at BattleFin Miami. Apply by December 29th.
💡If you provide data or analytics that help investors see financial performance before it shows up in earnings, we’d like to hear from you.
If your firm is new to BattleFin (i.e., it has never participated in any of our Discovery Day Conferences), please apply by Dec 29th, 2025 with a short description of your dataset(s), coverage, delivery methods, and any additional information that describes potential insights from the data.
Dan Entrup of AggKnowledge and “It’s Pronounced Data,” Michael Watson of Hedgineer, and I will help BattleFin decide which three providers will come to BattleFin Miami to pitch their product, with the winner receiving a complimentary data table at the next BattleFin conference.
Use this link to apply. https://web.battlefin.com/miami-2026-new-data-provider-showcase-data-chorus
What I did to break down 2025 into its key themes
The Data Score’s 2025 conference preview series reflects how the data and investing industry evolved over the year. The pace of change in 2025, driven by rapidly evolving Gen AI capabilities and shifting market and macro conditions, forced data-driven investors to pivot data sources, technology choices, and analytic strategies.
Over the course of 2025, I wrote 37 articles (including this one), of which 16 were data conference previews. In these conference previews, I offer top questions for each of the speakers, panels, and topic agendas to help attendees prep and make the most of their time at the event. In total, I asked 248 questions across 651 speakers.
My top questions across these conferences also reveal a clear trend in what’s happening in the industry. Understanding what’s happening now is a prerequisite for thinking clearly about 2026 and the next evolution of the data industry.
This article breaks down 2025’s trends as seen through The Data Score conference “Top Questions” series.
To generate this analysis, I used ChatGPT 5.2 Thinking. I had the agent scrape my articles and gather the questions into a single table. I then had it tag each question by topic, first generating a comprehensive list of possible themes and then applying those tags across the full set.
From there, it generated a time-series view showing how topic frequency and emphasis changed over the year.
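For readers who want to reproduce this kind of view themselves, here is a minimal sketch of the counting step, assuming each question has already been tagged with one or more themes. The sample questions, the period labels, and the function names are all invented for illustration; this is not the code ChatGPT ran. Because a question can carry several tags, the rates for a period can add up to more than 100.

```python
from collections import Counter, defaultdict

# Hypothetical tagged questions: (period, set of theme tags).
# These rows are invented for illustration, not taken from the real table.
tagged_questions = [
    ("Q1", {"Alpha Generation", "Quant Modeling"}),
    ("Q1", {"Alpha Generation"}),
    ("Q1", {"Web-Mined Alt Data"}),
    ("Q1", {"Quant Modeling", "Data Governance"}),
    ("Q4", {"Data Governance", "Quant Modeling"}),
    ("Q4", {"Alpha Generation", "Data Governance"}),
    ("Q4", {"Market Data"}),
    ("Q4", {"Quant Modeling", "Market Data"}),
]


def mentions_per_100(questions):
    """Return {period: {tag: mentions per 100 questions asked in that period}}."""
    totals = Counter()               # number of questions per period
    mentions = defaultdict(Counter)  # tag counts per period
    for period, tags in questions:
        totals[period] += 1
        for tag in tags:
            mentions[period][tag] += 1
    return {
        period: {tag: 100.0 * n / totals[period] for tag, n in counts.items()}
        for period, counts in mentions.items()
    }


def change(rates, tag, start, end):
    """Change in a tag's rate between two periods; a missing tag counts as 0."""
    return rates[end].get(tag, 0.0) - rates[start].get(tag, 0.0)


rates = mentions_per_100(tagged_questions)
for tag in ("Alpha Generation", "Data Governance", "Market Data"):
    print(f"{tag:<18} {change(rates, tag, 'Q1', 'Q4'):+.1f} per 100 questions")
```

Normalizing by the number of questions in each period matters because the conferences are not spread evenly across the year, so raw counts would mostly reflect how many previews were written in a given quarter.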
Welcome to the Data Score newsletter, composed by DataChorus LLC. This newsletter is your source for insights into data-driven decision-making. Whether you’re an insight seeker, a unique data company, a software-as-a-service provider, or an investor, this newsletter is for you. I’m Jason DeRise, a seasoned expert in the field of data-driven insights. I was at the forefront of pioneering new ways to generate actionable insights from alternative data. Before that, I successfully built a sell-side equity research franchise based on proprietary data and non-consensus insights. I remain active in the intersection of data, technology, and financial insights. Through my extensive experience as a purchaser and creator of data, I have a unique perspective, which I am sharing through the newsletter.
Top Question Themes from 2025:
ChatGPT identified 32 themes across the questions. The chart below shows how often each theme appeared. The top 5 are:
Alpha Generation and the Investment Process
Quant Modeling and Forecasting
Research Workflow and Analyst Adoption
Web-Mined Alt Data
ROI of Data and Technology
It shouldn’t be a surprise that Gen AI isn’t listed among the top 5 themes, even though it was a constant topic throughout the conferences. Investors and data practitioners care deeply about outcomes. Methodologies matter only insofar as they are appropriate for the problem being solved. Gen AI is a tool like any other tool available to us. Understanding the outcomes, the pain points, the context, and the underlying why behind the data and tech needs is the key to success. My questions and writing are always focused on this.
The themes evolved over the course of 2025
Here is ChatGPT 5.2 Thinking’s summary of the results. The conclusions aligned closely with my own observations and required no additional prompting. Here’s its write-up:
🤖 Summary: what this says about 2025 at the intersection of data, tech, AI, and investing
The discourse matured from “alpha potential” toward operational reality (governance, quality, procurement, adoption).
Traditional market data re-entered as a first-class citizen in the AI/data stack conversation (licensing, integration, and workflows).
GenAI broadened from “LLMs” to LLMs embedded in research workflows (agentic/automation + evaluation/guardrails), while quant modeling stayed core and even strengthened.
Alternative data emphasis narrowed toward modalities with clear bridges to macro/investment decisions (e.g., web prices), while some modalities became less central in late-year question coverage.
The themes can be mapped into higher-level groupings, which ChatGPT 5.2 Thinking did for me, accurately. The groupings are Investing/Models, Data Modalities, Workflow/Org, Infrastructure, AI, Risk/Control, and Other.
Here’s the comparison of Q4 vs Q1 by granular investment topic for the top 15 topics.
🤖
1) The year shifted from “alpha framing” toward “implementation + production.”
Alpha/Investment Process was still a top theme, but it fell sharply in H2:
~47.9 → 30.7 mentions per 100 questions (-17.2).
Meanwhile, topics that signal “productionisation” rose meaningfully:
Data Governance/Compliance: ~14.0 → 21.3 per 100 (+7.2)
Data Quality/Validation: ~4.1 → 11.0 per 100 (+6.9)
Interpretation: the questions increasingly emphasized what it takes to operate data/AI in real investment workflows (controls, quality, governance), not just the promise of new signals.
2) Traditional market data reasserted itself as a core part of the “AI + data” conversation
Market Data/Traditional Data jumped the most of any topic:
~10.7 → 24.4 per 100 questions (+13.7)
Interpretation: By mid/late year, the center of gravity moved toward integrating AI and alternative data with the traditional market-data backbone (coverage, timeliness, licensing, and workflow fit).
3) AI evolved from “LLMs” toward “LLMs in workflows” (agentic/automation)
Two different AI narratives strengthened over the year:
GenAI/LLMs: ~8.3 → 18.1 per 100 (+9.8)
Agentic AI/Workflow Automation: ~5.8 → 10.2 per 100 (+4.5)
Interpretation: less “What can LLMs do?” and more “How do we embed them into repeatable research processes with tooling/orchestration?”
4) “Quant modeling/forecasting” didn’t fade—it intensified
Quant modeling/forecasting increased:
~33.9 → 41.7 per 100 (+7.8)
Interpretation: even as GenAI rose, your questions increasingly pressed on measurable signal extraction, forecasting discipline, and what actually survives contact with backtests and production constraints.
5) Web prices stayed a durable “bridge” between data, macro, and investing
Alt Data: Web/Scraped/Online Prices rose and remained high:
~32.2 → 37.0 per 100 (+4.8)
Interpretation: web prices are one of the most “investable” alt-data modalities because they map directly to inflation, demand, and company pricing power—and your questions kept returning to that link.
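As a quick sanity check on the figures quoted in the summary above, the changes can be recomputed and ranked from the start and end rates. This is a small sketch using only the approximate numbers quoted in the text; the dictionary and variable names are my own. Because the quoted rates are rounded, two recomputed changes differ by 0.1 from the quoted ones (+7.3 vs. +7.2 for governance and +4.4 vs. +4.5 for agentic AI), which is presumably a rounding effect rather than a real discrepancy.

```python
# (start, end) mentions per 100 questions, copied from the approximate
# figures quoted in the summary; topic labels are as they appear there.
quoted = {
    "Alpha/Investment Process": (47.9, 30.7),
    "Data Governance/Compliance": (14.0, 21.3),
    "Data Quality/Validation": (4.1, 11.0),
    "Market Data/Traditional Data": (10.7, 24.4),
    "GenAI/LLMs": (8.3, 18.1),
    "Agentic AI/Workflow Automation": (5.8, 10.2),
    "Quant Modeling/Forecasting": (33.9, 41.7),
    "Alt Data: Web/Scraped/Online Prices": (32.2, 37.0),
}

# Change in rate for each topic, rounded to one decimal like the source figures.
deltas = {topic: round(end - start, 1) for topic, (start, end) in quoted.items()}

# Rank topics from largest increase to largest decrease.
ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)

for topic, delta in ranked:
    print(f"{topic:<38} {delta:+.1f}")
```

Ranked this way, traditional market data shows the largest gain and Alpha/Investment Process is the only topic in this set that declined.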
Drilling down into the alt data types (or modalities, as ChatGPT has decided to label them), there is a larger theme of datasets that support macroeconomic indicators growing in focus, which makes sense given the critical investment questions around inflation, tariffs, and the US government shutdown. Web-mined datasets include price monitoring, but I would note that web-mined job listings were also top of mind in 2025. Supply chain and logistics data were top of mind as well, due to the ever-changing tariff policy.
What was potentially missing from the discussion at data conferences in 2025?
I noted three major misses in my questions for the conference participants after looking at the tagged themes. Some of this reflects gaps in my own questioning. But it’s also a reflection of the speakers and topics at the conference:
Nothing about using data and technology to analyze AI demand, data center supply and demand, or assessing if there is an AI bubble. The data industry is so focused on the use of AI but isn’t putting conference topics forward about monitoring the AI industry itself.
There was very little about monitoring energy usage, especially in the context of AI and its sustainability, despite the very close link between AI demand and the potential constraints on supply.
Monitoring cybersecurity as a risk to companies due to the growing arms race between hackers and cybersecurity in the age of AI. This rose in importance as the year progressed but still remained relatively low in the rankings.
Looking ahead to 2026
I challenged ChatGPT 5.2 Thinking to project the top themes for 2026. I put on extended thinking, but I don’t think it got the memo. Before I could type this, it came up with its themes. Before I look at its top 5 projected themes, here’s what I suspect will be top of mind in 2026:
Alpha Generation & Investment Process: specifically, the challenge continues to be converting data and technology into a meaningful competitive edge. In a world where everyone has access to data and technology, what you do with data and technology to generate insights matters most.
Data Governance & Compliance: Specifically related to data quality and data rights fed into generative AI frameworks. I would argue the success of future AI agents is still more about the quality of the inputs than the ability of the AI to reason harder.
AI Sustainability: Cybersecurity, data rights, and energy usage: Each has dual implications for the audience at the conferences and the markets overall.
Agentic Framework and Workflow Automation in Production: We are going to hear more about real use cases that work and how data companies need to adapt their go-to-market strategy in response.
Alternative data connecting AI demand and supply with key investment debates: These niche types of data will become more mainstream as awareness grows and data history builds.
Let’s see what ChatGPT 5.2 Thinking created (well, after at most 2 seconds of thinking):
🤖
AI ROI in production (not pilots)
Which workflows are truly automated end-to-end?
What’s the measurable lift vs human baselines?
How do teams prevent “tool sprawl” and sustain adoption?
AI governance, model risk, and auditability
Traceability of answers (sources, lineage, prompt/model/version)
Controls for hallucinations, bias, leakage, and compliance
“What evidence is sufficient for model approval?”
AI infrastructure as an investable theme
Datacenter buildout, GPU supply chains, capex cycles
Power/grid constraints, siting, permitting, cooling, water
“How do you monitor demand without relying on company guidance?”
Data rights, privacy, and defensibility
Licensing, provenance, “what you’re allowed to do” with data/models
Scraped data durability, platform policy risk
Moats: exclusive data, first-party access, distribution advantage
Next-gen alternative data packages: multi-modal + entity-level truth
Combining web + transactions + supply chain + location + text/image/audio
Entity resolution, point-in-time correctness, and coverage bias
“How do you prove it generalizes across regimes?”
A solid list. The overlap is telling.
What trends did you see in 2025? What do you think is coming for the industry in 2026?








Traditional market data, especially deep and accurate history, has always been there as a critical underlying component, yet it has not received much "love" in recent years. This year it became a topic of interest, especially for folks with a novice-level understanding. I have had to dust off some of my work and thinking from the 1995-2008 period and re-apply it to the spike in interest. Thanks for doing this work, Jason; it's not work you can find elsewhere.