Top Questions ahead of Neudata’s New York Winter Data Summit
Neudata hosts its Winter Data Summit on December 7th in NYC’s financial district. Here are some questions for the speakers that are top of mind.
Welcome to the Data Score newsletter, composed by DataChorus LLC. The newsletter is your go-to source for insights into the world of data-driven decision-making. Whether you're an insight seeker, a unique data company, a software-as-a-service provider, or an investor, this newsletter is for you. I'm Jason DeRise, a seasoned expert in the field of data-driven insights. As one of the first 10 members of UBS Evidence Lab, I was at the forefront of pioneering new ways to generate actionable insights from alternative data. Before that, I successfully built a sell-side equity research franchise based on proprietary data and non-consensus insights. After moving on from UBS Evidence Lab, I’ve remained active in the intersection of data, technology, and financial insights. Through my extensive experience as a purchaser and creator of data, I have gained a unique perspective, which I am sharing through the newsletter.
On December 7th, 2023, Neudata is hosting its New York Winter Data Summit. https://www.neudata.co/agenda/new-york-winter-data-summit-2023
Ahead of the event, I’m sharing key questions on my mind for each speaker and panel.
There are three broad themes that my questions address:
Understanding the evolving role of alternative data in predictions in quantitative and fundamental discretionary investing.
Discovering cutting-edge applications of AI and generative AI in financial analysis.
Uncovering real-world insights on the economy and key sectors from recent data trends.
Check out the questions below.
Let me know what additional questions you have for the panelists.
And I’ll be there as a judge for the “Shark Tank” session if you want to stop by and say hi.
8:50am Empire Stage: Fireside Chat
Stephen Cash, Founding Partner, Seven Eight Capital
Ian Webster, Managing Director, Neudata
Questions: Have alternative data companies improved in their ability to meet the needs of quantitative funds, or does it remain very difficult to integrate alternative data into a systematic process? As more quantitative funds are adding alternative data to their processes, how has the ability to generate alpha changed?
9:10am Empire Stage: Panel: US Elections: The Alternative Data Perspective
Moderator: Michael Hejtmanek, Data Intelligence Solutions, Neudata
Charles Myers, Chairman & Founder, Signum Global Advisors
Drew McCoy, President, Decision Desk
Question: What key data points should analysts monitor before major elections to accurately predict voter behavior on election day?
9:10am Chrysler Stage: Beyond models - applying Data Science/AI effectively
Alfred Spector, Visiting Scholar, MIT
Question: What are some best practices in applying MLOps to support continued accuracy as data’s relation to real-world events evolves?
9:40am Empire Stage: Language Models in Macro: Tracking Inflation Turning Points In The Local News
Laurent Bilke, Founder and CEO, Alternative Macro Signals
Question: How can the language model be trained to understand the difference between news reflecting what has already been incorporated into the market’s thinking and information that is new relative to consensus views?
9:40am Chrysler Stage: A Comparison of NLP Methods on Corporate Earnings Calls
Dan Joldzic, CEO, Alexandria Technology
Question: Has the ability to extract accurate alpha signals from earnings calls changed now that applying NLP to those calls has become more mainstream and the C-suite is being coached on wording choices based on the same technology? By the way, as prep for the session, Alexandria Technology compared its model with FinBert and Loughran-McDonald in a white paper found here: https://www.alexandriatechnology.com/insights/comparing-natural-language-processing-nlp-approaches-for-earnings-calls
10:20am Empire Stage: Panel: Nowcasting the Macroeconomy
Moderator: Konstantinos Vafeidis, Senior Analyst, Neudata
Thanh-Long Huynh, CEO, QuantCube
Brian Peller, Founder, LiJo Advisory
Julia Asri Meigh, Head of ESG & Macro Data Research, Neudata
Question: As more data becomes available about consumers, businesses, and the overall economy and the computing power to process that information into a forecast has increased, are we approaching a point where alternative data models could be more accurate about what’s actually happening in the economy compared to the official government statistics that rely on older, less sophisticated methods?
10:50am Empire Stage: Freight Transportation: The Largest Commodity in the World
Daniel Pickett, Chief Data & Technology Officer, FreightWaves
Questions: How are discretionary retailers reacting to inventory levels following Black Friday and Cyber Monday this year? Have they mostly gotten it right? Any signs of retailers having either oversupply or being out of stock?
11:10am Empire Stage: Adapting Spaces: 2024 The Real Estate Crunch
Ed Lavery, VP of Investor Intelligence, Placer.ai
Questions: Does the data show a change in the balance of remote vs back-to-office work? Which regions are seeing improved visits to offices, and which regions are staying at low levels of office visits?
11:30am Empire Stage: US Economy and Rates Discussion with Morgan Stanley Research
Matthew Hornbach, Global Head of Macro Strategy, Morgan Stanley
Questions: Is the government updating its approach to measuring economic growth and inflation, using more advanced datasets and analytic techniques than it has in the past? If so, how is it changing, and can investors replicate the approach?
12:20pm Ivy Stage: LUNCH AND LEARN: Beyond Ozempic: GLP-1 trends and other applications of healthcare data
Julia Fitzgerald, Product Management Director, Earnest Analytics
Konstantinos Vafeidis, Senior Analyst, Neudata
Question: Can you share a view of how different alternative dataset types can be used throughout the lifecycle of a new drug being researched and brought to market?
1:00pm Empire Stage: Leveraging Employment Data to Find Alpha
Scott Hamilton, CEO, Live Data Technologies
Question: Which key performance indicators (KPIs) in your data are most indicative of potential issues in high-growth companies?
1:00pm Chrysler Stage: Global commodity flow data
KPLER
Questions: What does the data say about the reaction to OPEC+ supply cuts? Are all members holding firm on the policy? Are any other nations producing more in response?
1:20pm Empire Stage: Beyond Transaction Data: How do you harness Clickstream for intent?
Moderator: Barney Bruce-Smythe, Associate, Neudata
Eli Goodman, CEO and Co-Founder, Datos
Evan Reich, Head of Data Sourcing and Strategy, Verition
Questions: Can you share some examples where the transaction data and clickstream data showed different signals, and the clickstream data turned out to be more accurate? Is there a commonality to the examples that can inform a strategy around weighting the different data sources?
1:20pm Chrysler Stage: Working With The Buyside - A Guide to Being a Successful Data Vendor
Moderator: Eliza Raphael, Head of Market Data Services, ex-Schonfeld
Jason Koulouras, Research, Analytics, Intelligence & Data Ranger, Global Market Data Leader, Bridgewater Associates
Stewart Stimson, Head of Data Strategy, Jump Trading
Question: If the panel could magically get all data vendors in attendance to change for the better, what would be the most important thing they should start doing and the most important thing they should stop doing?
1:50pm Empire Stage: Panel: Using alternative data for generating an index - opportunities and pitfalls
Moderator: Efram Slen, Vice President, Global Head of Index Research, Nasdaq
Andreas Zagos, Advisor, IPR Strategies Ltd.
Mark Marex, Senior Director, Index Research & Development, Nasdaq
Ryan T. McCormack, Sr. Factor & Core Equity Strategist ETFs & Indexed Strategies, Invesco
Question: How does the panel recommend incorporating alternative data into index creation, given the rapidly evolving market and the continuously changing data-driven questions?
1:50pm Chrysler Stage: Panel: Dataset Trials - For Free or Not For Free, That Is The Question
Moderator: Daniel Entrup, Author, It's Pronounced Data
Mark Fleming-Williams, Head of Data Sourcing, Capital Fund Management
Tim Stegner, Chief Revenue Officer, Deception And Truth Analysis (D.A.T.A.)
Brian Peltonen, Director of Data Analytics, Fidelity Investments
Question: What are the most common misunderstandings that data companies have about the data vetting process on the buyside?
2:20pm Empire Stage: A Glimpse Ahead: Using Today’s AI To Map Out Mobile App Trends Of Tomorrow
Matt Birzon, Director of Sales at data.ai
Questions: Is it feasible for AI agents to replace the jobs done by the multitude of apps commonly used by consumers? Does the data show early signs of app usage falling in favor of Gen AI apps?
2:20pm Chrysler Stage: The Rise of the Data Provider - An Assessment of the Supply Side
Daryl Smith, Head Of Research, Neudata
Michael Hejtmanek, Data Intelligence Solutions, Neudata
Question: Where would you place alternative data on the Gartner Hype Cycle paradigm?
2:20pm Ivy Stage: Workshop: The latest on alt-data for value creation and diligence
Barney Bruce-Smythe, Associate, Neudata
Danesh Kissoon, Research Analyst, Neudata
Question: What are key coverage gaps in the alternative data market where new credible data providers could emerge to address unmet demand?
3:00pm Empire Stage: Panel: Unleashing the potential of AI in the alternative data space
Moderator: Hope D. Skibitsky, Associate, Quinn Emanuel
Adam Nahari, Partner, Pinegrove Capital Partners
Raymond Jones, Vice President, Similarweb
Aris Tentes, Executive Director, QIS Research, Morgan Stanley
Question: Which aspect of alternative data has the most upside potential from applying AI to the process: a) Sourcing/creating data, b) cleansing data, c) enriching data, d) interpreting what the data says, or e) making investment decisions?
3:00pm Chrysler Stage: Shark Tank
Data Companies:
Stewart Huckett, Sales & Marketing Director, Yellow Submarine
Amarachi Miller, VP of Product, Caden
Ronen Feldman, CEO and Co-Founder, ProntoNLP
Mike Audi, Founder and CEO, TIKI
Judges:
Beth Pollack, Operating Partner, Applied AI and Data Strategy, Decision Science Advisors
Jason DeRise, Head of Data and Analytics Products, Liberty Mutual Investments
Adam Brown, Vice President, Systematic Advisory Sales, Morgan Stanley
Question: Four very exciting new data companies with unique approaches are participating in the contest. I’m excited to hear the presentations and look forward to being a judge. So my question is: Will the audience agree with the judges?
3:30pm Empire Stage: Panel: Incorporating LLMs Into the Deal Process
Moderator: Saif Zia, Business Development Manager, Neudata
Claire Saint-Donat, Director of Data Science, Bain Capital
Neil Callahan, Founder & Managing Partner, Pilot Growth Equity
Chandler Klose, AI Strategy Advisor, Independent
Question: Private markets have less structured data to base decisions on; are large language models already being used to extract insights from unstructured documents?
If you think this is useful for someone attending the conference, please feel free to forward it on.
What questions would you ask? Leave a comment below.
- Jason DeRise, CFA
There is so much jargon in this entry, including several new terms for the Jargonator:
Alternative data: Alternative data refers to data that is not traditional or conventional in the context of the finance and investing industries. Traditional data often includes factors like share prices, a company's earnings, valuation ratios, and other widely available financial data. Alternative data can include anything from transaction data, social media data, web traffic data, web-mined data, satellite images, and more. This data is typically unstructured and requires more advanced data engineering and science skills to generate insights.
Quant funds: Short for "quantitative funds," also referred to as systematic funds. Systematic refers to a quantitative (quant) approach to portfolio allocation based on advanced statistical models and machine learning (with varying degrees of human involvement “in the loop” or “on the loop” managing the programmatic decision-making).
Fundamental Discretionary Investors: Institutional investors that rely on a portfolio manager’s judgment and decision-making to allocate capital (drawing on varying degrees of statistical, data-driven analysis).
Systematic Fund: Another name for a quant fund; see the “Quant funds” entry above.
Alpha: A term used in finance to describe an investment strategy's ability to beat the market or generate excess returns. A simple way to think about alpha is that it’s a measure of the outperformance of a portfolio compared to a pre-defined benchmark for performance. Investopedia has a lot more detail: https://www.investopedia.com/terms/a/alpha.asp
MLOps or Machine Learning Operations: a practice for collaboration and communication between data scientists and operations professionals to help manage the production machine learning (or deep learning) lifecycle. It aims to shorten the development cycle of machine learning systems, provide high-quality and reliable delivery, and innovate based on continuous feedback and monitoring.
NLP (Natural Language Processing): An AI technology that allows computers to understand, interpret, and respond to human language in a quantitative way, generating statistical measures of sentiment and the importance of topics.
C-suite: Top executives at a company—CEO, CFO, COO, etc.
FinBert: A variant of the BERT (Bidirectional Encoder Representations from Transformers) model adapted for financial contexts. BERT, developed by Google, was a breakthrough in the field of natural language processing (NLP); BERT models are designed to understand the context of a word in a sentence more effectively than previous NLP models. FinBert leverages those capabilities while being fine-tuned to address the specific language and analytical needs of the finance sector.
Loughran-McDonald Lexicon: The Loughran-McDonald Lexicon is a specialized financial dictionary developed for the analysis of financial documents using natural language processing (NLP). The lexicon addresses a key challenge in the field of financial text analysis: the fact that common words often have different meanings in financial contexts compared to general usage. General-purpose sentiment dictionaries might misinterpret the sentiment of financial texts, for example by treating "liability" as a negative term when, in finance, it's a neutral term referring to debts or obligations.
Nowcasting: In order to systematically forecast the next reported economic or company-specific financial result, multiple sources of high-frequency data are combined. The model continuously updates the forecast with increasing accuracy as the volume of data covering the unknown period increases.
ESG: Environmental, Social, and Governance (ESG) refers to the three central factors in measuring the sustainability and societal impact of an investment in a company or business.
GLP-1: GLP-1 agonists are a class of medications that mainly help manage blood sugar (glucose) levels in people with Type 2 diabetes. Some GLP-1 agonists can also help treat obesity. https://my.clevelandclinic.org/health/treatments/13901-glp-1-agonists
Clickstream data: Clickstream, or web traffic data, refers to the record of the web pages a user visits and the actions they take while navigating a website. Clickstream data can provide insights into user behavior, preferences, and interactions on a website or app.
Buyside: Typically refers to institutional investors (hedge funds, mutual funds, etc.) who invest large amounts of capital. Sellside typically refers to investment banking and research firms that provide execution and advisory services (research reports, investment recommendations, and financial analyses) to institutional investors.
Index Development: This involves the creation and improvement of financial indices, which are statistical measures of the performance of a group of stocks or assets.
ETF (Exchange-Traded Fund): An ETF is an investment fund traded on stock exchanges, much like stocks. An ETF holds assets such as stocks, commodities, or bonds and generally operates with an arbitrage mechanism designed to keep it trading close to its net asset value, although deviations can occasionally occur. ETFs offer a cost-effective, liquid, and flexible way for investors to purchase a diversified portfolio that tracks a particular index, sector, commodity, or other asset classes. Unlike mutual funds, which are priced at the end of each trading day, ETFs are bought and sold throughout the day at market price, offering more flexibility for investors.
Large Language Models (LLMs): These are machine learning models trained on a large volume of text data. LLMs, such as GPT-4 or ChatGPT, are designed to understand context, generate human-like text, and respond to prompts based on the input they're given. They can simulate human-like conversation and be used in a range of applications, from drafting emails to writing Python code and more. They analyze the input they receive and then generate an appropriate response, all based on the vast amount of text data they were trained on.
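To make the Alpha entry above a bit more concrete, here is a minimal Python sketch that treats alpha as the simple difference between a portfolio's compounded return and its benchmark's compounded return. The function names and the sample returns are made up for illustration, and the sketch deliberately ignores risk adjustment (such as beta), which more formal definitions of alpha include.

```python
from math import prod


def cumulative_return(periodic_returns):
    """Compound a sequence of periodic returns (0.02 means +2%)."""
    return prod(1.0 + r for r in periodic_returns) - 1.0


def simple_alpha(portfolio_returns, benchmark_returns):
    """Excess return of the portfolio over its benchmark, with no risk adjustment."""
    return cumulative_return(portfolio_returns) - cumulative_return(benchmark_returns)


# Hypothetical monthly returns, invented for this example only.
portfolio = [0.02, -0.01, 0.03]
benchmark = [0.01, 0.00, 0.02]

alpha = simple_alpha(portfolio, benchmark)
print(f"Simple alpha over the period: {alpha:.2%}")
```

A positive result means the portfolio outperformed the benchmark over the period; whether that outperformance reflects skill or extra risk is a separate question that this sketch does not address.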
Thanks Jason for putting all this together, always appreciate when you do this for the events as it helps me to get my head into the game from different aspects that I don't always have front of mind. For the question posed to the session I am part of:
A) Two things for vendors to start doing:
Thing one: Be crisp and succinct in your product pitch, especially with the service/offering attributes (history, source of data, data management applied if at all, potential biases/incompletes, etc.). The more up front crisp I get, the faster and easier it is for myself and my colleagues to connect the dots, assess the probability and make a decision whether to use the vendor's time (and of course our time). In this include your vision for 3+ years out (and don't worry about whether you get it beautiful or high fidelity - put risk on through the communication).
Thing two: Communicate with me your pitches and ideas - I perceive I am easy to find on LinkedIn and I take cold calls and cold e-mails as well. Especially for the smaller vendors who are founder/leader staffed first and foremost - I will engage and respond. P.S. Don't put details in social media platforms, not a good security approach in my opinion.
B) Two things for vendors to stop doing:
Thing One: Related to the above first thing to do, do not inaccurately represent the product and services (whether under or over); work to be as accurate as possible. Keep it simple, have at most a one-pager for the initial pitch - bios, company provenance, focus, distinctive competence, practical vision statement, regulatory risks you are aware of and how you are dealing with them. That's all I need for an initial assessment, probability of hit, probability of who would care, and whether to keep investing our joint time.
Thing Two: If you are a small or medium vendor, I have (as likely many of my contemporaries do) a well curated "mid-point" contract that we can use together for the commercial agreement. You do not need to go get a lawyer to construct one and invest your $ into the engagement (except for review of ours). It's fair and balanced and has evolved over time (we are now on version 14 measured over 13 years), I manage it personally and ensure it is a "good deal", and have tested with external counsel and many vendors. Let's make it easy to do business together.