Top Questions ahead of Battlefin Discovery Day New York, 2024
Battlefin’s Discovery Day in New York is on May 21st and 22nd. Here are the questions on my mind for the speakers.
Welcome to the Data Score newsletter, composed by DataChorus LLC. The newsletter is your go-to source for insights into the world of data-driven decision-making. Whether you're an insight seeker, a unique data company, a software-as-a-service provider, or an investor, this newsletter is for you. I'm Jason DeRise, a seasoned expert in the field of data-driven insights. As one of the first 10 members of UBS Evidence Lab, I was at the forefront of pioneering new ways to generate actionable insights from alternative data. Before that, I successfully built a sell-side equity research franchise based on proprietary data and non-consensus insights. After moving on from UBS Evidence Lab, I’ve remained active in the intersection of data, technology, and financial insights. Through my extensive experience as a purchaser and creator of data, I have gained a unique perspective, which I am sharing through the newsletter.
On May 21st and 22nd, the data community will board the Intrepid Battleship Museum in New York City for the annual Battlefin Discovery Day New York Conference.
Link to the agenda and registration: https://www.battlefin.com/events/new-york-2024
3 themes across my questions for the panelists
Predictive Power of Alternative Data [1]
Integration and Utilization of Advanced Technologies
Sector-Specific Insights and Trends
DAY 1 AGENDA
MAY 21ST, 2024
Alt Data Content/ Panel Sessions, Lutnick Theatre Intrepid Air and Space Museum
9:00AM Data Arbitrage with Proprietary Dividend Forecasts
Historically Precise Updates Led to U.S. Outperformance
Daniel Sandberg, Managing Director, New Product Development & Research, S&P Global Market Intelligence
Question: What is the proprietary nature of the dividend forecast source and methodology that generates the outperformance?
9:30AM Macro Trends: Leveraging foot traffic to identify 2Q and 2nd half real estate and consumer trends
Consumer / Retail: what sectors are people spending money on (discount stores, beauty, luxury, travel)? Join us to hear what trends are developing.
Commercial real estate trends
Todd Schmucker - MODERATOR, Director of Data Strategy, Walleye Capital
Question: For commercial real estate sectors where foot traffic trends have been deteriorating for some time, are there any signs of bottoming or turning back upward?
10:00AM Nowcast [2] Presentation: Top 3 Japanese Themes
Will the Nikkei continue to rocket to new highs? What's the data showing?
How is the Japanese consumer adapting to higher interest rates?
Highlights of the top 3 companies seeing the most momentum?
Dirk Renick - Moderator, Quantitative Research & Development Lead, Abu Dhabi Investment Authority (ADIA)
Alex Occhipinti, Business Development, Nowcast
Question: In light of recent economic policies and market dynamics, which emerging sectors in Japan are showing the most potential for sustained growth, and what data supports these projections?
10:15AM Tracking & Predicting the 2024 Presidential Elections with Alternative Data
Tracking the swing states
Can survey data predict the winner first?
What role will social media have on the election?
Crystal Berger - Moderator, Media Tech Entrepreneur & Founder, As seen on Fox, EBO
Brian Scanlon, General Manager, Elections Services, The Associated Press
Question: How can data from betting markets fit into the mosaic of predicting the election outcome?
10:30AM Geopolitical, Economic, and Security Trends, Threats, and Opportunities Around the World
Crystal Berger - Moderator, Media Tech Entrepreneur & Founder, As seen on Fox, EBO
Bill Frischling, Distinguished Scientist and VP of Emerging Technologies, FiscalNote
Question: What strategies can investors employ to monitor and assess the economic and corporate impacts of cyber warfare in real time?
10:45AM New Product and Platform Showcase
Hear about 3 new Product introductions from 3 Alternative Data Providers
Thematic Factor Risk model Data set
Streamlined Web Data Gathering Platform
Rich Brown, Global Head, Market Data, Jain Global
Question: What are the most innovative aspects of the new data products being introduced, and how do they address current gaps in the market?
11:15AM Risk management Mosaic
Leveraging Alt Data to give early insights and more metrics to see the whole picture
Finding robust early warning systems to manage drawdowns and identify investment opportunities
Analyzing the 2025 credit market for early warning signs
Evan Reich - MODERATOR, Global Head of Data Strategy and Sourcing, Verition Fund Management
Stas Melnikov, Head of Quantitative Research and Risk Data Solutions, SAS
Question: Which types of alternative data should be closely followed for assessing potential downside risks in valuations and credit spreads over the next few months?
11:30AM Oil & Gas Mosaic: What is the data showing for 2nd half catalysts?
Tim Harrington - MODERATOR, CEO & Co-Founder, BattleFin
Karl Critz, CTO, Salient Predictions
Reid I'Anson, Economist, Kpler
Question: In the data-rich energy sector, which specific datasets are currently underutilized but could provide significant insights if leveraged more effectively?
DATA SCIENCE TRACK CONTENT
Lutnick Theater, Intrepid
May 21st, 2024
Link to agenda for this track: https://www.battlefin.com/events/nyc-data-science-2024
2:00PM - 3:00PM The Integration of Structured and Unstructured Text/Data, and The Use of LLMs to Find Signal
Short term signals: Using news and social media during nontrading hours to explain subsequent price behaviour of stocks at the market opening. Building on the 2022 academic research paper from Gan, Alexeev and Yeung, the POC dives into the approaches traders can take to inform their decisions on selecting algos for specific trades on their blotter.
Long term signals: Using the sentiment from news and social media to develop a multi-factor model for alpha [3] strategies with monthly investment horizons. Based on the research jointly done by the StarMine and MarketPsych teams, the model considers scores spanning the company’s equity, business and management.
Question: How can one extract forward-looking signals from news data while minimizing the noise from reports that merely reiterate known information?
Unlocking the Interconnected Data Lake [4] for signal generation:
Tim Anderson, Director - Quantitative & Economic Data - Head of Business Solutions at LSEG, will discuss the workflows for integrating real-time data, Tick History [5], fundamental data and Large Language Models [6]
Tim Anderson, Director - Quantitative & Economic Data - Head of Business Solutions, LSEG
Question: What combination of databases and technologies do you recommend for creating efficient prescriptive AI models that leverage LLM-driven data?
Converting Unstructured to Structured Data to Derive Value from Text:
Nicole Allen, Product Director, Textual Analytics at LSEG, will discuss recent case studies leveraging LSEG’s Reuters newsfeed, news analytics and the StarMine M&A target model.
Nicole Allen, Product Director, Textual Analytics, LSEG
Questions: How extensive must the training data and fine-tuning process be to develop a model capable of understanding financial market contexts and sector-specific jargon? Would it be more effective to use multiple specialized models?
Proof of Concept demos using sentiment derived from news and social media:
Dr. Richard Peterson, CEO, and Anthony Luciani, Quantitative Researcher, at MarketPsych will review two investment use cases for using sentiment:
Dr. Richard Peterson, CEO, MarketPsych
Question: I'm just keen to see the demo!
3:15 - 3:45PM Introducing LLM Efficiency Gain with S&P Data/Research on LLMs, Utilizing Snowflake AI for Vectorization and a Risk Chat App
How to safely introduce S&P data into your models
Create vector embeddings [7] of SEC filings natively in Snowflake
Prompt Engineering [8]: Strategies for refining prompts to elicit desired outputs
Deploy Arctic and Mistral Large LLMs to enable natural language question and answer
Deliver results via a Streamlit chat user interface
Jonathan Regenstein, Head of Financial Services Data Science, Sales Eng, Snowflake
Liam Hynes, Global Head of New Product Development for Public markets, S&P Global
Questions: What unique limitations should be considered when using Snowflake's tech stack for building financial LLMs? What upcoming technologies are planned to address these limitations?
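For readers less familiar with the vector embedding step on this agenda, here is a minimal sketch of the idea in plain Python. It is not Snowflake's API or the presenters' method: the `embed` function is a toy word-count stand-in for a learned embedding model, and the passages and question are hypothetical. It only shows how text turned into arrays of numbers can be compared to find the passage most relevant to a question.

```python
import math
from collections import Counter


def embed(text, vocabulary):
    """Toy embedding (assumption): one word count per vocabulary entry.

    A real embedding model learns dense vectors; this only illustrates the idea.
    """
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical snippets standing in for passages from filings.
passages = [
    "credit risk increased as interest rates rose",
    "the company opened new stores in japan",
]
question = "how did interest rates affect credit risk"

# Shared vocabulary so every text maps to a vector of the same length.
vocabulary = sorted({w for text in passages + [question] for w in text.lower().split()})
question_vec = embed(question, vocabulary)
scores = [cosine_similarity(question_vec, embed(p, vocabulary)) for p in passages]

# The passage with the highest similarity is the one a chat app would retrieve.
best_passage = passages[scores.index(max(scores))]
print(best_passage)
```

In a production setup the retrieved passage would then be passed to the LLM along with the question; the limits of that retrieval step are part of what I'd like the speakers to address.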
4:00PM - 5:00PM LLM for Alpha Generation – Sentiment Extraction with Snowflake AI and S&P Research on Alpha Generation with LLM Driven Features
Merging traditional NLP techniques with LLMs to enhance efficiency
Transforming textual data into numerical features through sentiment extraction
Jonathan Regenstein, Head of Financial Services Data Science, Sales Eng, Snowflake
Liam Hynes, Global Head of New Product Development for Public markets, S&P Global
Matthew Harris, Sr. Solutions Architect, Snowflake
Question: In what scenarios do traditional NLP models outperform LLMs in managing data insights and outcomes, and why?
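To make "transforming textual data into numerical features" concrete, below is a rough sketch of the traditional end of the spectrum: a lexicon-based sentiment score. The word lists and headlines are hypothetical, and this is not the approach S&P or Snowflake will present. It is simply the kind of lightweight baseline an LLM-driven feature would need to beat.

```python
# Hypothetical sentiment lexicons; real lexicons are far larger and domain-tuned.
POSITIVE = {"beat", "growth", "strong", "upgrade", "record"}
NEGATIVE = {"miss", "decline", "weak", "downgrade", "lawsuit"}


def sentiment_score(text):
    """Return (positive - negative) / matched words, in [-1, 1].

    Returns 0.0 when no lexicon word appears in the text.
    """
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


# Hypothetical headlines, each converted to one numerical feature.
headlines = [
    "Retailer posts record quarter on strong growth",
    "Analyst downgrade follows earnings miss and weak guidance",
    "Company schedules annual meeting",
]
features = [sentiment_score(h) for h in headlines]
print(features)
```

A lexicon like this is fast and transparent but blind to context (negation, sarcasm, sector jargon), which is where I'd expect the LLM-based features to earn their cost.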
DAY 2 AGENDA
MAY 22ND, 2024
Alt Data Content/ Panel Sessions, Lutnick Theatre Intrepid Air and Space Museum
9:00AM Sports & Data: a look at the MLB, F1, NFL & NBA
Where is alt data being used and what type of value is it resulting in?
Which teams are gaining an edge with data?
What’s on the horizon that will be a game changer?
Tim Harrington - MODERATOR, CEO & Co-Founder, BattleFin
TJ Barra, Senior Director, Basketball Research and Innovation, Milwaukee Bucks Inc.
Stephanie Modica, Head of Data Experience & Design, Gemini Sports Analytics
Question: There are already great questions on the agenda. I have a thesis that the NY Yankees have great scouting data science models but awful in-game management models. How would you recommend testing this thesis? (joking… not joking)
9:30AM LLM Spotlight: Understanding the data streams that help generate alpha in the LLM world we live in
Marc LoPresti - MODERATOR, Co-Founder, BattleFin
Kumesh Aroomoogan, Co-Founder and CEO, Accern
Justin Wyman, CRO, Socialgist
Questions: As competition increases and alpha generation from similar data and techniques diminishes, how do you see the evolution of data use in LLMs for alpha generation? Will it eventually mirror the evolution of fundamental data-driven quantitative models facing the same competitive pressures?
10:00AM Leveraging human insights from Expert Networks and Transcripts to generate alpha
How 1 on 1 consultations can lead to unique insights
Use case examples of how Public and Private equity investors leverage expert networks
What surprises are on the horizon for 2nd half 2024
Rayne Gaisford - Moderator, Chief Data Strategist, Andeco
Daniel Entrup, Co-Founder, AggKnowledge
Mike Grubert, Managing Director - Head of Credit and Public Equities, Thirdbridge
Ken Sena, CEO & Co-Founder, Aiera
Question: Experts often participate in multiple networks, painfully maintaining multiple identical profiles. In addition, the matching of opportunities to experts can be hit or miss, which may lead to missed opportunities for insights. What technologies could be introduced to improve the ease of participating in and accessing insights from expert networks?
10:30AM VC Panel: Finding The Next Data Unicorn
Marc LoPresti - MODERATOR, Co-Founder, BattleFin
Jack Wyant, Managing Director, Blue Chip Partners
Chris Golden, Founding Partner, Sterling Select Group
Questions: What are your considerations regarding the total addressable market for data companies and their potential business value? Can niche data companies/products achieve economic viability?
10:45AM The Evolution of Research & the Use of Data and LLMs
How has data driven research evolved across Macro, Fundamental & Systematic funds?
Where do blind spots still exist and what are they?
How will the inclusion of AI change the investment process?
Tim Harrington - MODERATOR, CEO & Co-Founder, BattleFin
Casey Webb, Head Of Equity Research Data, Bridgewater Associates
Richard Goldman, Global Head of Sales Strategy & Execution Quantitative Analytics, LSEG Data & Analytics
Uri Knorovich, Co-Founder & CEO, Nimble Way
Question: Great questions for the panel on the agenda; keen to hear the answers.
11:15AM Healthcare Mosaic: Understanding the effect GLP-1 diet drugs will have and a look at potential catalysts in the Gene Therapy market
With over 30 GLP-1s in development, who will win or who will lose out?
Tracking regulatory changes in the biotech and healthcare markets
Joe Peters - Moderator, Head of Data Services, Bridgewater Associates
Graham Lincoln, Managing Director, Head of Product, Kyber Data Science
Beau Bush, President, Ozmosi
Mark Holmquist, Vice President, MarketPulse by SG2
Question: What types of datasets should be integrated to form a comprehensive mosaic for analyzing the impact of GLP-1 diet drugs and the gene therapy market?
11:45AM Real Estate Mosaic: leveraging alt data to identify 2nd half and 2025 catalysts
Analyzing occupancy levels and rental pricing trends
What happens to commercial real estate if rates don't come [down]?
Andrea Hagerman - Moderator, Head of Data Sourcing, Chicago Trading Company
Dutch Mendenhall, CEO, RADD Companies
Jonas Bordo, Co-Founder & CEO, Dwellsy
Question: Does the data show areas of strength in commercial real estate? What are the common factors that could show a path forward for the sector?
12:15PM US Alt Data Competition Announcement!
Which teams are gaining an edge with data?
What types of data can add the most value?
What’s on the horizon that will be a game changer?
JUDGES
Tony Berkman, Managing Director, Two Sigma
Stewart Stimson, Head of Data Strategy, Jump Trading
Andrea Hagerman, Head of Data Sourcing, Chicago Trading Company
Dirk Renick, Quantitative Research & Development Lead, Abu Dhabi Investment Authority (ADIA)
Jason Koulouras, Research, Analytics, Intelligence & Data Ranger, Global Market Data Leader, Bridgewater Associates
Question: No question from me, other than wondering who will win. That’s a powerhouse judging panel. Good luck to the data companies competing. You’ll learn a lot from their questions and feedback.
What questions would you ask? Leave a comment below.
If you think this is useful for someone attending the conference, please feel free to forward it on.
- Jason DeRise, CFA
[1] Alternative data: Alternative data refers to data that is not traditional or conventional in the context of the finance and investing industries. Traditional data often includes factors like share prices, a company's earnings, valuation ratios, and other widely available financial data. Alternative data can include anything from transaction data, social media data, web traffic data, web mined data, satellite images, and more. This data is typically unstructured and requires more advanced data engineering and science skills to generate insights.
[2] Nowcasting: In order to systematically forecast the next reported economic or company-specific financial result, multiple sources of high-frequency data are combined. The model continuously updates the forecast with increasing accuracy as the volume of data covering the unknown period increases.
[3] Alpha: A term used in finance to describe an investment strategy's ability to beat the market or generate excess returns. A simple way to think about alpha is that it’s a measure of the outperformance of a portfolio compared to a pre-defined benchmark for performance. Investopedia has a lot more detail: https://www.investopedia.com/terms/a/alpha.asp
[4] Data Lake: A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. https://aws.amazon.com/what-is/data-lake/
[5] Tick history (historical tick data): a record of every trade and quote in a financial market, including the price, volume, and time of each transaction.
[6] Large Language Models (LLMs): These are machine learning models trained on a large volume of text data. LLMs, such as GPT-4 or ChatGPT, are designed to understand context, generate human-like text, and respond to prompts based on the input they're given. They can simulate human-like conversation and can be used in a range of applications, from drafting emails to writing Python code. An LLM analyzes the input it receives and then generates an appropriate response, all based on the vast amount of text data it was trained on.
[7] Vector Embedding: a technique in natural language processing that represents words, phrases, or documents as arrays of numbers to capture their semantic meanings and relationships.
[8] Prompt Engineering: Prompt engineering is the process of iterating a generative AI prompt to improve its accuracy and effectiveness. https://www.coursera.org/articles/what-is-prompt-engineering
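As a rough illustration of the nowcasting definition above, here is a minimal sketch of a forecast that updates as more of the period is observed. The blending rule, the `nowcast` function, the prior daily rate, and the sales figures are all assumptions chosen for simplicity; production nowcasts combine multiple data sources with far more careful statistics.

```python
def nowcast(observed, days_in_period, prior_daily_rate):
    """Forecast the period total: actuals so far plus an estimate for the remaining days.

    The estimate for the remaining days blends the prior daily rate with the
    observed average, weighting the observed average by the share of the
    period already covered (an illustrative rule, not a standard one).
    """
    days_seen = len(observed)
    if days_seen == 0:
        return prior_daily_rate * days_in_period
    coverage = days_seen / days_in_period
    observed_rate = sum(observed) / days_seen
    blended_rate = coverage * observed_rate + (1 - coverage) * prior_daily_rate
    return sum(observed) + blended_rate * (days_in_period - days_seen)


# Hypothetical high-frequency data for a 4-day period whose true total is 48.
daily_sales = [12.0, 12.0, 12.0, 12.0]

# Re-run the forecast as each new day of data arrives.
forecasts = [
    nowcast(daily_sales[:k], days_in_period=4, prior_daily_rate=10.0)
    for k in range(len(daily_sales) + 1)
]
print(forecasts)
```

Each new day of data pulls the forecast closer to the eventual reported figure, which is the "increasing accuracy" the definition describes.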