Data Deep Dive: Google Trends, Part II
Google search data is widely accessible, but few understand how to use it correctly or the rigor required to ensure its reliability. We continue the deep dive inside.
Welcome to the Data Score newsletter, composed by DataChorus LLC. The newsletter is your go-to source for insights into the world of data-driven decision-making. Whether you're an insight seeker, a unique data company, a software-as-a-service provider, or an investor, this newsletter is for you. I'm Jason DeRise, a seasoned expert in the field of data-driven insights. As one of the first 10 members of UBS Evidence Lab, I was at the forefront of pioneering new ways to generate actionable insights from alternative data. Before that, I successfully built a sell-side equity research franchise based on proprietary data and non-consensus insights. After moving on from UBS Evidence Lab, I’ve remained active at the intersection of data, technology, and financial insights. Through my extensive experience as a purchaser and creator of data, I have gained a unique perspective, which I am sharing through the newsletter.
This article continues the data deep dive that began in Part I, which covered the common questions addressed with the data and went deep into the underlying data. But there’s much more to consider beyond the underlying data, starting with how the search data is cleansed and enriched.
Google Trends is often used as an alternative data1 source. But I get frustrated when I see Google Trends misused to support arguments. The data itself isn’t flawed; rather, it’s the misunderstanding of the data that leads to misinterpretation and poor application.
The use case for discussion was sparked by a well-known YouTuber in the music industry, who recently used Google Trends to argue that music is not as good as it used to be and that music is becoming less popular. I thought the question of music's popularity compared to other activities provided a great low-key topic to explore via Google Trends, using the best practices for getting quality results. It also helps contrast good and poor uses of Google Trends. Check out Part I for more context.
In the Dataset Deep Dive Series, the format covers the following topics:
Common questions addressed with the data (Covered in Part I)
Underlying Data (Covered in Part I)
Cleaning the Data (Part II)
Enriching the data (Part III)
Limitations to consider (Part III)
Action items to begin using the data (Part III)
In the original publication of Part I, I planned on completing the deep dive in Part II. But the level of detail needed to cover data cleansing required its own dedicated deep dive. I’ve decided to leave data enrichment, limitations, and action items for getting started with Google Trends data for Part III.
Part II goes very deep into the rigor needed to cleanse the raw data to enable trusted insights powered by Google Trends data. While the details are specific to Google Trends, it is important to note that high-quality alternative datasets of any type go through this kind of deep work, because most alternative datasets are very messy in their raw state. Like Google Trends, the data is typically pivoted from its primary use case to address investment questions, which means it is not a perfect fit for use right out of the box. Understanding the details and techniques used to clean raw alternative data is important not only for judging the integrity of the data but also for developing a deeper understanding of its appropriate use cases.
