Meta v. Bright Data Perspectives: A Data Score Interview with Glacier Network
The Data Score interviewed Glacier Network’s founder and CEO, Don D'Amico, on the implications for the alternative data industry following the Meta v. Bright Data court decision.
Welcome to the Data Score newsletter, composed by DataChorus LLC. The newsletter is your go-to source for insights into the world of data-driven decision-making. Whether you're an insight seeker, a unique data company, a software-as-a-service provider, or an investor, this newsletter is for you. I'm Jason DeRise, a seasoned expert in the field of data-driven insights. As one of the first 10 members of UBS Evidence Lab, I was at the forefront of pioneering new ways to generate actionable insights from alternative data. Before that, I successfully built a sell-side equity research franchise based on proprietary data and non-consensus insights. After moving on from UBS Evidence Lab, I’ve remained active in the intersection of data, technology, and financial insights. Through my extensive experience as a purchaser and creator of data, I have gained a unique perspective, which I am sharing through the newsletter.
This edition of the Data Score Newsletter continues our Q&A series, this time focusing on regulatory, legal, and compliance themes, which are critical to factor into any alternative data practice.
This and future data provider interviews won’t be paid placements (it is made clear when the text is a sponsored ad) or act as an endorsement of one data vendor over another.
This article is neither investment research nor legal advice.
Interview Focus: Meta v. Bright Data implications
On January 23, 2024, Bright Data, an Israeli tech company, won a U.S. federal court case against Meta over allegations of data scraping from Facebook and Instagram. Meta, which previously used Bright Data's services, accused the firm of violating its terms of service by harvesting data. The court, however, found little evidence to prove that Bright Data harvested private data. The court dismissed Meta's breach of contract and illegal data collection claims by differentiating public data scraping from accessing private data, emphasizing that public data should remain accessible. For more details, see TechCrunch's coverage of the ruling.
We reached out to Don D'Amico, founder and CEO of Glacier Network, to get his takeaways from the ruling. Don can be found on LinkedIn at https://www.linkedin.com/in/donalddamico/ and Glacier Network’s website is found at https://www.glaciernetwork.co/.
The interview
The Data Score: How has the court ruling in Meta v. Bright Data influenced your thinking on best practices in web mining governance and compliance?
Don D'Amico: The Meta case is no doubt a positive uptick for scraping, although it is somewhat incremental (and may attract more defenses against scraping soon). The case does offer some comfort that bypassing anti-bot and CAPTCHA software may be permissible.
That said, as many others have discussed post-hiQ, the breach of contract claims that define this litigation are highly specific to the individual contracts for the sites that are scraped. And these terms can typically be amended unilaterally or on short notice, so best practices should include a regular cadence for reassessing each scraped site's terms or imposing controls on contractors like Bright Data to ensure that they comply with terms as of the date of their activity.
Also, keep a log of your scraping activity (detailing what sites are approved and when) - this case demonstrated the value of records as evidence.
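Editor's illustration (not part of Don's answer): the two practices described above, an approval log and a regular reassessment of each site's terms, could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a compliance tool. The class names, field names, and the 90-day review interval are all hypothetical choices made for illustration; the interview does not specify any of them.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
import hashlib


def fingerprint(terms_text: str) -> str:
    """Return a SHA-256 fingerprint of the terms text that was reviewed."""
    return hashlib.sha256(terms_text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class SiteApproval:
    """One approval decision for one site (field names are illustrative)."""
    site: str
    approved_on: date
    terms_sha256: str
    reviewer: str


@dataclass
class ScrapeLog:
    """Append-only record of which sites were approved, and when."""
    # Assumed cadence; the right interval is a policy decision, not a technical one.
    review_interval: timedelta = timedelta(days=90)
    approvals: list = field(default_factory=list)

    def approve(self, site: str, terms_text: str, reviewer: str, on: date) -> SiteApproval:
        """Record that `reviewer` approved `site` under the given terms on date `on`."""
        record = SiteApproval(site, on, fingerprint(terms_text), reviewer)
        self.approvals.append(record)
        return record

    def latest(self, site: str):
        """Return the most recently appended approval for `site`, or None."""
        for record in reversed(self.approvals):
            if record.site == site:
                return record
        return None

    def needs_review(self, site: str, current_terms_text: str, today: date) -> bool:
        """True if the site was never approved, the approval is older than the
        review interval, or the terms text differs from what was reviewed."""
        last = self.latest(site)
        if last is None:
            return True
        if today - last.approved_on > self.review_interval:
            return True
        return fingerprint(current_terms_text) != last.terms_sha256
```

Storing a fingerprint of the reviewed terms alongside the approval date means the log can later show which version of a site's terms was in force when the activity was approved, which is the kind of record the answer above suggests keeping.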
The Data Score: What key considerations should individuals have in mind before initiating their first web mining project with web robots?
Don D'Amico: This analysis remains largely the same. In the first instance, one should carefully consider each site's terms, the content scraped (whether it may be copyright protected, etc.), and any intended usage of the scraped data for competitive vs. non-competitive (e.g., investment, academic, etc.) purposes. Practically, can useful, lawful content be extracted without logging in?
The Data Score: What upcoming court cases should the industry watch as the next litmus test for what the appropriate web mining governance standards should be?
Don D'Amico: For one thing, Meta can follow up on its complaint against Bright Data with additional claims. X Corp. also has a case pending against Bright Data, again on alleged breach of contract grounds addressing conventional issues in contract formation and interpretation.
I think it will be interesting to see if there are cases outside of California (particularly in New York) involving companies outside of social media. There are also several copyright cases related to AI that seem broadly relevant to scraping.
The Data Score: What is Glacier Network’s framework for assessing data governance and compliance standards as the data industry continues to rapidly evolve?
Don D'Amico: Glacier is set to promote a standardized risk policy for data vendors, which we made available last year as a trial. Several prominent vendors signed on early in support of that policy.
Our goal is to simplify the process of exchanging (alternative) data, such that organizations do not endlessly pile on additional information requirements and contracting terms as the law develops. We believe that for both parties to a data transaction, time is of the essence. Check out glaciernetwork.co if you're interested in learning more.
Concluding thoughts
A big thank you to Don for sharing his time, effort, and insights. If you’d like to learn more, you can reach out to the team at Glacier Network at info@glaciernetwork.co or check out Glacier Network’s LinkedIn page at https://www.linkedin.com/company/glaciernetwork/
You can also leave comments and questions below the article.
Feel free to share the article with anyone who would find it useful.
For more content like this and to see future interviews, subscribe to the Data Score Newsletter.
- Jason DeRise, CFA
Alternative data: Alternative data refers to data that is not traditional or conventional in the context of the finance and investing industries. Traditional data often includes factors like share prices, a company's earnings, valuation ratios, and other widely available financial data. Alternative data can include anything from transaction data, social media data, web traffic data, web-mined data, satellite images, and more. This data is typically unstructured and requires more advanced data engineering and science skills to generate insights.
Web Mining (Web Scraping): The process of using automated software to extract large amounts of data from websites.
Web Mining Governance: The set of policies, procedures, and standards that guide the ethical and legal collection, analysis, and use of data from websites.
CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart): Tools used by websites to distinguish human users from automated scripts or bots.
hiQ Labs, Inc. v. LinkedIn Corp.: Reference to a landmark legal case that addressed the legality of web scraping publicly available data. Wikipedia page on the case: https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn
Data Governance: The overall management of the availability, usability, integrity, and security of the data employed in an organization. “Data governance is everything you do to ensure data is secure, private, accurate, available, and usable. It includes the actions people must take, the processes they must follow, and the technology that supports them throughout the data life cycle.” (Google Cloud’s definition)