Alternative data: insights from web scraping

Richard Johnson of Greenwich Associates talks through the value of alternative data found on the web when it comes to company information

This post can also be found on the Greenwich Associates blog here. ‘Web-scraped data’, or external data found within the trove of information available online, includes a multitude of data sources. While Meltwater’s core is in news and social media data, Outside Insight takes into account multiple sources, and adds value in the form of enrichments and entity extraction to offer investors an information advantage they can use to make forward-looking decisions.

In December, Nasdaq announced they intended to acquire Quandl, an alternative data company. This represents an inflection point for the industry as alternative data goes mainstream.

The early adopters of alternative data were the most sophisticated quantitative hedge funds, which had the expertise and resources to take in the often unstructured data and incorporate it into their investment models. Now, however, usage of alternative data is expanding to more traditional asset managers. Recent Greenwich Associates research shows that 50% of institutional investors plan to increase their usage of alternative data in the coming year. Among the various types of alternative data available, web-scraped data is the most popular.

Note: Based on 40 respondents. Source: Greenwich Associates 2018 Alternative Data Customer Journey Study.


Internet Insights

Web-scraped data, as the name crudely implies, refers to data that has been harvested from public websites. The companies that specialize in this type of data collection write programs that access targeted websites and collect and store the scraped information on a periodic basis. In some cases, vendors will use public APIs to access the data behind those pages directly, without visiting the actual website. Vendors in this space include Meltwater, Quandl, Savvr, Thinknum and Yipit.
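As a rough illustration of the collect-and-store loop described above, here is a minimal Python sketch. It assumes a hypothetical careers page where each opening is marked with a `job-listing` CSS class; the class name, the `Snapshot` record and the sample HTML are invented for illustration and do not describe how any particular vendor works. To keep it self-contained, the page is an inline string rather than a live download.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from html.parser import HTMLParser


class ListingCounter(HTMLParser):
    """Counts start tags whose class attribute contains a target CSS class."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


@dataclass
class Snapshot:
    taken_at: datetime   # when the page was read
    source: str          # label for the page that was read
    listing_count: int   # number of matching elements found


def scrape_snapshot(html: str, source: str, css_class: str = "job-listing") -> Snapshot:
    """Parse one page and return a timestamped count of matching elements."""
    parser = ListingCounter(css_class)
    parser.feed(html)
    parser.close()
    return Snapshot(datetime.now(timezone.utc), source, parser.count)


# Hypothetical page content; a real collector would download this on a schedule.
SAMPLE_HTML = """
<ul>
  <li class="job-listing">Engineer</li>
  <li class="job-listing featured">Data Scientist</li>
  <li class="advert">Sponsored</li>
</ul>
"""

history = []  # stands in for a database table of periodic snapshots
history.append(scrape_snapshot(SAMPLE_HTML, "example-careers-page"))
print(history[0].listing_count)  # 2
```

Storing a timestamp with every count is what turns a one-off scrape into a time series that can be compared across dates.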

With 4 billion webpages and 1.2 million terabytes of data on the internet, there is a mountain of information that can be valuable to investors. Types of web-scraped data include:

  • Job Listings: A company that is increasing hiring and headcount is likely experiencing growth.
  • Company Ratings: Sites like Glassdoor allow employees to rate their company; increasing ratings, especially in conjunction with increasing job listings, can be another growth indicator.
  • Online Retail Data: High product rankings on online retailers suggest strong sales for those product manufacturers. On the flip side, heavy discounting of products suggests weak sales.
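Taking the retail bullet as an example, scraped prices can be reduced to a single discounting measure. The sketch below is one simple way to do it, assuming each scraped product record carries a list price and a current sale price; the field names and prices are made up for illustration.

```python
from statistics import mean


def discount_depth(list_price: float, sale_price: float) -> float:
    """Fraction taken off the list price, e.g. 0.25 for 25% off."""
    if list_price <= 0:
        raise ValueError("list_price must be positive")
    return (list_price - sale_price) / list_price


def mean_discount(products: list[dict]) -> float:
    """Average discount depth across a set of scraped product records."""
    return mean(discount_depth(p["list_price"], p["sale_price"]) for p in products)


# Made-up prices for illustration only.
products = [
    {"name": "widget-a", "list_price": 100.0, "sale_price": 80.0},
    {"name": "widget-b", "list_price": 50.0, "sale_price": 50.0},
]
print(f"{mean_discount(products):.0%}")  # 10%
```

Tracked over time for one manufacturer's products, a rising average would be the "heavy discounting" pattern the bullet describes.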


Example: The Information Advantage

On December 26th, Amazon ($AMZN) announced that it had had a record holiday season, including:

  • “Echo Dot was the #1 top-selling product across all categories on Amazon”
  • “Echo Dot and Fire TV Stick…were not only the top-selling Amazon devices this holiday season, but they were also the best-selling products…across all of Amazon”
  • “Millions of Prime members voice shopped with Alexa for gifts, Amazon devices and everyday household essentials”

Amazon’s success over the holiday season clearly wasn’t yet factored into the stock price, which has soared by 23% since the announcement while the S&P 500 has remained roughly flat.

For users of alternative data, though, the strong performance of Amazon products wouldn’t necessarily have been a surprise. The data below, provided by Thinknum and sourced from Best Buy’s website, shows the increasing sales strength of Amazon products through the holiday period, starting from Black Friday.

Source: Thinknum

Also from Best Buy’s website, we can see specifically that the Echo Dot and Fire TV Stick were top sellers in their category, as noted in the official Amazon press release.

Source: Greenwich Associates and Thinknum, Dates 12/20/2018 - 12/25/2018.

The importance of Alexa, Amazon’s AI-powered, voice-controlled digital assistant, could have been seen months earlier by looking at data for ‘Amazon Alexa’ job postings on Amazon’s corporate website. Over the course of 2018, the number of open positions related to Alexa increased by 53%, from around 500 to over 750.

Source: Thinknum
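The growth figure for Alexa job postings is a plain percentage change between two snapshot counts. A minimal sketch follows, using the rounded endpoints quoted in the text, since the exact counts are not given here. The function name is illustrative.

```python
def pct_change(start: float, end: float) -> float:
    """Percentage change from start to end."""
    if start == 0:
        raise ValueError("start must be non-zero")
    return (end - start) / start * 100.0


# Rounded endpoints from the text ("around 500" to "over 750").
print(pct_change(500, 750))  # 50.0
```

With the rounded figures the result is 50%. The reported 53% would follow from the exact underlying counts, for instance a starting point slightly below 500 or an end point somewhat above 750.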

Of course, alternative web-scraped data doesn’t have all the answers. While it may indicate which products are selling well, it doesn’t quantify the impact that may have on a large, diversified company like Amazon. And although an increase in job listings is an indicator of growth, it could also mean an increase in costs, which could hurt profitability.

However, it is clear that alternative data such as web-scraped data can provide important new information about a company’s business and outlook. Sometimes this data has value on its own, and sometimes its true value emerges only when it is combined with other data sources, both traditional and alternative, and with qualitative analysis.
