Alternative data is a buzzword, yet it holds enormous potential not only for investors and businesses, but also for governments and regulators. It is a product of our unprecedented data revolution, yet it remains obscure and hard to find, even though it is hiding in plain sight.
Some believe that by 2020 (a little less than a year from now) each individual will generate around 1.7 MB of data every second, with a significant proportion of this data created in readily analyzable and easily accessible digital formats. The insights this data will create are still difficult to imagine, but with increased quantification it is likely that most decisions will be driven by data exclusively.
To understand how alternative data creates value, let's first look at where this digital data originates before discussing the individual data sources and their proven uses in detail.
Sources of alternative data
Alternative data can be extracted from:
- Non-traditional data sources, such as digital exhaust from web traffic, or logistics data that quantifies shipping activity through a supply chain,
- Unstructured social media information that can easily be quantified before being further transformed using various computational methods,
- Aggregated transaction information (such as credit card data),
- Remote sensing data, such as satellite observations, and digital tracking exhaust from devices, such as web searches and cell phone usage information.
Harnessing the information content of this enormous volume of data requires computing power. With close to 2.5 exabytes of data created daily, large amounts of storage, processing, computing, and analytical power are needed. And with the volume of data doubling every 40 months or so, the amount of data, some good and some bad, quickly becomes significant.
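A 40-month doubling time corresponds to an annual growth rate of roughly 23%. The arithmetic can be sketched in Python, taking the 2.5 exabytes per day and the 40-month doubling time as the only inputs and assuming steady exponential growth (the function names are illustrative, not from any library):

```python
import math


def annual_growth_rate(doubling_months: float) -> float:
    """Compound annual growth rate implied by a constant doubling time."""
    return 2 ** (12 / doubling_months) - 1


def projected_daily_volume(start_eb: float, months: float, doubling_months: float = 40) -> float:
    """Daily data volume (exabytes) after `months`, assuming steady exponential growth."""
    return start_eb * 2 ** (months / doubling_months)


def months_to_reach(start_eb: float, target_eb: float, doubling_months: float = 40) -> float:
    """Months until the daily volume grows from start_eb to target_eb."""
    return doubling_months * math.log2(target_eb / start_eb)


if __name__ == "__main__":
    print(f"Implied annual growth: {annual_growth_rate(40):.1%}")                      # 23.1%
    print(f"Daily volume after 10 years: {projected_daily_volume(2.5, 120):.1f} EB")  # 20.0 EB
```

Ten years is three doubling periods, so 2.5 EB per day becomes 2.5 × 2³ = 20 EB per day under this assumption.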
Structured and unstructured data
Some alternative data is structured, while the rest is unstructured.
- Structured data is clearly defined data whose pattern makes it easily searchable, which in turn makes aggregation and analysis easier.
- Unstructured data is usually not as easily searchable unless it is further refined and tagged or labeled; it typically includes audio, video, and social media posts.
- Unstructured data can be transformed into structured data and piped directly into analytics platforms.
- Using proprietary algorithms (often machine learning processes) to extract information, engineer features from various signals, and combine different data sources to strengthen the relationships between unstructured signals, the hitherto unstructured data can be integrated into structured data. The challenge is to refine the data source and to improve the signal-to-noise ratio (often dynamically, contemporaneously, and transparently), and to ensure that the information content of the unstructured data is enhanced to meet users' various demands.
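As a minimal sketch of turning unstructured posts into structured rows, the snippet below extracts cashtags (such as `$GPRO`) and scores each post against a tiny hand-written word list, then aggregates mention counts and mean sentiment per ticker. The lexicon, the cashtag convention, and all names are assumptions for illustration; a rule-based score is a deliberately simple stand-in for the proprietary machine learning models described above.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Toy sentiment lexicon: a hypothetical stand-in for a trained model.
POSITIVE = {"love", "great", "beat", "strong", "buy"}
NEGATIVE = {"hate", "weak", "miss", "broken", "sell"}

CASHTAG = re.compile(r"\$([A-Za-z]{1,5})\b")  # e.g. "$GPRO"
WORD = re.compile(r"[a-z']+")


@dataclass
class PostRecord:
    """One structured row derived from one unstructured post."""
    tickers: tuple[str, ...]
    n_words: int
    sentiment: int  # (# positive words) - (# negative words)


def structure_post(text: str) -> PostRecord:
    tickers = tuple(sorted({t.upper() for t in CASHTAG.findall(text)}))
    words = WORD.findall(text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return PostRecord(tickers, len(words), score)


def ticker_signal(posts: list[str]) -> dict[str, dict[str, float]]:
    """Aggregate per-post records into a per-ticker table of mentions and mean sentiment."""
    mentions: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for record in map(structure_post, posts):
        for ticker in record.tickers:
            mentions[ticker] += 1
            total[ticker] += record.sentiment
    return {t: {"mentions": mentions[t], "mean_sentiment": total[t] / mentions[t]}
            for t in mentions}
```

The resulting dictionary is already tabular, so it can be loaded into a dataframe or database and joined with traditional data by ticker and date.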
Different sources of alternative data
Alternative data generated by individuals
Through their social media activity, online reviews, and web searches, individuals are the major producers of alternative data. With over 1.56 billion daily active users and 2.38 billion monthly active users on Facebook (as of March 31, 2019), users generate enormous amounts of data through posts, comments, shares, and re-shares. Patterns, contacts, and networks add a further layer of information.
Similarly, Twitter has a strong following: close to 321 million monthly active users (February 2019) provide insights into how people think on the fly and express themselves contemporaneously in front of friends and the world. Because this information is personal, immediate, and deliberate, it offers a richer perspective on the inner workings of human nature and psychological trends than news and media do.
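The networks of contacts mentioned above can be extracted from the posts themselves. A minimal sketch, assuming each post arrives as an `(author, text)` pair and that `@name` marks a mention (a Twitter-style convention; other platforms differ), builds a directed mention graph and counts how many distinct accounts mention each user. The function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

MENTION = re.compile(r"@(\w+)")  # Twitter-style @handle (assumed convention)


def mention_graph(posts):
    """Directed graph as author -> Counter of mentioned accounts; self-mentions are ignored."""
    graph = defaultdict(Counter)
    for author, text in posts:
        for target in MENTION.findall(text):
            if target != author:
                graph[author][target] += 1
    return graph


def in_degree(graph):
    """Number of distinct accounts that mention each user at least once."""
    degree = Counter()
    for targets in graph.values():
        for target in targets:
            degree[target] += 1
    return degree
```

Accounts with a high in-degree are natural candidates for closer monitoring; a graph library such as NetworkX offers richer centrality measures if the simple count is not enough.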
Individuals generally create three types of alternative data:
- Social media postings, whose level of engagement, quality, and content varies with the social media platform (Instagram vs. Kik vs. Tinder), the intended use and audience of the post (Tumblr vs. Yelp vs. RateMyProfessors), and the general message (Twitter vs. WeChat)
- News and reviews, ranging from Amazon-type reviews (Netflix, IMDb, DPR) to mummy mafia blogs
- Web searches and personal or personally identified data, such as Google, Bing, and Weibo search information (or GitHub and PornHub)
Alternative data generated by business processes
Business processes are often designed and planned, occur in a structured manner, and contain information with a high signal-to-noise ratio. In particular, exhaust data refers to data that is a by-product of corporate activities, such as supermarket scanner data or supply chain data. It is widely believed that aggregated credit card transaction data offers the most reliable indicator: it provides insights into price formation, inflationary expectations, and product-level profitability, and acts as a leading indicator of revenues as well as of the determinants of profitability.
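Whether transaction data actually leads revenues is an empirical question. One simple check, sketched here under the assumption that card-spend growth and reported revenue growth are available as equal-length quarterly series aligned on the same quarters, is to correlate spend growth in quarter t with revenue growth in quarter t + lag for a few lags; a peak at a positive lag is consistent with a leading indicator. The function names are hypothetical, and in-sample correlation alone does not establish out-of-sample predictive power.

```python
from statistics import mean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length series (ZeroDivisionError if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def lead_lag_correlations(card_growth: list[float], revenue_growth: list[float],
                          max_lag: int = 2) -> dict[int, float]:
    """Hypothetical helper: correlate card-spend growth at quarter t with
    revenue growth at quarter t + lag, for lag = 0..max_lag.

    Both series must have the same length and be aligned on the same quarters.
    """
    if len(card_growth) != len(revenue_growth):
        raise ValueError("series must be aligned and of equal length")
    result = {}
    for lag in range(max_lag + 1):
        x = card_growth[: len(card_growth) - lag]  # drop the last `lag` spend observations
        y = revenue_growth[lag:]                    # drop the first `lag` revenue observations
        result[lag] = pearson(x, y)
    return result
```

Pearson correlation is written out by hand to keep the sketch dependency-free; on Python 3.10+ `statistics.correlation` computes the same quantity.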
Corporations generally create three types of alternative data:
- Transaction data: credit card, invoice, and supply chain information.
- Corporate data: official filings (with regulators such as the SEC or FDI), patents, newswires, websites, and blogs, as well as marketing materials and presentations published on social media (YouTube).
- Government agency data: patents, monitored trials (FDA or similar), regulatory activities (EPA, DOE, or similar), and tax information (IRS or similar).
Many data distributors actively gather transaction and other corporate data and aggregate it anonymously for further processing and refinement. It works: GoPro shares dropped in November 2016 when product receipts from more than 3 million email inboxes were analyzed and the results indicated that sales volumes had declined at major points of sale for the product.
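Aggregating receipts like those in the GoPro example might look roughly like the sketch below. The receipt schema (`user`, `retailer`, `period`, `product`, `units`) and all function names are hypothetical, and a salted SHA-256 hash of the user identifier serves as a simple pseudonymization step; salted hashing is pseudonymization rather than full anonymization.

```python
import hashlib
from collections import defaultdict


def pseudonymize(user_id: str, salt: str) -> str:
    """Replace a raw identifier (e.g. an email address) with a salted SHA-256 hex digest."""
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()


def aggregate_receipts(receipts, product: str, salt: str):
    """Total units and distinct (pseudonymized) buyers per (retailer, period) for one product.

    `receipts` is an iterable of dicts with the hypothetical keys
    user, retailer, period, product, units.
    """
    units = defaultdict(int)
    buyers = defaultdict(set)
    for r in receipts:
        if r["product"] != product:
            continue
        key = (r["retailer"], r["period"])
        units[key] += r["units"]
        buyers[key].add(pseudonymize(r["user"], salt))  # raw IDs are never stored
    return {key: {"units": units[key], "buyers": len(buyers[key])} for key in units}


def volume_change(agg, retailer: str, prev: str, curr: str) -> float:
    """Fractional change in units between two periods (NaN if the earlier period has no sales)."""
    before = agg.get((retailer, prev), {}).get("units", 0)
    after = agg.get((retailer, curr), {}).get("units", 0)
    return (after - before) / before if before else float("nan")
```

A negative `volume_change` across several large retailers is the kind of signal the GoPro example describes; a real panel would also need to correct for changes in panel size and composition.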
Alternative data generated by sensors
Sensors collect images from satellites and monitor movements through apps on mobile phones and other devices, such as CCTV cameras and the IoT (Internet of Things). Some companies specialize in quantifying and grouping various types of ships at ports or along important international shipping routes, mainly at the four major maritime transhipment and choke points. Real-time access to data on ship and plane movements generates a comprehensive picture of global economic conditions.
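In its simplest form, counting vessels at a choke point reduces to a point-in-region test over position reports. A sketch, assuming AIS-style pings carrying a vessel identifier, vessel type, latitude, longitude, and day, and assuming each zone is approximated by a latitude/longitude bounding box (zone names and boxes are placeholders the user would supply; the class and function names are hypothetical):

```python
from collections import defaultdict
from typing import NamedTuple


class Ping(NamedTuple):
    """One AIS-style position report (hypothetical schema)."""
    vessel_id: str
    vessel_type: str  # e.g. "tanker", "container"
    lat: float
    lon: float
    day: str          # e.g. "2019-03-01"


class Box(NamedTuple):
    """Latitude/longitude bounding box; boxes crossing the antimeridian are not handled."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon


def daily_vessel_counts(pings, zones):
    """Count distinct vessels per (zone, day, vessel type).

    `zones` maps a zone name to a Box; a vessel reporting several times in the
    same zone on the same day is counted once.
    """
    seen = defaultdict(set)
    for p in pings:
        for name, box in zones.items():
            if box.contains(p.lat, p.lon):
                seen[(name, p.day, p.vessel_type)].add(p.vessel_id)
    return {key: len(ids) for key, ids in seen.items()}
```

Counting distinct vessel IDs rather than raw pings keeps a ship that reports every few minutes from dominating the daily totals.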
Mobile apps generate geolocation intelligence for consumers and businesses, and IoT devices that automatically monitor activities can track human movements, precipitation, and other regular patterns. Traffic from wireless and mobile devices, particularly in emerging markets, is increasing rapidly and will likely generate more insights into behavior than we have today.
Sensors generally generate three types of alternative data:
- Satellites — parking lot information for malls and manufacturing
- Geolocation — who, where, and how long
- Other sensors — machines, temperatures, and (CCTV) cameras
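The "who, where, and how long" of geolocation data can be illustrated with a geofence and a dwell-time calculation. A minimal sketch, assuming pings arrive as `(device_id, unix_timestamp, lat, lon)` tuples with pseudonymous device IDs: pings within `radius_m` meters of a center point (haversine distance) count as inside, and a device's inside pings are split into separate visits whenever consecutive pings are more than `max_gap_s` seconds apart. The function names and the 15-minute default gap are assumptions.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000  # mean Earth radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def dwell_times(pings, center, radius_m, max_gap_s=900):
    """Map each device to a list of visit durations (seconds) inside the geofence."""
    inside = defaultdict(list)
    for device, ts, lat, lon in pings:
        if haversine_m(lat, lon, *center) <= radius_m:
            inside[device].append(ts)
    visits = {}
    for device, times in inside.items():
        times.sort()
        durations = []
        start = prev = times[0]
        for t in times[1:]:
            if t - prev > max_gap_s:  # gap too long: close the current visit
                durations.append(prev - start)
                start = t
            prev = t
        durations.append(prev - start)  # close the final visit
        visits[device] = durations
    return visits
```

A single ping yields a visit of zero seconds, so in practice very short visits are often filtered out before aggregating dwell times per location.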
Finding value
The value of data comes from its use. As hedge funds, institutional investors, traders, and others look for a unique edge to improve alpha, alternative data generated through digital activities is emerging as the new oil. Most of the value added by alternative data currently comes from enhancing traditional data; however, as the signal-to-noise ratio improves, alternative data is likely to become the major source of information.
This article was originally posted on the Towards Data Science Medium page by Peter Went, a leading voice in strategically navigating the crossroads of quant analytics, management, and regulation.