Building a knowledge graph platform from unstructured online sources

Meltwater’s CTO Aditya Jami tells us how the team is building a company-focused knowledge graph using advanced AI and NLP technology

“There are a lot of breadcrumbs out there. We can systematically mine them for insights that can help investors and corporate leaders make more context-driven decisions.”

Aditya Jami, CTO, Meltwater

Why do we need AI to make sense of external data?

We have more than 1.3T documents in our corpus and are adding 700M every day. All of this external data is very unstructured, so we need data mining techniques to structure it and find interesting patterns.

We need to make sure the data is clean, normalized and easily accessible for our data scientists. Data can be very deceptive. There is a lot of contradictory or incorrect data out there, so we need a way to extract knowledge from this data in a factual way. To do this, we assign a confidence level to everything that we mine, which helps us better connect the dots.
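
As an illustration of what assigning confidence to mined facts could look like, here is a minimal Python sketch. It is not Meltwater's method: the claim tuples, the source reliability values and the noisy-OR combination rule are all assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical extracted claims: (entity, attribute, value, source_reliability).
# The entity and the reliability values are invented for illustration.
claims = [
    ("Acme Corp", "headquarters", "Oslo", 0.9),
    ("Acme Corp", "headquarters", "Oslo", 0.6),
    ("Acme Corp", "headquarters", "Bergen", 0.3),
]


def score_claims(claims):
    """Combine source reliabilities per candidate value with a noisy-OR,
    then normalise across competing values of the same attribute."""
    disbelief = defaultdict(lambda: 1.0)
    for entity, attribute, value, reliability in claims:
        # Each independent source reduces the chance that the value is wrong.
        disbelief[(entity, attribute, value)] *= 1.0 - reliability

    support = {key: 1.0 - d for key, d in disbelief.items()}

    totals = defaultdict(float)
    for (entity, attribute, _value), s in support.items():
        totals[(entity, attribute)] += s

    # Normalise so the competing values of one attribute sum to 1.
    return {
        key: (s / totals[key[:2]] if totals[key[:2]] else 0.0)
        for key, s in support.items()
    }


scores = score_claims(claims)
for (entity, attribute, value), confidence in sorted(scores.items()):
    print(f"{entity} {attribute} = {value}: {confidence:.2f}")
```

Contradictory values stay in the result with a lower score rather than being discarded, so downstream reasoning can still see that the sources disagreed.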

What we’re looking for in this data are signals, which are forward-looking indicators. This requires systematic mining. If you just monitor data, you will simply understand what has happened. But if you want to turn this information into a forward-looking signal, you need to connect the dots across different data types and insights. So we need some form of machine learning and reasoning.


Constructing a knowledge graph built around company information

We’ve created a lot of data enrichment services and insight services. At the heart of it all is the knowledge graph.

What is a knowledge graph?

We bring in a number of different data types – news, blogs, forums, job posts, social, etc. We index all this data around key entities – these for us are organizations, people, products and brands.

We extract entities, attributes, relationships, events and trends, and some bare minimum analytics that give us a basic understanding of what’s happening around these entities, and form predictions over time. We also connect them so we have an easy way to start reasoning around the analytics we’re building on top of the platform. That’s where the knowledge graph comes in.
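
To make the structure concrete, the following Python sketch models entities with attributes and the relationships that connect them. The class names, fields and sample records are hypothetical and only illustrate the general shape of such a graph, not Meltwater's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str
    kind: str  # e.g. "organization", "person", "product" or "brand"
    attributes: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Relation:
    subject: str     # entity_id of the source entity
    predicate: str   # relationship type, e.g. "works_for"
    obj: str         # entity_id of the target entity
    source_doc: str  # document the relation was extracted from


class KnowledgeGraph:
    def __init__(self):
        self.entities = {}
        self.relations = []

    def add_entity(self, entity):
        self.entities[entity.entity_id] = entity

    def add_relation(self, relation):
        # Only connect entities that have already been registered.
        if relation.subject not in self.entities or relation.obj not in self.entities:
            raise KeyError("both endpoints must be registered entities")
        self.relations.append(relation)

    def neighbours(self, entity_id):
        """Return (predicate, other_entity_id) pairs touching entity_id."""
        linked = []
        for r in self.relations:
            if r.subject == entity_id:
                linked.append((r.predicate, r.obj))
            elif r.obj == entity_id:
                linked.append((r.predicate, r.subject))
        return linked


# Invented sample data.
kg = KnowledgeGraph()
kg.add_entity(Entity("acme", "organization", {"name": "Acme Corp"}))
kg.add_entity(Entity("jane", "person", {"name": "Jane Doe"}))
kg.add_entity(Entity("rocket", "product", {"name": "Rocket Skates"}))
kg.add_relation(Relation("jane", "works_for", "acme", "doc-1"))
kg.add_relation(Relation("acme", "makes", "rocket", "doc-2"))

acme_links = kg.neighbours("acme")
print(acme_links)
```

Keeping the source document on every relation is one simple way to trace an insight back to the text it came from.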

Knowledge Graph backed Platform | Source: Meltwater

Why organize as a knowledge graph?

We care about providing intelligent insights about companies, products, brands, key decision makers and more. So we need to use AI to identify these entities, disambiguate them and identify the relationships between them. We want to map out all the known entities we care about and understand what sort of key events and relations exist within all the unstructured data.
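
Disambiguation can be illustrated with a deliberately simple Python sketch: when a mention could refer to several known entities, pick the one whose known context overlaps most with the words around the mention. The candidate entities, their keyword sets and the Jaccard scoring are assumptions for this example; production systems typically rely on learned models instead.

```python
# Hypothetical candidates that share the surface form "Acme".
candidates = {
    "acme_corp": {"rocket", "skates", "manufacturing", "shares"},
    "acme_records": {"label", "music", "album", "artist"},
}


def disambiguate(context_words, candidates):
    """Return the best-matching entity id and the score of every candidate."""

    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    context = {word.lower() for word in context_words}
    scores = {eid: jaccard(context, keywords) for eid, keywords in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


mention_context = "Acme shares rose after the new rocket launch".split()
best, scores = disambiguate(mention_context, candidates)
print(best, scores)
```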

We embed all the factual knowledge that we extract from the web into a structured knowledge base. If we want to understand the reasoning behind the insights, start observing causality and correlations, and actively try to explain why an event has occurred and why it is relevant, we need intelligent reasoning capabilities.

We start with a focus on indexing a lot of data. The idea is to get a semantic understanding of how the data is organized. On top of this we can build interesting applications or layers on the platform: for instance, recommendation engines, contextual search, semantic search and inferencing engines that help the APIs on top of the platform become more intelligent.

The result is a closed domain knowledge graph built around companies and their related entities.

Importance of organizing as a Knowledge Graph | Source: Meltwater

How does it work?

It starts with a lot of external data. This data goes through our NLP stack, and everything lands in our data platform, where we create the knowledge graph. On top of the graph we have layers that perform graph embedding, graph aggregation and graph convolution, along with reasoning engines. The hybrid inference layer is where we generate insights from the underlying data.
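
As a rough illustration of what a graph aggregation step does, the Python sketch below replaces each node's vector with the mean of its own vector and those of its neighbours. This is a simplified form of message passing without learned weights, not a description of Meltwater's layers; the node names, vectors and edges are invented.

```python
def aggregate_neighbours(embeddings, edges):
    """One round of mean aggregation over an undirected graph.

    Each node's new vector is the mean of its own vector and the
    vectors of its direct neighbours.
    """
    # Start each group with the node itself (a self-loop).
    groups = {node: [node] for node in embeddings}
    for a, b in edges:
        groups[a].append(b)
        groups[b].append(a)

    updated = {}
    for node, group in groups.items():
        dim = len(embeddings[node])
        updated[node] = [
            sum(embeddings[member][i] for member in group) / len(group)
            for i in range(dim)
        ]
    return updated


# Invented two-dimensional embeddings for three connected entities.
embeddings = {
    "company": [1.0, 0.0],
    "ceo": [0.0, 1.0],
    "product": [1.0, 1.0],
}
edges = [("company", "ceo"), ("company", "product")]

smoothed = aggregate_neighbours(embeddings, edges)
print(smoothed)
```

After one round, connected entities have moved closer together in vector space, which is the basic effect that graph convolution layers build on.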

We have the world’s largest corpus of external data around the entities we care about. We have three types of data:

  • Licensed and compliant data (Twitter firehoses, etc.)
  • Open web data, where we use AI to automate the induction of wrappers
  • Company data, which companies push to us to augment external data and derive company insights

Sources of data:

  • Web data extraction – academic web
  • Annotations – site microdata
  • Web tables
  • Information extraction from text that we capture

The knowledge graph is temporal in nature. It’s the differences between the information sets we collect over time that give us the insights. To understand what has actually changed, you need this temporal aspect. That’s where we find key predictive signals. Continuous extraction is therefore not optional; it’s necessary for most of our insights.
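
The idea of finding signals in the difference between information sets can be sketched in Python by comparing two snapshots of extracted facts. The triple format, the snapshot contents and the `diff_snapshots` helper are hypothetical and serve only to illustrate the principle.

```python
def diff_snapshots(earlier, later):
    """Compare two sets of (subject, predicate, object) facts."""
    added = later - earlier
    removed = earlier - later
    # A (subject, predicate) pair on both sides means its value changed.
    changed_keys = {(s, p) for s, p, _ in added} & {(s, p) for s, p, _ in removed}
    return {"added": added, "removed": removed, "changed_keys": changed_keys}


# Invented snapshots of the same entity at two points in time.
snapshot_q1 = {
    ("Acme Corp", "ceo", "J. Doe"),
    ("Acme Corp", "open_roles", "12"),
}
snapshot_q2 = {
    ("Acme Corp", "ceo", "A. Smith"),
    ("Acme Corp", "open_roles", "12"),
}

changes = diff_snapshots(snapshot_q1, snapshot_q2)
print(changes["changed_keys"])
```

Facts that stay the same drop out of the diff, so only the changes, such as a new CEO in this invented case, are left as candidate signals.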

The result

The knowledge graph powers our ability to surface forward-looking signals, which can be leveraged to make better investment decisions, understand the trajectory of a particular company, product or individual and better predict changes in the market.

