360° Data Collection

Different data sets for different insights

You never know where the next great idea, company, or technology may come from. This is why Mergeflow collects and analyzes data from many disparate data sets and sources. As a result, you stay aware of what is going on around you and get a 360° perspective.

Scientific Publications

Scientific publications help you identify experts and leading organizations in a given technology field.


Patents

Patenting activity in a technology field can be an early indicator of future commercial activity in that field.

News & Blogs

News and blogs provide new ideas, forecasts, and scenarios, often from an unconventional angle.

Market Research

Market research provides size and growth estimates, connects technologies to applications, and identifies major players.

Venture Capital

Updates on venture investments from around the world help you discover new, market-relevant technologies and business models.

Technology Transfer

Technology transfer offerings from universities and R&D organizations worldwide help you strengthen and speed up your own R&D projects.

Research Projects

Funded research project reports help you discover innovative, early-stage companies that would otherwise remain under the radar.

Clinical Trials

Use clinical trials to discover the latest applied medical research, and the companies and organizations behind these advances.

"The good researcher could not act in academic isolation. In order to succeed, they must know something about markets, finance, and the organization of a business."

From G. Pascal Zachary, "Endless Frontier: Vannevar Bush, Engineer of the American Century"

Background image of Vannevar Bush's differential analyzer, from Wikipedia.

Smart crawling

A crawler is software that collects data from the web. It starts with a web page, which it stores in a database, then follows the links from that page to other pages and collects those too. From there it moves on to the next set of linked pages, and so on.
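
The loop described above can be sketched in a few lines of Python. This is a minimal, generic breadth-first crawler, not Mergeflow's implementation. The `fetch` callable, the `FAKE_WEB` dictionary, and the `example.test` URLs are hypothetical stand-ins so that the sketch runs without network access; a dictionary plays the role of the database.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl: store a page, then queue the pages it links to.

    `fetch` is a hypothetical callable that maps a URL to HTML text,
    or returns None if the page cannot be retrieved.
    """
    store = {}                  # stands in for the database
    queue = deque([start_url])  # pages waiting to be collected
    seen = {start_url}          # avoids collecting a page twice
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        store[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return store


# A tiny in-memory "web" (assumption, for illustration only).
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.test/b": "<p>No links here.</p>",
    "http://example.test/c": '<a href="/">Home</a>',
}

pages = crawl("http://example.test/", FAKE_WEB.get)
print(list(pages))
```

The `seen` set keeps the crawler from looping when pages link back to each other, and `max_pages` bounds the crawl.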

Mergeflow's crawlers pay attention to content. Rather than simply following and collecting every available page, they use self-learning web scraping technologies that recognize and collect content relevant to technology and business.

Selective content collection

To decide whether to collect a page, Mergeflow's crawlers use methods such as language detection (we try to collect English-language content); semantic topic modeling, which distinguishes relevant content (technology descriptions, for example) from irrelevant content (such as directions); and text layout modeling. For example, if a page looks like a list of items or like driving directions, Mergeflow ranks it lower than a page that looks like body text about technology and business.
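
To make the three kinds of signal concrete, here is a rough sketch that combines a language check, a topic check, and a layout check into one collect-or-skip decision. It deliberately swaps in crude heuristics (stopword share, keyword overlap, line length) for real language detection, semantic topic modeling, and layout modeling. The word lists, thresholds, and function names are all assumptions made for illustration and do not describe Mergeflow's models.

```python
import re

# Illustrative word lists (assumptions, not Mergeflow's vocabulary).
ENGLISH_STOPWORDS = {"the", "and", "of", "to", "in", "is", "that", "for", "with", "a"}
TOPIC_TERMS = {"technology", "patent", "startup", "market", "research",
               "sensor", "battery", "investment", "algorithm", "prototype"}


def tokens(text):
    """Lowercase alphabetic tokens of the text."""
    return re.findall(r"[a-z]+", text.lower())


def english_score(text):
    """Share of tokens that are common English function words."""
    words = tokens(text)
    if not words:
        return 0.0
    return sum(w in ENGLISH_STOPWORDS for w in words) / len(words)


def topic_score(text):
    """Share of the illustrative topic terms that occur in the text."""
    return len(set(tokens(text)) & TOPIC_TERMS) / len(TOPIC_TERMS)


def layout_score(text):
    """Share of non-empty lines long enough to look like body text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return sum(len(ln.split()) >= 12 for ln in lines) / len(lines)


def should_collect(text, min_english=0.15, min_topic=0.2, min_layout=0.5):
    """Collect a page only if all three heuristic scores pass their threshold."""
    return (english_score(text) >= min_english
            and topic_score(text) >= min_topic
            and layout_score(text) >= min_layout)


article = (
    "The startup has built a prototype battery that stores energy in a new kind of solid electrolyte.\n"
    "Its research team says the technology is ready for the market and that investment is growing."
)
directions = "Turn left\nGo 2 miles\nTurn right at the bank\nArrive at destination"

print(should_collect(article))     # body text about technology and business
print(should_collect(directions))  # short, list-like lines
```

A production system would more likely rank pages by a combined score than apply hard thresholds; the yes/no decision here only keeps the sketch short.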

[Figure: crawler decision tree for selective content collection]

Document structure detection

Once our data collection systems have collected a web page, a scientific article, or some other content, they feed only the relevant parts of that document into our analytics pipeline. For example, they extract data from the title or body but not from disclaimers or other irrelevant parts. To do this, we have built an unsupervised algorithm for document content extraction that learns from just a few examples how a data source typically structures its content.
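
One simple way to picture learning a source's structure from a few examples is shown below. It is a sketch under a strong assumption, not Mergeflow's algorithm: it treats any part whose text is identical across all example documents from the same source (a disclaimer or footer, for instance) as boilerplate, and keeps everything else. The `(path, text)` representation and the path labels such as `div.footer` are hypothetical.

```python
from collections import defaultdict


def learn_boilerplate_paths(example_docs):
    """Return the part labels whose text is identical in every example.

    Each document is a list of (path, text) pairs, where `path` is a
    hypothetical label for the part's position, e.g. "div.footer".
    At least two examples are needed to tell repeated text from unique text.
    """
    if len(example_docs) < 2:
        return set()
    texts_by_path = defaultdict(list)
    for doc in example_docs:
        for path, text in doc:
            texts_by_path[path].append(text.strip())
    return {
        path
        for path, texts in texts_by_path.items()
        if len(texts) == len(example_docs) and len(set(texts)) == 1
    }


def extract_relevant(doc, boilerplate_paths):
    """Keep only the parts that were not learned as boilerplate."""
    return [(path, text) for path, text in doc if path not in boilerplate_paths]


# Two example documents from the same (made-up) source.
examples = [
    [("h1", "Solid-state batteries"),
     ("div.body", "A new electrolyte allows faster charging."),
     ("div.footer", "All rights reserved.")],
    [("h1", "Quantum sensors"),
     ("div.body", "Researchers report a more sensitive magnetometer."),
     ("div.footer", "All rights reserved.")],
]
learned = learn_boilerplate_paths(examples)

new_doc = [
    ("h1", "Perovskite solar cells"),
    ("div.body", "Efficiency records continue to climb."),
    ("div.footer", "All rights reserved."),
]
print(extract_relevant(new_doc, learned))
```

No labels are needed: the repeated text identifies itself, which is why a handful of example pages per source can be enough for this kind of heuristic.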

[Figure: schema of document structure detection]


© 2021 Mergeflow

Unsupervised learning algorithm

Unlike supervised learning algorithms, unsupervised learning algorithms do not require labeled data. Instead, they try to find an optimal grouping of the data by considering various attributes of the data. In our context, the algorithm assigns each document part to one of two groups, "relevant" or "not relevant", based on a set of data attributes.
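
As an illustration of grouping without labels, the sketch below clusters document parts into two groups with a small k-means (k = 2) written in plain Python. The two attributes (word count and number of sentence-ending marks), the deterministic starting centroids, and the rule that the group with more words per part is the "relevant" one are all assumptions chosen for this example; the attributes Mergeflow actually uses are not stated here.

```python
import math


def features(text):
    """Two illustrative attributes: word count and sentence-ending marks."""
    words = text.split()
    sentence_marks = sum(text.count(mark) for mark in ".!?")
    return (float(len(words)), float(sentence_marks))


def two_means(points, iterations=10):
    """Plain k-means with k=2 and a deterministic start."""
    # Start from the smallest and largest points (tuple ordering).
    centroids = [min(points), max(points)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min((0, 1), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def label_parts(parts):
    """Group parts into two clusters and name the wordier cluster 'relevant'."""
    points = [features(p) for p in parts]
    labels = two_means(points)

    def mean_words(cluster):
        counts = [pt[0] for pt, lab in zip(points, labels) if lab == cluster]
        return sum(counts) / len(counts) if counts else 0.0

    # Assumption of this sketch: body text has more words per part.
    relevant = 0 if mean_words(0) > mean_words(1) else 1
    return ["relevant" if lab == relevant else "not relevant" for lab in labels]


parts = [
    "Researchers built a cell that charges in ten minutes. "
    "It keeps most of its capacity after a thousand cycles.",
    "The team plans a pilot line next year. "
    "Several car makers have signed on as partners.",
    "Terms of use",
    "Contact us",
]
for part, label in zip(parts, label_parts(parts)):
    print(f"{label:13} {part[:40]}")
```

Note that the clustering itself only produces two anonymous groups. Deciding which group counts as "relevant" is a separate step, handled here by a simple rule.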