If you’ve used ChatGPT search or Perplexity, you know that being able to search the web and get inline references greatly improves these AI chatbots. Results are better when they include timely information, and a web search can reduce so-called hallucinations (i.e. when a generative AI outputs false information).
That’s why French startup Linkup is building an API that lets developers access web content from premium, trusted sources and feed the results to a large language model (LLM) to improve its responses. Many AI developers call this workflow Retrieval-Augmented Generation (or RAG).
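To make the RAG workflow concrete, here is a minimal sketch in Python. It is not Linkup's API: the `Source` type, the `retrieve` and `build_prompt` helpers, and the example URLs are all hypothetical, and the keyword-overlap ranking is a toy stand-in for a real search service.

```python
from dataclasses import dataclass


@dataclass
class Source:
    title: str
    url: str
    text: str


def retrieve(query: str, corpus: list[Source], k: int = 2) -> list[Source]:
    """Rank sources by shared query words (toy stand-in for a search API)."""
    words = set(query.lower().split())
    scored = [(len(words & set(s.text.lower().split())), s) for s in corpus]
    scored = [pair for pair in scored if pair[0] > 0]  # drop unrelated sources
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


def build_prompt(query: str, sources: list[Source]) -> str:
    """Place the retrieved passages ahead of the question so the LLM can cite them."""
    context = "\n".join(
        f"[{i}] {s.title} ({s.url}): {s.text}" for i, s in enumerate(sources, start=1)
    )
    return (
        "Answer using only the numbered sources and cite them inline.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical corpus; in practice these passages would come from a search API.
corpus = [
    Source("Funding note", "https://example.com/a", "Acme raised a seed round in 2024"),
    Source("Recipe", "https://example.com/b", "How to bake sourdough bread"),
]
question = "Which round has Acme raised"
prompt = build_prompt(question, retrieve(question, corpus))
print(prompt)  # this string would then be sent to the LLM of your choice
```

The retrieval step and the generation step stay separate, which is why a developer can change where the content comes from without touching the model.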
More importantly, the future of scraping bots is uncertain. If there isn’t a pre-existing financial agreement between content publishers and web scraping entities, these bots are picking up content from the open web for free, and many people aren’t happy with that arrangement, which is drawing increasing regulatory scrutiny around AI training.
There are also high-profile legal cases in play now, such as the ongoing lawsuit between OpenAI, the maker of ChatGPT, and the New York Times, so the situation around web scraping may change in the near future. That is why OpenAI has signed multi-year content licensing deals with major publishers such as AP, Axel Springer, Condé Nast, El País, Financial Times, Le Monde, and others.
“We founded the company when OpenAI was doing deals with news sources … for training or inference purposes, to augment the responses of OpenAI models and their products. And we thought: ‘OK, it’s great because we finally have AI companies that pay for their sources,’” Linkup co-founder and CEO Philip Mizrahi told TechCrunch, adding that the founders set up the business to connect AI devs with content providers, hopefully to their mutual benefit.
Currently, content publishers face a difficult decision about how to respond to GenAI’s thirst for data. They can block web scrapers using the (non-legally-binding) robots.txt metadata file, which indicates whether a website can be used to train an AI model or not. They can also sue AI companies that they believe have infringed their copyright. Alternatively, they can let bots index their content regardless (er, YOLO?), or they can license content to AI devs to get some compensation for their intellectual property.
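As an illustration of the robots.txt option, the sketch below uses Python’s standard `urllib.robotparser` module to show how a well-behaved crawler would read such a file. The file contents and the `example.com` URL are made up for this example; `GPTBot` is the user-agent token OpenAI publishes for its crawler, and `ExampleSearchBot` is a hypothetical name. Nothing forces a bot to run this check, which is the sense in which robots.txt is not binding.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block OpenAI's crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse in memory, no network request

url = "https://example.com/articles/some-story"
gptbot_allowed = parser.can_fetch("GPTBot", url)
other_allowed = parser.can_fetch("ExampleSearchBot", url)  # hypothetical bot name

print("GPTBot allowed:", gptbot_allowed)
print("ExampleSearchBot allowed:", other_allowed)
```

With this file, the first check should report that `GPTBot` is disallowed while the generic rule lets the other crawler through.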
But there are thousands of AI companies (or tech companies using AI) that don’t have the scale and reach of OpenAI. At the same time, the best thing about the web is that there is a long tail of content publishers. That means a small content publisher usually doesn’t have the financial resources to file a lawsuit. It also means that it will be difficult for millions of websites to switch from a scraping model to a licensing model.
That’s why Linkup is not just a technical solution. It’s a marketplace: an intermediary between content publishers and companies that want to enhance their LLM responses with web content.
Linkup signs content licensing deals with publishers and integrates with their CMS so it can get content from publishers without scraping. Linkup then pays content contributors based on how often their content is accessed by Linkup clients.
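The article does not say how Linkup calculates these payments. Purely as an illustration of usage-based compensation, the sketch below splits a revenue pool in proportion to access counts. The pro-rata rule, the `split_payouts` function, and the publisher names and figures are all assumptions made for this example.

```python
def split_payouts(pool_cents: int, access_counts: dict[str, int]) -> dict[str, int]:
    """Split a revenue pool pro rata by access count, in whole cents (assumed rule)."""
    total = sum(access_counts.values())
    if total == 0:
        return {publisher: 0 for publisher in access_counts}
    payouts = {p: pool_cents * n // total for p, n in access_counts.items()}
    # Integer division can leave a few cents over; give them to the most-accessed publisher.
    remainder = pool_cents - sum(payouts.values())
    top = max(access_counts, key=access_counts.get)
    payouts[top] += remainder
    return payouts


# Hypothetical monthly access counts per publisher.
counts = {"news-site": 600, "data-provider": 300, "small-blog": 100}
payouts = split_payouts(100_000, counts)  # a pool of 1,000.00 in cents
print(payouts)
```

Working in whole cents avoids floating-point rounding, and the remainder step ensures the payouts always add up to the full pool.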
“We’re really targeting applications that are implementing AI into their products,” Mizrahi said. “So, a typical use case is that I build an AI application using Mistral or OpenAI’s model. I build my own pipeline, but I need to enrich that pipeline with external information.”
As a side note, while ChatGPT can browse the web, GPT models cannot. OpenAI provides both a widely popular application (ChatGPT) and LLMs that developers can use with an API (GPT). But web search is a ChatGPT feature.
“There’s an example that I like, one of our customers … created an internal application for their sales people,” Mizrahi also told us. “On the one hand, they list all the benefits of their products. And thanks to us, they get fresh, quality information about their prospects and put it into Mistral’s LLM. And Mistral’s LLM is going to create a kind of sales pitch for the sales reps, which they’ll have in front of them when they call customer leads.”
At first, Linkup decided to focus on corporate and business information. In addition to news websites, the startup works with knowledge databases — think Statista, Xerfi or other resources in the same vein.
It’s not the only startup working on bringing premium content to LLMs with behind-the-scenes licensing agreements. The most visible competitor is ScalePost, a startup that works with Perplexity to expedite its licensing deals with publishers.
Linkup raised a €3 million seed round ($3.2 million at current exchange rates) a few months ago from Axeleo Capital, Motier Ventures, Seedcamp, and a hundred business angels. The startup currently employs about 10 people, and it plans to hire 10 more staff over the next year.
Credit : techcrunch.com