The Allen Institute for AI (Ai2) claims to have narrowed the gap between closed-source and open-source post-training with the release of its new model training family, Tülu 3, arguing that open-source models will thrive in the enterprise space.
Tülu 3 brings open source models up to par with OpenAI’s GPT models, Claude from Anthropic and Gemini from Google. It allows researchers, developers and enterprises to fine-tune open source models and bring them closer to the quality of closed source models without losing the model’s data and core skills.
Ai2 said it released Tülu 3 with all of its data, data mixes, recipes, code, infrastructure and evaluation frameworks. The company needed to create new datasets and training methods to improve Tülu’s performance, including “training directly on verifiable problems with reinforcement learning.”
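One way to read “verifiable problems” is as tasks where a program can check the model’s answer against a known result, so the training signal does not depend on human judgment. The Python sketch below illustrates that idea only. It is not Ai2’s implementation, and the function names `extract_final_answer` and `verifiable_reward` are hypothetical.

```python
import re


def extract_final_answer(completion):
    """Return the last number that appears in the completion, or None."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return matches[-1] if matches else None


def verifiable_reward(completion, reference):
    """Return 1.0 if the final number in the completion equals the reference, else 0.0.

    Assumption: the problem has a single numeric answer, so correctness
    can be checked automatically.
    """
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if float(answer) == float(reference) else 0.0
```

In a training loop, a score like this would stand in for a learned reward model on problems that have a checkable answer, such as arithmetic.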
“Our best models are the result of a complex training process that combines partial details from proprietary methods with advanced techniques and established academic research,” Ai2 said in a blog post. “Our success is rooted in careful data curation, rigorous experimentation, innovative methodologies and improved training infrastructure.”
Tülu 3 will be available in a range of sizes.
Open Source for Enterprises
Open source models often lag behind closed source models in enterprise adoption, although more companies anecdotally report choosing open source large language models (LLMs) for projects.
Ai2’s thesis is that improving fine-tuning with open-source models like Tülu 3 will increase the number of enterprises and researchers adopting open-source models, because they can be confident the models will perform as well as Claude or Gemini.
The company states that Tülu 3 and other Ai2 models are completely open source, noting that for major model trainers such as Anthropic and Meta, which claim to be open source, neither the training data nor the training recipes are transparent to users. The Open Source Initiative recently published the first version of its Open Source AI Definition, but some organizations and model providers do not fully follow the definition in their licenses.
Enterprises care about the transparency of models, but many choose open source models not for research or data openness but because they are the best fit for their use cases.
Tülu 3 offers enterprises more choice when looking for open source models to bring into their stack and fine-tune with their data.
Ai2’s other models, OLMoE and Molmo, are also open source, and the company said they have begun to outperform other well-known models such as GPT-4o and Claude.
Other features of Tülu 3
Tülu 3 lets companies mix and match their data while fine-tuning, Ai2 said.
“The recipes help you balance datasets, so if you want to build a model that can code, but also follow instructions and speak multiple languages, you just need to select the datasets and follow the steps in the recipe,” Ai2 said.
Mixing and matching datasets can make it easier for developers to move from a smaller model to a larger one while keeping its post-training settings. The company said that the infrastructure code it released with Tülu 3 allows enterprises to build this pipeline when moving across model sizes.
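To make the idea of balancing datasets concrete, here is a minimal Python sketch of proportional mixing. It is an illustration under stated assumptions, not the Tülu 3 recipe: the function name `mix_datasets`, the dataset names and the weights are all hypothetical, and examples are assumed to be plain strings.

```python
import random


def mix_datasets(datasets, weights, total, seed=0):
    """Draw about `total` examples across named datasets in proportion to `weights`.

    Returns a shuffled list of (dataset_name, example) pairs.
    """
    rng = random.Random(seed)
    names = list(datasets)
    weight_sum = sum(weights[name] for name in names)
    mixed = []
    for name in names:
        # Share of the final mix contributed by this dataset (rounded down).
        count = int(total * weights[name] / weight_sum)
        pool = datasets[name]
        if count <= len(pool):
            # Enough examples: sample without replacement.
            picked = rng.sample(pool, count)
        else:
            # Too few examples: fall back to sampling with replacement.
            picked = [rng.choice(pool) for _ in range(count)]
        mixed.extend((name, example) for example in picked)
    rng.shuffle(mixed)
    return mixed
```

Calling `mix_datasets` with weights such as `{"code": 2, "instructions": 1, "multilingual": 1}` would make coding examples roughly half of the mix. Because the weights live in one place, the same mix can be reused when moving to a different model size.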
The evaluation framework from Ai2 offers a way for developers to specify settings for what they want to see out of the model.
Credit : venturebeat.com