Efficient AI: The case for granular fine-tuning, specialized models, and agentic frameworks
In building AI applications, we have primarily focused on using increasingly large foundation models capable of handling a wide range of tasks. However, this approach presents significant challenges in computational cost, energy consumption, and task-specific efficiency. This article argues for a strategic shift towards more granular, low-code solutions for building fine-tuning pipelines, particularly for embedding models and context-aware language models of all sizes, as well as the adoption of agentic AI frameworks built from smaller, specialized actors.
The need for accessible fine-tuning tools
Current AI development still relies on complex, code-heavy processes that limit accessibility to those with specialized knowledge. More and more tools are appearing (see the appendix below), but they are not enough. To democratize AI development and improve efficiency, we need even more low-code or AI-generated-code solutions for building fine-tuning pipelines. These tools should cater to two primary areas:
- Embedding Models: Fine-tuning embedding models can significantly enhance the performance of information retrieval systems. By allowing domain experts to easily adapt these models to specific contexts and languages, we can create more accurate and relevant AI assistants.
- Context-Aware (S/L)LMs: Developing LMs that can adapt to different roles and contexts (e.g., from high-level planning to very specific low-level tasks) is crucial for creating versatile AI systems. Low-code tools for fine-tuning these models would enable the creation of more specialized, efficient AI agents.
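To make the embedding side concrete, here is a minimal pure-Python sketch of the kind of objective such a fine-tuning pipeline would optimize: a triplet margin loss that pushes a query's embedding closer to a relevant passage than to an irrelevant one. The function names and toy vectors are illustrative assumptions, not any specific library's API.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    # Loss falls to zero once the positive outscores the negative by `margin`.
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))

# Toy embeddings: the "positive" points in roughly the same direction as the anchor.
anchor   = [1.0, 0.0, 0.5]
positive = [0.9, 0.1, 0.4]
negative = [0.0, 1.0, -0.2]

loss = triplet_margin_loss(anchor, positive, negative)
print(loss)  # → 0.0 (this pair is already separated by more than the margin)
```

In a real pipeline, the vectors would come from the embedding model being tuned, and gradients of this loss would update its weights; a low-code tool would hide everything here except the choice of positive and negative examples.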
Dataset quality and curation
The effectiveness of fine-tuning depends heavily on the quality of the training data. This is often the biggest barrier for newcomers to fine-tuning, and the reason fine-tuned models sometimes disappoint. And there aren't enough data scientists to go around (nor can every organization afford one). To address this, we propose the development of LLM-based agentic frameworks for dataset cleaning that can:
- Identify and correct issues in training data
- Distinguish between human-generated and synthetic contributions
- Facilitate the review and refinement of datasets for fine-tuning
These frameworks should be designed with user-friendly interfaces, allowing non-technical domain experts to participate in the data curation process.
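The steps above can be sketched as a chain of small cleaning "agents", each inspecting a record and flagging issues for review. Here each agent is a plain function with simple heuristics (the agent names and the `source` field are illustrative assumptions); in a real framework each would be backed by a small fine-tuned model, with a UI layer on top for domain experts.

```python
def check_empty_fields(record):
    # Flag any string field that is blank.
    return [f"empty field: {k}" for k, v in record.items()
            if isinstance(v, str) and not v.strip()]

def check_length(record, max_chars=2000):
    # Flag records whose text blows the context/length budget.
    return ["text exceeds length budget"] if len(record.get("text", "")) > max_chars else []

def check_provenance(record):
    # Flag records whose origin (human vs. synthetic) is unlabeled.
    return ["missing human/synthetic label"] if record.get("source") not in {"human", "synthetic"} else []

AGENTS = [check_empty_fields, check_length, check_provenance]

def curate(dataset):
    # Split the dataset into clean records and records needing human review.
    clean, flagged = [], []
    for record in dataset:
        issues = [issue for agent in AGENTS for issue in agent(record)]
        (flagged if issues else clean).append((record, issues))
    return clean, flagged

dataset = [
    {"text": "How do I reset my password?", "source": "human"},
    {"text": "", "source": "synthetic"},
    {"text": "Unlabeled example"},
]
clean, flagged = curate(dataset)
print(len(clean), len(flagged))  # → 1 2
```

The design choice worth noting: because agents share one trivial interface (record in, list of issues out), a non-technical curator only ever sees the flagged records and the human-readable issue strings, never the pipeline itself.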
Moving beyond large foundation models
While large foundation models have demonstrated impressive capabilities, they are often inefficient and costly for specific tasks, and their energy footprint is hard to justify. We argue for a transition towards smaller, specialized models that are:
- More cost-effective to run at scale
- Environmentally friendly due to reduced computational requirements
- Tailored to specific applications, improving task-specific performance
This approach involves:
- Utilizing fine-tuned embedding models to enhance retrieval performance
- Implementing context-aware techniques to improve the efficiency of smaller models
- Leveraging high-quality, curated datasets to maximize the performance of specialized models
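The retrieval piece of this approach fits in a few lines: embed the documents, then rank them against a query by cosine similarity. The `embed` function below is a deliberate stand-in assumption (a bag-of-words counter) for the fine-tuned embedding model the first bullet refers to; swapping it out is the only change a real system would need.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for a fine-tuned embedding model: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=1):
    # Rank documents by similarity to the query and return the top k.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "Reset your password from the account settings page.",
    "Our office is closed on public holidays.",
    "Invoices are sent at the end of each month.",
]
print(retrieve("how do I reset my password", docs))
```

Even with this crude embedding, the password-reset document ranks first; a fine-tuned embedding model earns its keep on the queries where surface word overlap fails.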
Embracing ‘agentic’ AI frameworks
A crucial aspect of this transition is the move towards more agentic AI systems. Instead of relying on monolithic, large language models that process extensive text inputs, we propose the development of AI application frameworks composed of smaller, specialized actors. This approach offers several advantages:
- Modularity: Smaller, task-specific AI agents can be combined and reconfigured to address a wide range of applications, providing greater flexibility than single large models.
- Efficiency: By using specialized actors for specific tasks, we can reduce computational overhead and improve response times.
- Scalability: Agentic frameworks allow for easier scaling of AI systems, as individual components can be updated or replaced without disrupting the entire system.
- Improved context handling: Smaller actors can be designed to excel at understanding and operating within specific contexts, leading to more accurate and relevant outputs.
- Reduced data requirements: Specialized agents often require less training data to achieve high performance in their specific domains, making them more data-efficient than large, general-purpose models.
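The modularity and scalability advantages above can be sketched as a router that dispatches each task to a small, specialized actor. The actors here are plain functions with canned behavior (illustrative assumptions only); in practice each would wrap a small fine-tuned model, and new actors could be registered without touching any caller.

```python
def summarizer(text):
    # Specialized actor: crude extractive "summary" (first sentence only).
    return text.split(".")[0] + "."

def classifier(text):
    # Specialized actor: trivial keyword-based label as a placeholder.
    return "negative" if "refund" in text.lower() else "neutral"

# The registry is the whole framework: swap, add, or upgrade actors here
# without disrupting the rest of the system.
ACTORS = {"summarize": summarizer, "classify": classifier}

def dispatch(task, payload):
    actor = ACTORS.get(task)
    if actor is None:
        raise ValueError(f"no actor registered for task: {task}")
    return actor(payload)

print(dispatch("classify", "I want a refund for this order"))  # → negative
```

Because each actor handles one narrow context, it can be fine-tuned on a small, curated dataset, which is exactly the data-efficiency argument made above.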
Conclusion
By shifting focus towards granular fine-tuning, dataset quality, specialized models, and agentic AI frameworks, we can address the real limitations of large foundation models in production applications and reduce their environmental impact. Future work should prioritize low-code tools, agentic AI frameworks, and methodologies that enable a wider range of professionals to contribute to and benefit from these advances in AI.
Appendix: List of known fine-tuning services
(In no particular order)
- Hugging Face - Provides tools and resources for fine-tuning various LLMs
- Predibase - Offers LoRA Exchange (LoRAX) for efficient deployment of fine-tuned models
- Unsloth - Provides easy fine-tuning capabilities, especially for smaller models
- axolotl - An open-source tool for easy fine-tuning of LLMs
- Monster API - Offers no-code LLM fine-tuning with a simple interface
- Azure OpenAI Service - Provides fine-tuning capabilities through Azure OpenAI Studio GUI or APIs
- Entry Point AI - A modern fine-tuning platform for proprietary and open-source LLMs with no code required
- H2O LLM Studio - A no-code graphical user interface for fine-tuning LLMs
- aiDAPTIV+ - Provides cost-effective fine-tuning solutions for small and medium businesses
- Replicate - Offers a serverless interface to run and potentially fine-tune various models
- DataCamp - Provides tutorials and resources for fine-tuning LLMs
- Rapid Innovation - Offers services for creating and fine-tuning language models tailored to specific tasks or industries
- Cohere - Offers Command R fine-tuning for enterprise use cases
- OpenAI - Offers fine-tuning services through their API