Lay of the Land for LLM Application Building

So many chatbots; everyone wants LLMs in their dev stack. Here are some thoughts and strategies learned (the hard way!) from building LLM applications over the past year.

Optimizing Data Pre-processing

Integrating smart web crawling and webpage parsing into LLM applications offers significant advantages for RAG context and agent-based AI systems. Naive web retrieval is easy; building something useful is all about the details. By optimizing how data is crawled, cached and ingested, developers can improve the efficiency of context-aware applications. This means parallel processing strategies for web scrapes, running multiple fetches concurrently to reduce overall processing time, as well as effective HTML parsing: HTML-to-text or, better, HTML-to-Markdown. Such optimizations extend the effective use of web data for vector database population and for live/direct RAG context injection. Speed and quality of content are crucial for the responsiveness and accuracy of LLM applications.
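
To make that concrete, here is a minimal sketch of the pattern: concurrent fetching plus an HTML-to-Markdown pass. It assumes the httpx, beautifulsoup4 and markdownify packages; swap in whatever crawler and parser you already use.

```python
# Minimal sketch: fetch pages concurrently and convert HTML to Markdown.
# Assumes httpx, beautifulsoup4 and markdownify are installed.
import asyncio

import httpx
from bs4 import BeautifulSoup
from markdownify import markdownify as to_markdown


def html_to_markdown(html: str) -> str:
    """Drop non-content tags, then convert the rest to Markdown."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "nav", "footer"]):
        tag.decompose()
    return to_markdown(str(soup), heading_style="ATX")


async def fetch_page(client: httpx.AsyncClient, url: str) -> str:
    response = await client.get(url, follow_redirects=True, timeout=10.0)
    response.raise_for_status()
    return html_to_markdown(response.text)


async def crawl(urls: list[str]) -> list[str]:
    """Run all fetches concurrently instead of one at a time."""
    async with httpx.AsyncClient() as client:
        return await asyncio.gather(*(fetch_page(client, u) for u in urls))


if __name__ == "__main__":
    pages = asyncio.run(crawl(["https://example.com"]))
    print(pages[0][:500])
```

Caching responses and de-duplicating URLs sit naturally on top of this before anything reaches the vector store.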

Enhancing Custom LLM Applications with Domain-Specific Strategies

Working on large-scale LLM projects, developers are often required to tailor LLM applications to specific domains. This necessitates custom data ingestion strategies that accommodate unique content types and languages. You need to consider multi-aspect embedding models and infrastructure that is effective in production, ideally at the edge. Also, someone should develop declarative tooling for agent workflows that streamlines the creation and management of custom agents, reducing the technical load and expediting development cycles. Tools like LangGraph could be integrated to facilitate these processes, mirroring advancements like what Ludwig has done for model fine-tuning.
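
As a sketch of what "declarative" could look like today, here is a minimal LangGraph-style graph with placeholder retrieve/generate nodes; the node bodies are stand-ins, not a real pipeline.

```python
# Minimal sketch of a two-step agent workflow using LangGraph's StateGraph.
# The retrieve/generate functions are placeholders for real retrieval and
# LLM calls.
from typing import TypedDict

from langgraph.graph import StateGraph, END


class AgentState(TypedDict):
    question: str
    context: str
    answer: str


def retrieve(state: AgentState) -> dict:
    # Placeholder retrieval step; plug in your vector search here.
    return {"context": f"docs about {state['question']}"}


def generate(state: AgentState) -> dict:
    # Placeholder generation step; call your LLM here.
    return {"answer": f"Answer based on: {state['context']}"}


workflow = StateGraph(AgentState)
workflow.add_node("retrieve", retrieve)
workflow.add_node("generate", generate)
workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "generate")
workflow.add_edge("generate", END)

app = workflow.compile()
print(app.invoke({"question": "What is pgvector?"}))
```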

The Importance of Advanced UI/UX and Data Storage Solutions

For LLM applications, the user interface (UI) and user experience (UX) play pivotal roles in adoption and effectiveness; simple chatbots alone won't cut it. Vercel, despite the annoying and highly opinionated ai/rsc in their AI SDK v3, is pushing the limits on AI DevX. Furthermore, the choice of (vector) database can significantly impact the functionality of LLM applications. There are plenty of flat vector DB solutions out there (LangChain has 94 integrations!), but Supabase is emerging as a frontrunner for LLM developer startups by offering managed PostgreSQL with extensions like pgvector, and now LLMs (embeddings) in serverless edge functions, enhancing not only the application's performance, customization and scalability, but also the UX.
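
For illustration, a minimal sketch of a pgvector similarity query against a managed Postgres instance such as Supabase; the table, columns and connection string are placeholders for your own schema.

```python
# Minimal sketch: cosine-distance search over a pgvector column in Postgres.
# DSN, table and column names are illustrative assumptions.
import psycopg  # psycopg 3

DSN = "postgresql://user:password@db.example.supabase.co:5432/postgres"


def match_documents(query_embedding: list[float], k: int = 5) -> list[tuple]:
    # pgvector's <=> operator is cosine distance; lower means more similar.
    sql = """
        SELECT id, content, embedding <=> %s::vector AS distance
        FROM documents
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    with psycopg.connect(DSN) as conn, conn.cursor() as cur:
        cur.execute(sql, (vec, vec, k))
        return cur.fetchall()
```

Because it is plain SQL, hybrid search (full-text plus vector) is just another clause away, which is exactly the kind of fine-tuning closed-box vector stores make harder.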

Integrating Advanced Embedding Models and Vector Search

Integrating multi-aspect embedding models, or even 8-bit/binary embeddings to scale to hundreds of millions of vectors without a crazy monthly storage bill, can enhance the capabilities of LLM vector search, especially in mixed-data scenarios, enabling more nuanced interactions and responses from the LLM.
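
A minimal sketch of the binary-quantization idea: pack the vectors into bits, shortlist by Hamming distance, then rescore the shortlist with the original floats. Pure NumPy here; in practice this lives in your database or index.

```python
# Minimal sketch of binary quantization for embeddings: bit-packed vectors are
# ~32x smaller than float32 and cheap to compare with Hamming distance.
import numpy as np


def to_binary(embeddings: np.ndarray) -> np.ndarray:
    """Quantize float embeddings to packed bits."""
    return np.packbits((embeddings > 0).astype(np.uint8), axis=-1)


def hamming_top_k(query_bits: np.ndarray, corpus_bits: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest corpus vectors by Hamming distance."""
    xor = np.bitwise_xor(corpus_bits, query_bits)
    distances = np.unpackbits(xor, axis=-1).sum(axis=-1)
    return np.argsort(distances)[:k]


# Toy usage: 10k corpus vectors, one query, shortlist of 50 rescored with floats.
rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 1024)).astype(np.float32)
query = rng.standard_normal(1024).astype(np.float32)

corpus_bits, query_bits = to_binary(corpus), to_binary(query[None, :])
shortlist = hamming_top_k(query_bits, corpus_bits, k=50)
reranked = shortlist[np.argsort(corpus[shortlist] @ query)[::-1]]
print(reranked[:5])  # best candidates after float rescoring
```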

Fit-for-Purpose (L)LM Selection

Use the right model for the right purpose; stop using GPT-3.5 (or worse, GPT-4) for everything. There are so many models out there that are more efficient and more effective. Consider fine-tuning when needed, especially for SLMs in agentic flows. Consider LLM applications at scale (that's when you realize you can't afford to run GPT-4 on everything, in both inference time and cost). And consider the UX impact of inference time for any model or serverless inference provider; things like throughput and TTFT (time to first token) matter!
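
When comparing models or providers, measure rather than guess. A minimal sketch for timing TTFT against any OpenAI-compatible streaming endpoint; the model name and base URL are placeholders for whatever you are evaluating.

```python
# Minimal sketch: measure time-to-first-token (TTFT) for a streaming model.
import time

from openai import OpenAI

client = OpenAI()  # or OpenAI(base_url="https://your-provider/v1", api_key="...")

start = time.perf_counter()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="gpt-3.5-turbo",  # swap in the model you are benchmarking
    messages=[{"role": "user", "content": "Summarize RAG in two sentences."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        chunks += 1

total = time.perf_counter() - start
print(f"TTFT: {first_token_at - start:.2f}s, chunks: {chunks}, total: {total:.2f}s")
```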

Multilingual and Machine Translation

A short note on this, having worked on an LLM translation tool for over a year based on the latest research in machine translation. The bottom line is that multilingual is still far from where it should be, and English LLMs still dominate. You can't fine-tune a 7B English-centric model with an English-language tokenizer, like Llama 2, on Arabic or Amharic datasets (assuming you have them) and expect it to work well. Aya by CohereForAI is a great starting point, but still young.
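
One quick way to see the tokenizer problem is to count tokens: an English-centric tokenizer fragments Arabic or Amharic text into far more pieces than a multilingual one, which wastes context and hurts training. A minimal sketch; the model IDs are examples (Llama 2 is gated on the Hub, so substitute any open English-centric tokenizer if needed).

```python
# Minimal sketch: compare token counts for the same non-English sentence across
# an English-centric tokenizer and a multilingual one.
from transformers import AutoTokenizer

text = "سيتم عقد الاجتماع غدا في الساعة العاشرة صباحا"  # Arabic sample sentence

for model_id in ["meta-llama/Llama-2-7b-hf", "CohereForAI/aya-101"]:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    print(f"{model_id}: {len(tokenizer.encode(text))} tokens")
```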

Document Processing and Chunking

Document ingestion, particularly parsing PDFs, remains a challenge due to the cost and complexity involved; LlamaParse does it, but at a cost. Implementing a highly opinionated option for noise reduction during webpage (HTML) data ingestion can also be beneficial. Chunk in Markdown rather than plain text where possible when the chunks feed RAG contexts or unstructured data stores. Chunking techniques keep evolving, with semantic chunking being the latest flavour; see the sketch below. However, chunking and ingested-data structuring depend entirely on the use case. As mentioned with the limitations of flat, standard vector databases, we sometimes want to capture more relationships within the data, so PostgreSQL offerings like Supabase or Neon might be a better fit than closed-box Elastic, Pinecone or Qdrant, especially as you will need to fine-tune your DB retrieval functions for hybrid or other search.
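
A rough sketch of semantic chunking: embed the sentences and start a new chunk wherever similarity to the previous sentence drops below a threshold. The embedding model and threshold here are illustrative.

```python
# Minimal sketch of semantic chunking with sentence embeddings.
# Assumes the sentence-transformers package; swap in your own embedding model.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model


def semantic_chunks(sentences: list[str], threshold: float = 0.75) -> list[str]:
    """Group consecutive sentences; break the chunk on a similarity drop."""
    vectors = model.encode(sentences, normalize_embeddings=True)
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        similarity = float(vectors[i] @ vectors[i - 1])
        if similarity < threshold:  # likely topic shift -> close the chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```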

Utilizing LLMs for Knowledge Graph Generation and Agentic Flows

Finally, the generation of knowledge graphs from unstructured web (and other) data, as well as agentic data flows, represent two frontiers for LLM applications. By parsing data into structured graphs and formats, developers can create rich, interconnected datasets that enhance the analytical capabilities of LLMs. Although this is domain-specific and can be complex, the potential for generic approaches to knowledge graph generation is vast and could significantly impact how unstructured data is leveraged in combination with classic text vector stores. Agentic AI is all the buzz too, and LangGraph is ahead with its concepts; we need to simplify it and make it more declarative.
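
A minimal sketch of the first step toward a knowledge graph: prompt an LLM to emit (subject, relation, object) triples as JSON from unstructured text. The prompt and model are illustrative, and production code should validate or repair the returned JSON before loading it into a graph.

```python
# Minimal sketch: LLM-based triple extraction for knowledge graph building.
import json

from openai import OpenAI

client = OpenAI()

PROMPT = """Extract knowledge-graph triples from the text below.
Return only a JSON array of objects with keys "subject", "relation", "object".

Text:
{text}"""


def extract_triples(text: str) -> list[dict]:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # any JSON-capable model works
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
        temperature=0,
    )
    # In production, validate the schema and retry/repair on parse errors.
    return json.loads(response.choices[0].message.content)


triples = extract_triples("Supabase offers managed PostgreSQL with the pgvector extension.")
print(triples)
```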

Conclusion

The development of LLM applications is a dynamic field that requires continuous innovation and adaptation. By leveraging tools for optimized data organization, ingestion and retrieval, employing advanced embedding models for enhanced data interaction, and innovating in UI/UX and database management for AI use, developers can create more robust and efficient applications. As the technology matures, the possibilities will expand, promising significant impacts across every domain.