Michil Egorov
Company: X5 Tech
In this talk, we will thoroughly explore, from the ground up, the infrastructure needed to leverage large language models (LLMs), drawing on the experience of X5 Tech.
I will begin by explaining what LLMs are and why they are becoming increasingly important in modern technology. We will discuss the key components required to build a scalable and reliable LLM infrastructure, and compare three popular backends for LLM inference: llama-cpp, TGI, and vLLM, highlighting the advantages and disadvantages of each. Special attention will be given to the pitfalls of llama-cpp, and we will assess whether vLLM is indeed the ideal solution. I will also touch on information retrieval and its connection to LLMs, explaining how these models can improve search over knowledge bases.
In conclusion, we will discuss how to get LLMs to generate high-quality text, based on our experience building a chatbot for X5 Group employees.