
Illarion Iov
Tochka
Large language models are difficult to control with standard fine-tuning alone, which does not let you specify the desired behavior for production tasks. RL-based approaches such as PPO require extensive resources and are hard to manage. So how can alignment be made cheaper and more efficient?
Our talk will explore alternative preference optimization methods that enable high-quality alignment without excessive cost. We will discuss how we adapted these methods to our own LLM and the strategies we used to train it without relying on large amounts of labeled data. We will also cover how best to leverage synthetic data, keep proxy rewards under control, and avoid common training pitfalls.
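For context, the abstract does not name specific methods, but Direct Preference Optimization (DPO) is a representative example of the PPO alternatives it alludes to: it optimizes directly on preference pairs with no reward model or RL loop. The sketch below is a minimal, illustrative DPO loss in PyTorch; the function and variable names are assumptions for illustration, not the speaker's implementation.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Each argument: summed log-probabilities of the chosen/rejected
        # responses under the trained policy or the frozen reference model.
        # beta controls how far the policy may drift from the reference.
        chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
        # Maximize the margin between chosen and rejected responses.
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Toy usage with random log-probabilities for a batch of 4 pairs.
    loss = dpo_loss(torch.randn(4), torch.randn(4),
                    torch.randn(4), torch.randn(4))
    print(loss.item())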
You will learn how to choose the right alignment method for your needs, reduce computational costs, and make LLM training more accessible. The talk will be most useful to ML engineers working with language models and to anyone interested in modern alignment methods.