Model Merging
Model merging is an efficient alternative to fine-tuning that leverages the work of the open-source community. It involves combining the weights of different fine-tuned models to create a new model with enhanced capabilities. This technique has proven highly effective, as demonstrated by the strong showing of merged models at the top of public benchmarks such as the Open LLM Leaderboard.
Merging Techniques
- SLERP (Spherical Linear Interpolation): Interpolates the weights of two models along the great circle between them rather than linearly. Different interpolation factors can be applied to different layers, allowing fine-grained control (see the SLERP sketch after this list).
- DARE (Drop And REscale): Randomly prunes (drops) the redundant delta parameters of each fine-tuned model and rescales the remaining ones, which makes it possible to merge several models into one base model at once (a DARE sketch also follows this list).
- Passthrough: Concatenates layers from different LLMs, or even layers from the same model (self-merging), producing so-called frankenmerges.
- Mixture of Experts (MoE): Combines the feed-forward network layers of several fine-tuned models as experts, with a router that selects which expert handles each token at each layer. The router can even be initialized without any fine-tuning, using embeddings computed from positive prompts for each expert.
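A minimal SLERP sketch in PyTorch, assuming two checkpoints with identical tensor names and shapes. The flatten-and-interpolate approach and the toy state dicts below are illustrative, not any merging library's actual implementation:

```python
import torch

def slerp(w_a: torch.Tensor, w_b: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate two weight tensors with factor t in [0, 1]."""
    a, b = w_a.flatten().float(), w_b.flatten().float()
    # Angle between the two weight vectors (computed on normalized copies).
    cos_omega = torch.clamp((a / a.norm()) @ (b / b.norm()), -1.0, 1.0)
    omega = torch.arccos(cos_omega)
    if omega.abs() < eps:  # Nearly parallel: fall back to linear interpolation.
        return (1 - t) * w_a + t * w_b
    so = torch.sin(omega)
    out = (torch.sin((1 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b
    return out.reshape(w_a.shape).to(w_a.dtype)

# Toy "state dicts" standing in for two fine-tuned checkpoints; a different
# factor t could be chosen per layer name for fine-grained control.
sd_a = {"mlp.weight": torch.randn(4, 4)}
sd_b = {"mlp.weight": torch.randn(4, 4)}
merged = {name: slerp(sd_a[name], sd_b[name], t=0.5) for name in sd_a}
```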
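And a minimal sketch of the drop-and-rescale step at the heart of DARE, assuming all fine-tunes share the same base model; the function name and toy tensors are illustrative:

```python
import torch

def dare_delta(w_base: torch.Tensor, w_ft: torch.Tensor, p: float = 0.9) -> torch.Tensor:
    """Return the pruned-and-rescaled delta (task vector) of one fine-tuned model."""
    delta = w_ft - w_base
    keep = (torch.rand_like(delta) >= p).float()  # Drop each element with probability p.
    return keep * delta / (1.0 - p)               # Rescale survivors to preserve expectation.

# Merge several fine-tunes into one base model at once.
base = torch.randn(4, 4)
fine_tunes = [base + 0.1 * torch.randn(4, 4) for _ in range(3)]
merged = base + sum(dare_delta(base, ft) for ft in fine_tunes)
```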
Advantages of Model Merging:
- No GPU required: merging only manipulates weights, so it runs cheaply on CPU.
- Ability to leverage existing fine-tuned models from the open-source community.
- Proven effectiveness in producing high-quality models.
1-Bit LLMs
BitNet b1.58 is a Transformer variant in which every weight is constrained to one of the three values {-1, 0, 1} instead of a floating-point number, i.e. about 1.58 bits per weight, since log2(3) ≈ 1.58.
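A minimal sketch of the absmean ternary quantization described in the BitNet b1.58 paper: weights are scaled by their mean absolute value, then rounded and clipped to {-1, 0, 1}. The function name is illustrative:

```python
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight matrix to {-1, 0, 1} with a per-tensor scale."""
    scale = w.abs().mean() + eps                   # Absmean scaling factor.
    w_q = torch.clamp(torch.round(w / scale), -1, 1)
    return w_q, scale                              # w is approximated by scale * w_q.

w = torch.randn(4, 4)
w_q, scale = ternary_quantize(w)
print(w_q)          # Entries are only -1, 0, or 1.
print(scale * w_q)  # Dequantized approximation of w.
```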
Resources
- Pretrained Transformers for Text Ranking: BERT and Beyond
- Principal Component Analysis
- Deep Learning AI Short Courses
- ChromaDB Tutorial on DataCamp
- Visualize Vector Embeddings in a RAG System
- Natural Language Processing Specialization on Coursera
- Distributed Representations of Sentences and Documents
- A Gentle Introduction to Doc2Vec
- Word2Vec Archive
- Mastering LLM Techniques: Inference and Optimization
- Llama 3 Documentation
- Gensim Documentation and Examples
- TensorBoard Documentation
- Evaluation of RAG Systems
- Sentence Transformers on Hugging Face
- Local RAG with Ollama and Weaviate
- Video Lectures from ESWC 2016 on Machine Learning
- Chip Huyen's post on LLM engineering: https://huyenchip.com/2023/04/11/llm-engineering.html
- Sebastian Raschka's LLMs-from-scratch repository: https://github.com/rasbt/LLMs-from-scratch (clear, hype-free explanations of how things work, with no vendor content)
- AI by Hand
- Neural Networks From Scratch
Books