Model Merging
Model merging is an efficient alternative to fine-tuning that leverages the work of the open-source community. It involves combining the weights of different fine-tuned models to create a new model with enhanced capabilities. This technique has proven highly effective, as demonstrated by the strong showing of merged models at the top of public benchmarks such as the Open LLM Leaderboard.
Merging Techniques
- SLERP (Spherical Linear Interpolation): Interpolates the weights of two models along the great circle between them rather than linearly. Different interpolation factors can be applied to different layers, allowing fine-grained control (see the SLERP sketch after this list).
- DARE (Drop And REscale): Randomly prunes (drops) the redundant delta parameters of each fine-tuned model and rescales the remaining ones, which makes it possible to merge several models into one base model at once (a DARE sketch also follows this list).
- Passthrough: Concatenates layers from different LLMs, or even layers from the same model (self-merging), producing so-called frankenmerges.
- Mixture of Experts (MoE): Combines the feed-forward network layers of several fine-tuned models as experts, with a router that selects which expert handles each token at each layer. The router can even be initialized without any fine-tuning, using embeddings computed from positive prompts for each expert.
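A minimal SLERP sketch in PyTorch, assuming two checkpoints with identical tensor names and shapes. The flatten-and-interpolate approach and the toy state dicts below are illustrative, not any merging library's actual implementation:

```python
import torch

def slerp(w_a: torch.Tensor, w_b: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate two weight tensors with factor t in [0, 1]."""
    a, b = w_a.flatten().float(), w_b.flatten().float()
    # Angle between the two weight vectors (computed on normalized copies).
    cos_omega = torch.clamp((a / a.norm()) @ (b / b.norm()), -1.0, 1.0)
    omega = torch.arccos(cos_omega)
    if omega.abs() < eps:  # Nearly parallel: fall back to linear interpolation.
        return (1 - t) * w_a + t * w_b
    so = torch.sin(omega)
    out = (torch.sin((1 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b
    return out.reshape(w_a.shape).to(w_a.dtype)

# Toy "state dicts" standing in for two fine-tuned checkpoints; a different
# factor t could be chosen per layer name for fine-grained control.
sd_a = {"mlp.weight": torch.randn(4, 4)}
sd_b = {"mlp.weight": torch.randn(4, 4)}
merged = {name: slerp(sd_a[name], sd_b[name], t=0.5) for name in sd_a}
```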
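And a minimal sketch of the drop-and-rescale step at the heart of DARE, assuming all fine-tunes share the same base model; the function name and toy tensors are illustrative:

```python
import torch

def dare_delta(w_base: torch.Tensor, w_ft: torch.Tensor, p: float = 0.9) -> torch.Tensor:
    """Return the pruned-and-rescaled delta (task vector) of one fine-tuned model."""
    delta = w_ft - w_base
    keep = (torch.rand_like(delta) >= p).float()  # Drop each element with probability p.
    return keep * delta / (1.0 - p)               # Rescale survivors to preserve expectation.

# Merge several fine-tunes into one base model at once.
base = torch.randn(4, 4)
fine_tunes = [base + 0.1 * torch.randn(4, 4) for _ in range(3)]
merged = base + sum(dare_delta(base, ft) for ft in fine_tunes)
```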
Advantages of Model Merging:
- No GPU required: merging only manipulates weights, so it runs cheaply on CPU.
- Ability to leverage existing fine-tuned models from the open-source community.
- Proven effectiveness in producing high-quality models.
1-Bit LLMs
BitNet b1.58 is a Transformer variant in which every weight is constrained to one of the three values {-1, 0, 1} instead of a floating-point number, i.e. about 1.58 bits per weight, since log2(3) ≈ 1.58.
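A minimal sketch of the absmean ternary quantization described in the BitNet b1.58 paper: weights are scaled by their mean absolute value, then rounded and clipped to {-1, 0, 1}. The function name is illustrative:

```python
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight matrix to {-1, 0, 1} with a per-tensor scale."""
    scale = w.abs().mean() + eps                   # Absmean scaling factor.
    w_q = torch.clamp(torch.round(w / scale), -1, 1)
    return w_q, scale                              # w is approximated by scale * w_q.

w = torch.randn(4, 4)
w_q, scale = ternary_quantize(w)
print(w_q)          # Entries are only -1, 0, or 1.
print(scale * w_q)  # Dequantized approximation of w.
```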
Resources
- Pretrained Transformers for Text Ranking: BERT and Beyond
- Principal Component Analysis
- Deep Learning AI Short Courses
- ChromaDB Tutorial on DataCamp
- Visualize Vector Embeddings in a RAG System
- Natural Language Processing Specialization on Coursera
- Distributed Representations of Sentences and Documents
- A Gentle Introduction to Doc2Vec
- Word2Vec Archive
- Mastering LLM Techniques: Inference and Optimization
- Llama 3 Documentation
- Gensim Documentation and Examples
- TensorBoard Documentation
- Evaluation of RAG Systems
- Sentence Transformers on Hugging Face
- Local RAG with Ollama and Weaviate
- Video Lectures from ESWC 2016 on Machine Learning
- Chip Huyen's post on LLM engineering: https://huyenchip.com/2023/04/11/llm-engineering.html
- Sebastian Raschka's LLMs-from-scratch repository: https://github.com/rasbt/LLMs-from-scratch (clear, hype-free explanations of how things work, with no vendor content)
- AI by Hand
- Neural Networks From Scratch
Books