Google has released Gemma, an open source large language model based on the technology used to create Gemini. It is powerful yet lightweight, and is optimized for environments with limited resources, such as a laptop or cloud infrastructure.

Gemma can be used to create a chatbot, content generation tool and pretty much anything else that a language model can do. This is the tool that SEOs have been waiting for.

Gemma is released in two versions, one with two billion parameters (2B) and another with seven billion parameters (7B). The number of parameters indicates the model’s complexity and potential capability. Models with more parameters can achieve a better understanding of language and generate more sophisticated responses, but they also require more resources to train and run.
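To make the resource trade-off concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at common numeric precisions. It treats the nominal "2B" and "7B" labels as exact parameter counts, which is a simplifying assumption, and it ignores activations and other runtime overhead. The function name is hypothetical and is not part of any Gemma tooling.

```python
# Rough weights-only memory estimate.
# Assumption: "2B" and "7B" are treated as exact parameter counts.
# Byte widths are the standard sizes of each numeric type.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in [("2B", 2e9), ("7B", 7e9)]:
    for dtype in BYTES_PER_PARAM:
        gb = weight_memory_gb(params, dtype)
        print(f"Gemma {name} in {dtype}: ~{gb:.0f} GB")
```

Under these assumptions, the smaller model at 16-bit precision needs about 4 GB for its weights, which is why it is a plausible fit for a laptop, while the larger model needs several times that.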

The purpose of releasing Gemma is to democratize access to state-of-the-art artificial intelligence that is trained to be safe and responsible out of the box, with a toolkit to further optimize it for safety.

Gemma By DeepMind

The model is designed to be lightweight and efficient, which makes it well suited to reaching more end users.

Google’s official announcement noted the following key points:

  • “We’re releasing model weights in two sizes: Gemma 2B and Gemma 7B. Each size is released with pre-trained and instruction-tuned variants.
  • A new Responsible Generative AI Toolkit provides guidance and essential tools for creating safer AI applications with Gemma.
  • We’re providing toolchains for inference and supervised fine-tuning (SFT) across all major frameworks: JAX, PyTorch, and TensorFlow through native Keras 3.0.
  • Ready-to-use Colab and Kaggle notebooks, alongside integration with popular tools such as Hugging Face, MaxText, NVIDIA NeMo and TensorRT-LLM, make it easy to get started with Gemma.
  • Pre-trained and instruction-tuned Gemma models can run on your laptop, workstation, or Google Cloud with easy deployment on Vertex AI and Google Kubernetes Engine (GKE).
  • Optimization across multiple AI hardware platforms ensures industry-leading performance, including NVIDIA GPUs and Google Cloud TPUs.
  • Terms of use permit responsible commercial usage and distribution for all organizations, regardless of size.”

Analysis Of Gemma

According to an analysis by Awni Hannun, a machine learning research scientist at Apple, Gemma is optimized to be highly efficient in a way that makes it suitable for use in low-resource environments.

Hannun observed that Gemma has a vocabulary of 250,000 (250k) tokens versus 32k for comparable models. A larger vocabulary means Gemma can recognize and process a wider variety of words and symbols, allowing it to handle tasks with complex language. His analysis suggests that this extensive vocabulary enhances the model’s versatility across different types of content. He also believes that it may help with math, code and other modalities.

He also noted that the “embedding weights” are massive (roughly 750 million parameters). The embedding weights are the parameters that map words to representations of their meanings and relationships.

An important feature he called out is that the embedding weights, which encode detailed information about word meanings and relationships, are used not just in processing the input but also in generating the model’s output. This sharing improves the efficiency of the model by allowing it to better leverage its understanding of language when producing text.
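The idea of sharing one table between input and output can be illustrated with a toy sketch. This is not Gemma's code: the sizes are tiny and made up, the values are random, and the function names are hypothetical. It only shows the general technique, commonly called weight tying, where the same matrix is used to look up token vectors on the way in and to score tokens on the way out.

```python
import random

# Toy sizes, purely illustrative (assumption: nothing like the real model).
VOCAB_SIZE, HIDDEN = 8, 4
random.seed(0)

# One shared table: row i is the vector for token i.
embedding = [[random.gauss(0, 1) for _ in range(HIDDEN)]
             for _ in range(VOCAB_SIZE)]


def embed(token_id):
    """Input side: look up the token's vector in the shared table."""
    return embedding[token_id]


def output_logits(hidden_state):
    """Output side: score every token by dot product with the same table."""
    return [sum(h * w for h, w in zip(hidden_state, row)) for row in embedding]


logits = output_logits(embed(3))

# Sharing the table stores it once instead of twice.
tied_params = VOCAB_SIZE * HIDDEN
untied_params = 2 * VOCAB_SIZE * HIDDEN
print(len(logits), tied_params, untied_params)
```

With a very large vocabulary the table is one of the biggest single pieces of the model, so storing it once rather than twice is a meaningful saving.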

For end users, this means more accurate, relevant, and contextually appropriate responses (content) from the model, which improves its use in content generation as well as for chatbots and translations.

He tweeted:

“The vocab is massive compared to other open source models: 250K vs 32k for Mistral 7B

Maybe helps a lot with math / code / other modalities with a heavy tail of symbols.

Also the embedding weights are big (~750M params), so they get shared with the output head.”
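The figure in the tweet can be sanity-checked with simple arithmetic: an embedding table has one row per vocabulary token and one column per hidden dimension. The sketch below uses the rounded 250K vocabulary from the tweet. The hidden width of 3072 is an assumption that does not come from this article, and the function name is hypothetical.

```python
def embedding_params(vocab_size, hidden_size):
    """Parameters in an embedding table: one row per token."""
    return vocab_size * hidden_size


VOCAB_SIZE = 250_000  # rounded figure from the tweet
HIDDEN_SIZE = 3072    # assumption: hidden width, not stated in this article

large_vocab = embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
small_vocab = embedding_params(32_000, HIDDEN_SIZE)

print(f"{large_vocab:,}")  # 768,000,000
print(f"{small_vocab:,}")  # 98,304,000
```

Under those assumptions the table lands in the same range as the roughly 750M parameters mentioned in the tweet, and it is several times larger than a 32k-token table of the same width would be.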

In a follow-up tweet he also noted an optimization in training that translates into potentially more accurate and refined model responses, as it enables the model to learn and adapt more effectively during the training phase.

He tweeted:

“The RMS norm weight has a unit offset.

Instead of “x * weight” they do “x * (1 + weight)”.

I assume this is a training optimization. Usually the weight is initialized to 1 but likely they initialize close to 0. Similar to every other parameter.”
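The difference described in the tweet is small enough to show in a few lines. The sketch below is a minimal, plain-Python illustration of RMS normalization with and without the unit offset. It is not taken from Gemma's source, and the function name, the `unit_offset` flag and the epsilon value are assumptions made for the example.

```python
import math


def rms_norm(x, weight, unit_offset, eps=1e-6):
    """Scale x by its root mean square, then apply a learned per-element weight.

    unit_offset=False: the conventional form, x * weight
    unit_offset=True:  the form from the tweet, x * (1 + weight)
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    normed = [v / rms for v in x]
    if unit_offset:
        return [n * (1.0 + w) for n, w in zip(normed, weight)]
    return [n * w for n, w in zip(normed, weight)]


x = [1.0, -2.0, 3.0, 0.5]

# Conventional form with the weight initialized to 1.
standard = rms_norm(x, [1.0] * len(x), unit_offset=False)

# Offset form with the weight initialized to 0.
offset = rms_norm(x, [0.0] * len(x), unit_offset=True)

print(standard == offset)  # True
```

At initialization the two forms produce the same output. The offset version simply lets the learned weight start near zero, like the other parameters, which matches Hannun's guess about why it was done.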

He added that there are more optimizations in data and training, but that these two stood out in particular.

Designed To Be Safe And Responsible

A key feature is that Gemma is designed from the ground up to be safe, which makes it well suited for deployment. Training data was filtered to remove personal and sensitive information. Google also used reinforcement learning from human feedback (RLHF) to train the model for responsible behavior.

It was further vetted with manual red-teaming and automated testing, and was evaluated for capabilities related to unwanted and dangerous activities.

Google also released a toolkit for helping end-users further improve safety:

“We’re also releasing a new Responsible Generative AI Toolkit together with Gemma to help developers and researchers prioritize building safe and responsible AI applications. The toolkit includes:

  • Safety classification: We provide a novel methodology for building robust safety classifiers with minimal examples.
  • Debugging: A model debugging tool helps you investigate Gemma’s behavior and address potential issues.
  • Guidance: You can access best practices for model builders based on Google’s experience in developing and deploying large language models.”

Read Google’s official announcement:

Gemma: Introducing new state-of-the-art open models
