Transformers
This page provides examples that build model containers from models developed with the Transformers library.
Each example requires, at a minimum, that the transformers package be installed. Use pip to install it in your Python environment:
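pip install transformers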
Text Classification
This example was adapted from the bhadresh-savani/distilbert-base-uncased-emotion model from Hugging Face. View the original model card here.
Source Code
If you prefer to view the source code directly, reference this example and utility data files here.
First, install the additional dependencies required by this model.
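A minimal install, assuming the extra packages this example imports and adds as requirements (torch and numpy), plus the chassisml SDK if it is not already installed:
pip install chassisml torch numpy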
Now, simply copy the example code below into your Python environment and editor of choice.
Example
The code below can be used to build a model container from an example Transformers model. Be sure to view the code annotations for more details.
import time
import json
import torch
import numpy as np
from typing import Mapping, Dict
from transformers import AutoTokenizer, AutoModelForSequenceClassification #(1)
from chassisml import ChassisModel #(2)
from chassis.builder import DockerBuilder #(3)
# load model with AutoTokenizer package #(4)
tokenizer = AutoTokenizer.from_pretrained("bhadresh-savani/distilbert-base-uncased-emotion")
distilbert_model = AutoModelForSequenceClassification.from_pretrained("bhadresh-savani/distilbert-base-uncased-emotion")
# define labels from model config #(5)
labels = distilbert_model.config.id2label
# define predict function #(6)
def predict(inputs: Mapping[str, bytes]) -> Dict[str, bytes]:
    text = inputs['input.txt'].decode()
    inputs = tokenizer(text, return_tensors="pt")
    # run preprocessed data through model
    with torch.no_grad():
        logits = distilbert_model(**inputs).logits
    softmax = torch.nn.functional.softmax(logits, dim=1).detach().cpu().numpy()
    # postprocess
    indices = np.argsort(softmax)[0][::-1]
    results = {
        "data": {
            "result": {
                "classPredictions": [{"class": labels[i], "score": round(softmax[0][i].item(), 4)} for i in indices]
            }
        }
    }
    return {'results.json': json.dumps(results).encode()}
# create chassis model object
chassis_model = ChassisModel(process_fn=predict) # (7)
# add metadata & requirements
chassis_model.add_requirements(["transformers", "torch", "numpy"]) # (8)
chassis_model.metadata.model_name = "DistilBERT Text Classification" # (9)
chassis_model.metadata.model_version = "0.0.1"
chassis_model.metadata.add_input(
    key="input.txt",
    accepted_media_types=["text/plain"],
    max_size="10M",
    description="Input text to be classified by model"
)
chassis_model.metadata.add_output(
    key="results.json",
    media_type="application/json",
    max_size="1M",
    description="Classification predictions for six emotion classes and the corresponding confidence scores"
)
# test model # (10)
text = open("data/input.txt", 'rb').read()
results = chassis_model.test({"input.txt": text})
print(results)
# define builder object # (11)
builder = DockerBuilder(package=chassis_model)
# local docker mode #(12)
start_time = time.time()
res = builder.build_image(name="distilbert-text-classification", show_logs=True)
end_time = time.time()
print(res)
print(f"Container image built in {round((end_time-start_time)/60, 5)} minutes")
1. Here, we simply import the objects we will need from the transformers library.
2. Now, we import the ChassisModel class from the Chassis SDK.
3. In addition to the ChassisModel object, we need to import a builder object. The two available options, DockerBuilder and RemoteBuilder, both build the same container but in different execution environments. Since we'd like to build a container locally with Docker, we import the DockerBuilder object.
4. Here, we load our model exactly the way the model card on Hugging Face recommends.
5. With our loaded model, we can access the human-readable labels the model can predict. We load them into a labels variable and reference it in our predict function below.
6. Here, we define a single predict function, which you can think of as the inference function for your model. This function can access in-memory objects (e.g., tokenizer, distilbert_model, and labels, which are all loaded above), and its only requirement is that it convert the input data from raw bytes to the data type your model expects. See this guide for help on converting common data types. In this example, we decode the raw text bytes with Python's native decode() method, pass the processed text through the model for predictions, and encode the JSON results back to bytes.
7. Now, we simply create a ChassisModel object directly from our predict function.
8. With our ChassisModel object defined, there are a few optional methods we can call. Here, we add the Python libraries our model needs to run. You can pass a list of packages, exactly as you would list them in a requirements.txt file, and they will be installed with pip.
9. In the next few lines, we define the four minimum metadata fields that are required before building our container. These fields represent your model's name, version, inputs, and outputs. NOTE: there are many other optional fields you can choose to document if preferred.
10. Before kicking off the Chassis job, we can test our ChassisModel object by passing it sample data (a standalone check of the predict function itself is sketched just after this list).
11. After our test has passed, we can prepare our model build context. To do so, we define our builder object, which, as mentioned before, will be DockerBuilder. This builder object uses your local Docker daemon to build a model container and store it on your machine.
12. With our builder object and model predict function defined, we can kick off the build using the DockerBuilder.build_image function to build a Docker container locally.
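If you would like to sanity-check the predict function on its own before involving Chassis at all, you can call it directly with raw bytes and decode the JSON it returns. The sketch below assumes the predict function defined above is in scope and uses a made-up sample sentence in place of the data/input.txt file.
# standalone check of the predict function defined above
sample = "I am thrilled with how this turned out!".encode()  # hypothetical sample input
raw_output = predict({"input.txt": sample})
predictions = json.loads(raw_output["results.json"])
# print the top predicted emotion class and its confidence score
top_prediction = predictions["data"]["result"]["classPredictions"][0]
print(top_prediction["class"], top_prediction["score"])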
Code Llama (LLM)
This example was adapted from the codellama/CodeLlama-7b-hf Code Llama model in the Hugging Face Transformers format. View the original model card here.
Source Code
If you prefer to view the source code directly, reference this example and utility data files here.
GPU Recommended
Large Language Models (LLMs) are very resource intensive and run better with accelerated hardware. For a smoother experience, it is highly recommended that you only follow these examples if you have access to a GPU.
Disk Storage Needed
This LLM results in a container that is roughly 20GB in size. Ensure your machine has enough free disk space to store this container before continuing.
First, install the additional dependencies required by this model.
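A minimal install, assuming the extra packages this example imports and adds as requirements (torch and accelerate), plus the chassisml SDK if it is not already installed:
pip install chassisml torch accelerate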
Now, simply copy the example code below into your Python environment and editor of choice.
Example
The code below can be used to build a model container from Meta's Code Llama base model (7B parameters). Be sure to view the code annotations for more details.
import json
import time
from typing import Mapping, Dict
import torch
import transformers
from transformers import AutoTokenizer #(1)
torch.cuda.empty_cache()
from chassisml import ChassisModel #(2)
from chassis.builder import DockerBuilder, BuildOptions #(3)
# load model and dependencies #(4)
model = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)
# define predict function #(5)
def predict(inputs: Mapping[str, bytes]) -> Dict[str, bytes]:
    prompt = inputs['input'].decode()
    sequences = pipeline(
        prompt,
        do_sample=True,
        top_k=10,
        temperature=0.1,
        top_p=0.50,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        max_length=500,
    )
    return {"generated_text": json.dumps(sequences[0]["generated_text"]).encode()}
# create chassis model object
chassis_model = ChassisModel(process_fn=predict) # (6)
# add metadata & requirements
chassis_model.add_requirements(["transformers", "torch", "accelerate"]) # (7)
chassis_model.metadata.model_name = "Code Llama" # (8)
chassis_model.metadata.model_version = "0.0.1"
chassis_model.metadata.add_input(
    key="input",
    accepted_media_types=["text/plain"],
    max_size="10M",
    description="Code prompt"
)
chassis_model.metadata.add_output(
    key="generated_text",
    media_type="application/json",
    max_size="10M",
    description="Code generated by the Code Llama model"
)
# test model # (9)
sample_input = "# Python hello world function".encode()
results = chassis_model.test({"input": sample_input})
print(results)
# define builder object # (10)
build_options = BuildOptions(cuda_version="11.0.3")
builder = DockerBuilder(chassis_model, build_options)
# local docker mode #(11)
start_time = time.time()
res = builder.build_image(name="code-llama", show_logs=True)
end_time = time.time()
print(res)
print(f"Container image built in {round((end_time-start_time)/60, 5)} minutes")
1. Here, we simply import the objects we will need from the transformers library.
2. Now, we import the ChassisModel class from the Chassis SDK.
3. In addition to the ChassisModel object, we need to import a builder object. The two available options, DockerBuilder and RemoteBuilder, both build the same container but in different execution environments. Since we'd like to build a container locally with Docker, we import the DockerBuilder object. Notice we also import BuildOptions; we will use this object to define the CUDA version our model needs to run.
4. Here, we load our model exactly the way the model card on Hugging Face recommends. In this example, we use the transformers pipeline functionality.
5. Here, we define a single predict function, which you can think of as the inference function for your model. This function can access in-memory objects (e.g., tokenizer and pipeline, which are both loaded above), and its only requirement is that it convert the input data from raw bytes to the data type your model expects. See this guide for help on converting common data types. In this example, we decode the raw prompt bytes with Python's native decode() method, pass the prompt through the pipeline for generation, and encode the JSON results back to bytes.
6. Now, we simply create a ChassisModel object directly from our predict function.
7. With our ChassisModel object defined, there are a few optional methods we can call. Here, we add the Python libraries our model needs to run. You can pass a list of packages, exactly as you would list them in a requirements.txt file, and they will be installed with pip.
8. In the next few lines, we define the four minimum metadata fields that are required before building our container. These fields represent your model's name, version, inputs, and outputs. NOTE: there are many other optional fields you can choose to document if preferred.
9. Before kicking off the Chassis job, we can test our ChassisModel object by passing it sample data (a standalone check of the predict function itself is sketched just after this list).
10. After our test has passed, we can prepare our model build context. To do so, we define our builder object, which, as mentioned before, will be DockerBuilder. This builder object uses your local Docker daemon to build a model container and store it on your machine. We also define a BuildOptions variable, which allows us to configure container-level dependencies, including the CUDA version, the architecture(s) to build for, and more.
11. With our builder object and model predict function defined, we can kick off the build using the DockerBuilder.build_image function to build a Docker container locally.
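As with the first example, you can exercise the predict function directly before building anything. The sketch below assumes the pipeline and predict function defined above are loaded (which requires the model weights to be downloaded and, realistically, a GPU) and simply decodes the generated text from the returned bytes. The prompt string is a hypothetical example.
# standalone check of the Code Llama predict function defined above
raw_output = predict({"input": "# Python function that returns the nth Fibonacci number".encode()})
generated_code = json.loads(raw_output["generated_text"])
print(generated_code)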