
Deploy Model to KServe

Install Required Dependencies

  • Install Docker Desktop
    • Try to run docker ps
      • If you get a permissions error, follow instructions here
  • Install KServe:
    curl -s "" | bash

Define required variables

There are some environment variables that must be defined for KServe to work:

  • INTERFACE: must be set to kserve
  • HTTP_PORT: the port on which the KServe interface will listen
  • PROTOCOL: the KServe protocol version, either v1 or v2
  • MODEL_NAME: the name of the model
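As a rough illustration of the constraints listed above, the following Python sketch checks a set of variables before they are handed to the container. The function name `validate_kserve_env` and the error wording are hypothetical and not part of Chassis or KServe; only the allowed values come from the list above.

```python
def validate_kserve_env(env):
    """Return a list of problems found in a mapping of environment variables.

    Hypothetical helper for illustration; not part of Chassis or KServe.
    """
    problems = []
    if env.get("INTERFACE") != "kserve":
        problems.append("INTERFACE must be 'kserve'")
    if not str(env.get("HTTP_PORT", "")).isdigit():
        problems.append("HTTP_PORT must be a port number")
    if env.get("PROTOCOL") not in ("v1", "v2"):
        problems.append("PROTOCOL must be 'v1' or 'v2'")
    if not env.get("MODEL_NAME"):
        problems.append("MODEL_NAME must be set")
    return problems


if __name__ == "__main__":
    example = {"INTERFACE": "kserve", "HTTP_PORT": "8080", "PROTOCOL": "v1", "MODEL_NAME": "digits"}
    print(validate_kserve_env(example))
```

An empty list means every check passed. The MODEL_NAME check could be relaxed when the image already defines the model name.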

Deploy the model

For this tutorial, we will use the Chassis-generated container image uploaded as bmunday131/sklearn-digits. To deploy it to KServe, we will use a file, custom_v1.yaml, that defines the InferenceService for KServe protocol v1.

apiVersion: ""
kind: "InferenceService"
metadata:
  name: chassisml-sklearn-demo
spec:
  predictor:
    containers:
    - image: bmunday131/sklearn-digits:0.0.1
      name: chassisml-sklearn-demo-container
      imagePullPolicy: IfNotPresent
      env:
        - name: INTERFACE
          value: kserve
        - name: HTTP_PORT
          value: "8080"
        - name: PROTOCOL
          value: v1
        - name: MODEL_NAME
          value: digits
      ports:
        - containerPort: 8080
          protocol: TCP

In this case, the MODEL_NAME variable should not be necessary, since the model name is defined when the image is created.

kubectl apply -f custom_v1.yaml

This should output a success message.

Deploy from Private Docker Registry

In the above example, we deployed a public container image, so no credentials were needed to pull it. If, however, you set up Chassis to push container images to a private registry, you will need to add a few lines to your YAML file.

First, create a Kubernetes secret that contains your registry credentials. The InferenceService will reference it through imagePullSecrets.

kubectl create secret docker-registry <registry-credential-secrets> \
  --docker-server=<private-registry-url> \
  --docker-email=<private-registry-email> \
  --docker-username=<private-registry-user> \
  --docker-password=<private-registry-password>

Visit Managing Secrets using kubectl for more details.
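For orientation, a secret of this kind stores a Docker config JSON document. The Python sketch below builds such a document to show how the server, user, email, and password values relate to each other. It assumes the standard `kubernetes.io/dockerconfigjson` layout; the function name `docker_config_json` is hypothetical and the credentials are dummy values. It is not a replacement for the kubectl command above.

```python
import base64
import json


def docker_config_json(server, username, password, email):
    """Build the JSON payload that a docker-registry secret holds (illustrative sketch)."""
    # The "auth" field is the base64 encoding of "username:password".
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {
        "auths": {
            server: {
                "username": username,
                "password": password,
                "email": email,
                "auth": auth,
            }
        }
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Dummy values only; never hard-code real credentials.
    print(docker_config_json("registry.example.com", "user", "secret", "user@example.com"))
```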

Next, add the following lines to your yaml file:

apiVersion: ""
kind: "InferenceService"
metadata:
  name: chassisml-sklearn-demo
spec:
  predictor:
    imagePullSecrets:
    - name: <registry-credential-secrets>
    containers:
    - image: bmunday131/sklearn-digits:0.0.1
      name: chassisml-sklearn-demo-container
      imagePullPolicy: IfNotPresent
      env:
        - name: INTERFACE
          value: kserve
        - name: HTTP_PORT
          value: "8080"
        - name: PROTOCOL
          value: v1
        - name: MODEL_NAME
          value: digits
      ports:
        - containerPort: 8080
          protocol: TCP

Finally, apply your changes:

kubectl apply -f custom_v1.yaml

Define required variables to query the pod

These variables are needed to communicate with the deployed image.

The SERVICE_NAME must match the name defined in the metadata of the InferenceService created above.

The MODEL_NAME must match the name of your model. It can be defined by the data scientist when making the request against the Chassis service, or overridden in the InferenceService as shown above.


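If the ingress gateway is exposed through minikube tunnel:

minikube tunnel

# in another terminal:
export INGRESS_HOST=localhost
export INGRESS_PORT=80

Otherwise, use the minikube IP and the node port of the ingress gateway:

export INGRESS_HOST=$(minikube ip)
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')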

Mac or Linux:

export SERVICE_NAME=chassisml-sklearn-demo
export MODEL_NAME=digits
export SERVICE_HOSTNAME=$(kubectl get inferenceservice ${SERVICE_NAME} -o jsonpath='{.status.url}' | cut -d "/" -f 3)
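The `cut -d "/" -f 3` step above keeps only the host part of the URL reported in `.status.url`. A minimal Python sketch of the same extraction, using only the standard library, is shown here; the function name and the example URL are made up for illustration.

```python
from urllib.parse import urlsplit


def service_hostname(status_url):
    """Return the host portion of an InferenceService status URL."""
    # For "http://host/path", `cut -d "/" -f 3` yields "host"; netloc is the same field.
    return urlsplit(status_url).netloc


if __name__ == "__main__":
    # Hypothetical URL; the real value comes from `kubectl get inferenceservice`.
    print(service_hostname("http://chassisml-sklearn-demo.default.example.com"))
```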

Query the model

Please note that you must base64 encode each input instance. For example:

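import base64 as b64
import json

# Stringify each instance, then base64-encode it so it can be sent as JSON text.
instances = [[1, 2, 3, 4], [5, 6, 7, 8]]
input_dict = {"instances": [b64.b64encode(str(entry).encode()).decode() for entry in instances]}
# Serialize the request body, for example to save it as the input file.
print(json.dumps(input_dict))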
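To make the encoding step easier to inspect, the sketch below wraps it in a function and adds the reverse operation. The helper names are hypothetical, and the decoding half only illustrates what the encoded strings contain; how the deployed model decodes its input is not shown in this tutorial.

```python
import ast
import base64 as b64


def encode_instances(instances):
    """Base64-encode the string form of each instance, as in the snippet above."""
    return {"instances": [b64.b64encode(str(entry).encode()).decode() for entry in instances]}


def decode_instances(payload):
    """Reverse encode_instances: base64-decode each entry and parse the list literal."""
    return [ast.literal_eval(b64.b64decode(item).decode()) for item in payload["instances"]]


if __name__ == "__main__":
    payload = encode_instances([[1, 2, 3, 4], [5, 6, 7, 8]])
    print(payload)
    print(decode_instances(payload))
```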

Now you can send a prediction request. Download inputsv1.json first, since the command reads it from the current directory.

curl -H "Host: ${SERVICE_HOSTNAME}" "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict" -d@inputsv1.json | jq
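The same request can also be built from Python with the standard library. This is a sketch under a few assumptions: the function name `build_predict_request` is hypothetical, the host values in the example are placeholders, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_predict_request(ingress_host, ingress_port, service_hostname, model_name, payload):
    """Build (but do not send) a v1 predict request mirroring the curl command."""
    url = f"http://{ingress_host}:{ingress_port}/v1/models/{model_name}:predict"
    body = json.dumps(payload).encode()
    # The Host header tells the ingress which InferenceService the request is for.
    return urllib.request.Request(url, data=body, headers={"Host": service_hostname}, method="POST")


if __name__ == "__main__":
    # Placeholder values; in practice they come from the exported variables above.
    req = build_predict_request(
        "localhost", 80, "chassisml-sklearn-demo.default.example.com", "digits", {"instances": []}
    )
    print(req.get_method(), req.full_url)
```

To send the request, pass the object to `urllib.request.urlopen` and parse the response body with `json.load`.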

The output should be similar to this:

{
  "predictions": [
    {
      "data": {
        "drift": null,
        "explanation": null,
        "result": {
          "classPredictions": [
            {
              "class": "4",
              "score": "1"
            }
          ]
        }
      }
    },
    {
      "data": {
        "drift": null,
        "explanation": null,
        "result": {
          "classPredictions": [
            {
              "class": "8",
              "score": "1"
            }
          ]
        }
      }
    },
    {
      "data": {
        "drift": null,
        "explanation": null,
        "result": {
          "classPredictions": [
            {
              "class": "8",
              "score": "1"
            }
          ]
        }
      }
    },
    {
      "data": {
        "drift": null,
        "explanation": null,
        "result": {
          "classPredictions": [
            {
              "class": "4",
              "score": "1"
            }
          ]
        }
      }
    },
    {
      "data": {
        "drift": null,
        "explanation": null,
        "result": {
          "classPredictions": [
            {
              "class": "8",
              "score": "1"
            }
          ]
        }
      }
    }
  ]
}

In this case, the data was prepared for protocol v1, but the image can also be deployed with protocol v2 and queried with data prepared for v2.
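If only the predicted labels of a v1 response are needed, the output can be reduced with a few lines of Python. The sketch assumes the response shape shown above and uses a hypothetical helper name, `top_classes`; the sample dictionary is abbreviated to two predictions.

```python
def top_classes(response):
    """Return the highest-scoring class label for each prediction in a v1 response."""
    labels = []
    for prediction in response["predictions"]:
        candidates = prediction["data"]["result"]["classPredictions"]
        # Scores arrive as strings in the output above, so convert before comparing.
        best = max(candidates, key=lambda c: float(c["score"]))
        labels.append(best["class"])
    return labels


if __name__ == "__main__":
    sample = {
        "predictions": [
            {"data": {"drift": None, "explanation": None,
                      "result": {"classPredictions": [{"class": "4", "score": "1"}]}}},
            {"data": {"drift": None, "explanation": None,
                      "result": {"classPredictions": [{"class": "8", "score": "1"}]}}},
        ]
    }
    print(top_classes(sample))
```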

Deploy the model locally

The model can also be deployed locally:

docker run --rm -p 8080:8080 \
-e INTERFACE=kserve \
-e HTTP_PORT=8080 \
-e PROTOCOL=v2 \
-e MODEL_NAME=digits \
bmunday131/sklearn-digits:0.0.1

The local container can then be queried as follows. Download inputsv2.json before making the request:

curl localhost:8080/v2/models/digits/infer -d@inputsv2.json
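The two protocols differ in the endpoint path as well as in the input file. The following sketch gathers the two paths used in this tutorial in one place; the function name `inference_path` is hypothetical.

```python
def inference_path(protocol, model_name):
    """Return the request path used in this tutorial for the given KServe protocol."""
    if protocol == "v1":
        return f"/v1/models/{model_name}:predict"
    if protocol == "v2":
        return f"/v2/models/{model_name}/infer"
    raise ValueError(f"unsupported protocol: {protocol!r}")


if __name__ == "__main__":
    print("localhost:8080" + inference_path("v2", "digits"))
```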

Tutorial in Action

Follow along as we walk through this tutorial step by step!