Torchserve Canary Deployment

eks-config.yaml

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
 
iam:
  withOIDC: true
 
metadata:
  name: basic-cluster
  region: ap-south-1
  version: "1.27"
 
managedNodeGroups:
  - name: ng-dedicated-1
    instanceType: t3a.xlarge
    desiredCapacity: 4
    spot: true
    labels:
      role: spot
    ssh:
      allow: true # will use ~/.ssh/id_rsa.pub as the default ssh key
    iam:
      withAddonPolicies:
        autoScaler: true
        awsLoadBalancerController: true
        certManager: true
        externalDNS: true
        ebs: true

Create the Cluster

eksctl create cluster -f eks-config.yaml

Install KServe with Knative and Istio

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.11.0/serving-crds.yaml
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.11.0/serving-core.yaml
kubectl apply -l knative.dev/crd-install=true -f https://github.com/knative/net-istio/releases/download/knative-v1.11.0/istio.yaml
kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.11.0/istio.yaml
kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.11.0/net-istio.yaml
kubectl patch configmap/config-domain \
      --namespace knative-serving \
      --type merge \
      --patch '{"data":{"emlo.tsai":""}}'
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.11.0/serving-hpa.yaml
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.2/cert-manager.yaml

Wait for the cert-manager pods to be ready

kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.11.2/kserve.yaml

Wait for the KServe controller manager to be ready

kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.11.2/kserve-runtimes.yaml

Install the EBS CSI Driver

eksctl create iamserviceaccount \
    --name ebs-csi-controller-sa \
    --namespace kube-system \
    --cluster basic-cluster \
    --role-name AmazonEKS_EBS_CSI_DriverRole \
    --role-only \
    --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
    --approve \
    --region ap-south-1

# Replace 006547668672 with your own AWS account ID
eksctl create addon \
    --name aws-ebs-csi-driver \
    --cluster basic-cluster \
    --service-account-role-arn arn:aws:iam::006547668672:role/AmazonEKS_EBS_CSI_DriverRole \
    --region ap-south-1 \
    --force

Create the Storage Class

sc.yaml

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer

kubectl apply -f sc.yaml

Create the S3 Credentials Secret and Service Account

s3.yaml

apiVersion: v1
kind: Secret
metadata:
  name: s3creds
  annotations:
     serving.kserve.io/s3-endpoint: s3.ap-south-1.amazonaws.com # replace with your s3 endpoint e.g minio-service.kubeflow:9000
     serving.kserve.io/s3-usehttps: "1" # by default 1, if testing with minio you can set to 0
     serving.kserve.io/s3-region: "ap-south-1"
     serving.kserve.io/s3-useanoncredential: "false" # omitting this is the same as false, if true will ignore provided credential and use anonymous credentials
type: Opaque
stringData: # use `stringData` for raw credential string or `data` for base64 encoded string
  AWS_ACCESS_KEY_ID: AxxxxQxxxxxxxxY2xxx
  AWS_SECRET_ACCESS_KEY: "C/dGcccuAxxxxxxxx25mxxxxxxx"
 
---
 
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-read-only
secrets:
- name: s3creds

Kubernetes Deployment Strategies

Canary Deployment using NGINX: https://kubernetes.github.io/ingress-nginx/examples/canary/

Canary Deployment using ISTIO: https://istio.io/v1.10/blog/2017/0.1-canary/

KServe Canary Deployment

  • Overview:
    • Canary rollouts in Kubernetes are supported by KServe for inference services.
    • Enables deploying a new version of an InferenceService to receive a percentage of traffic.
  • Configurable Canary Rollout Strategy:
    • KServe supports a customizable canary rollout strategy with multiple steps.
    • Rollout strategy includes provisions for rollback to the previous revision if a step fails.
  • Automatic Tracking:
    • KServe automatically tracks the last good revision rolled out with 100% traffic.
    • The canaryTrafficPercent field in the component's spec sets the traffic percentage for the new revision.
  • Traffic Splitting:
    • During canary rollout, traffic is split between the last good revision and the new revision based on canaryTrafficPercent.
    • The first revision deployed receives 100% traffic.
    • In subsequent steps, if 10% traffic is configured for the new revision, 90% goes to the LatestRolledoutRevision.
  • Handling Unhealthy Revisions:
    • If a revision is unhealthy or bad, traffic is not routed to it.
    • In case of a rollback, 100% traffic is directed to the previous healthy revision, the PreviousRolledoutRevision.
  • Rollout Steps:
    • Step 1: Deploy the first revision, receives 100% traffic.
    • Step 2: Deploy a new revision and route the configured percentage (canaryTrafficPercent) of traffic to it; the rest stays on the LatestRolledoutRevision.
    • Step 3: Promote the LatestReadyRevision to the LatestRolledoutRevision, receiving 100% traffic and completing the rollout.

Note: Canary deployments allow controlled testing of new versions before full deployment, minimizing risks and ensuring a smooth transition.
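The traffic rules above can be summarized as a small decision function. This is a minimal sketch that models the listed behaviour under the stated rules; it is not KServe's controller code, and the function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrafficSplit:
    previous: Optional[str]  # PreviousRolledoutRevision (None on the first deploy)
    previous_percent: int
    latest: str              # LatestReadyRevision
    latest_percent: int


def split_traffic(rolled_out: Optional[str], latest_ready: str,
                  canary_percent: Optional[int],
                  latest_healthy: bool = True) -> TrafficSplit:
    """Hypothetical model of the canary rules listed above."""
    # Step 1: the first revision (nothing rolled out yet) gets 100% of traffic.
    if rolled_out is None:
        return TrafficSplit(None, 0, latest_ready, 100)
    # An unhealthy revision never receives traffic; the last good one keeps 100%.
    if not latest_healthy:
        return TrafficSplit(rolled_out, 100, latest_ready, 0)
    # Step 3: no canaryTrafficPercent means the new revision is promoted.
    if canary_percent is None:
        return TrafficSplit(rolled_out, 0, latest_ready, 100)
    if not 0 <= canary_percent <= 100:
        raise ValueError("canaryTrafficPercent must be between 0 and 100")
    # Step 2: split between the last rolled-out revision and the canary.
    return TrafficSplit(rolled_out, 100 - canary_percent, latest_ready, canary_percent)
```

For example, `split_traffic("imagenet-vit-predictor-00001", "imagenet-vit-predictor-00002", 30)` yields the 70/30 split that `kubectl get isvc` reports later in this walkthrough, and passing `0` models the rollback case.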

vit-classifier.yaml

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "imagenet-vit"
spec:
  predictor:
    serviceAccountName: s3-read-only
    model:
      modelFormat:
        name: pytorch
      storageUri: s3://tsai-emlo/kserve-ig-2/imagenet-vit/
      resources:
        limits:
          cpu: 2600m
          memory: 4Gi

This is a standard KServe InferenceService, nothing fancy. Apply it with kubectl apply -f vit-classifier.yaml and check its status:

❯ kubectl get isvc
NAME           URL                                     READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION            AGE
imagenet-vit   http://imagenet-vit.default.emlo.tsai   True           100                              imagenet-vit-predictor-00001   3m12s
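The PREV/LATEST columns come from the InferenceService status. The sketch below reads the per-revision traffic split from that status. It assumes the predictor status exposes a `traffic` list of entries with `revisionName` and `percent` (the layout KServe 0.11 reports, to the best of our knowledge), and it shells out to `kubectl` rather than using a client library. Both function names are hypothetical.

```python
import json
import subprocess


def summarize_isvc(isvc: dict) -> dict:
    """Map revision name -> traffic percent from an InferenceService's predictor status."""
    predictor = isvc.get("status", {}).get("components", {}).get("predictor", {})
    split = {}
    for target in predictor.get("traffic", []):
        name = target.get("revisionName")
        if name is not None:
            # Tagged entries with 0% may be listed separately, so sum per revision.
            split[name] = split.get(name, 0) + target.get("percent", 0)
    return split


def fetch_isvc(name: str, namespace: str = "default") -> dict:
    """Fetch the InferenceService as JSON; assumes kubectl is configured for the cluster."""
    out = subprocess.run(
        ["kubectl", "get", "isvc", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


# Example (requires cluster access):
#   print(summarize_isvc(fetch_isvc("imagenet-vit")))
```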

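Now let’s deploy the cat classifier model as the new canary candidate.

We’ll modify the cat-classifier to advertise its model name as vit-classifier. First, download the model files:

aws s3 cp --recursive s3://tsai-emlo/kserve-ig-2/cat-classifier/ cat-classifier/

Then edit cat-classifier/config/config.properties so that model_snapshot registers the model as vit-classifier:
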
inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8085
metrics_address=http://0.0.0.0:8082
grpc_inference_port=7070
grpc_management_port=7071
enable_envvars_config=true
install_py_dep_per_model=true
load_models=all
max_response_size=655350000
model_store=/mnt/models/model-store
default_response_timeout=600
enable_metrics_api=true
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"vit-classifier":{"1.0":{"defaultVersion":true,"marName":"cat-classifier.mar","minWorkers":1,"maxWorkers":1,"batchSize":1,"maxBatchDelay":100,"responseTimeout":600}}}}

Upload the edited config back to S3:

aws s3 cp cat-classifier/config/config.properties s3://tsai-emlo/kserve-ig-2/cat-classifier/config/config.properties
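The hand edit above only changes the key under `"models"` in the `model_snapshot` line. The following is a minimal sketch that scripts that edit, assuming the snapshot registers exactly one model. Whether TorchServe also needs the name changed inside the .mar itself is a separate question; this code only rewrites config.properties. The function names are hypothetical.

```python
import json

PREFIX = "model_snapshot="


def rename_snapshot_model(config_text: str, new_name: str) -> str:
    """Return config.properties text with the single snapshot model renamed to new_name."""
    out = []
    for line in config_text.splitlines():
        if line.startswith(PREFIX):
            snapshot = json.loads(line[len(PREFIX):])
            models = snapshot["models"]
            if len(models) != 1:
                raise ValueError("expected exactly one model in model_snapshot")
            # Keep the version entries (marName, workers, ...) and change only the key.
            versions = next(iter(models.values()))
            snapshot["models"] = {new_name: versions}
            line = PREFIX + json.dumps(snapshot, separators=(",", ":"))
        out.append(line)
    return "\n".join(out) + "\n"


def rename_in_file(path: str, new_name: str) -> None:
    """Rewrite a config.properties file in place."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(rename_snapshot_model(text, new_name))


# Example:
#   rename_in_file("cat-classifier/config/config.properties", "vit-classifier")
```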

Download the sample image http://images.cocodataset.org/val2017/000000039769.jpg and convert it to base64 (for example with https://base64.guru/converter/encode/image). Paste the base64 string into input.json:

input.json

{
  "instances": [
    {
      "data": "BASE64 IMAGE"
    }
  ]
}
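You can also build input.json without the web converter. This is a minimal sketch using only the standard library. It assumes the `{"instances": [{"data": ...}]}` shape shown above, and the function names are hypothetical.

```python
import base64
import json
import urllib.request

IMAGE_URL = "http://images.cocodataset.org/val2017/000000039769.jpg"


def build_payload(image_bytes: bytes) -> dict:
    """Wrap base64-encoded image bytes in the request shape used above."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"instances": [{"data": encoded}]}


def write_input_json(url: str = IMAGE_URL, path: str = "input.json") -> None:
    """Download the sample image and write the request body to disk."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = build_payload(resp.read())
    with open(path, "w") as f:
        json.dump(payload, f)


# Example (needs network access):
#   write_input_json()
```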
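
Send a request with Python. The URL is the external hostname of the istio-ingressgateway service (kubectl get svc istio-ingressgateway -n istio-system), and the Host header selects the InferenceService:

import json

import requests

# Replace with your own Istio ingress gateway hostname
url = "http://abd5bc101d01a4478baa58570709a6f6-1419869784.ap-south-1.elb.amazonaws.com/v1/models/imagenet-vit:predict"

with open("input.json") as f:
    payload = json.load(f)

headers = {
    "Host": "imagenet-vit.default.emlo.tsai",
    "Content-Type": "application/json",
}

response = requests.post(url, headers=headers, json=payload, timeout=30)

print(response.text)

Response:

{
    "predictions": [
        {
            "class": "Egyptian cat",
            "probability": 0.9374412894248962
        }
    ]
}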

Prometheus & Grafana

Install Prometheus

https://github.com/kserve/kserve/blob/master/docs/samples/metrics-and-monitoring/README.md

Install Kustomize on your system

https://kubectl.docs.kubernetes.io/installation/kustomize/

git clone https://github.com/kserve/kserve
cd kserve
kustomize build docs/samples/metrics-and-monitoring/prometheus-operator | kubectl apply -f -
kubectl wait --for condition=established --timeout=120s crd/prometheuses.monitoring.coreos.com
kubectl wait --for condition=established --timeout=120s crd/servicemonitors.monitoring.coreos.com
kustomize build docs/samples/metrics-and-monitoring/prometheus | kubectl apply -f -

Check that Prometheus is working (then open http://localhost:9090 in a browser)

kubectl port-forward service/prometheus-operated -n kfserving-monitoring 9090:9090
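With the port-forward running, Prometheus can also be checked from a script through its HTTP API (`/api/v1/query`). This is a minimal sketch that assumes the port-forward above exposes Prometheus on localhost:9090. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS = "http://localhost:9090"  # assumes the port-forward above is running


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body: dict) -> list:
    """Return (labels, value) pairs from an instant-vector query response."""
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return [(r["metric"], float(r["value"][1])) for r in body["data"]["result"]]


def instant_query(promql: str, base: str = PROMETHEUS) -> list:
    with urllib.request.urlopen(query_url(base, promql), timeout=10) as resp:
        return parse_vector(json.load(resp))


# Example: targets with value 1.0 are being scraped successfully.
#   for labels, value in instant_query("up"):
#       print(labels.get("job"), value)
```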

We need to patch the queue-proxy sidecar image so that Prometheus can scrape all of the pod's metrics

If an InferenceService uses Knative, then it has at least two containers in one pod, queue-proxy and kserve-container. A limitation of using Prometheus is that it supports scraping only one endpoint in the pod. When there are multiple containers in a pod that emit Prometheus metrics, this becomes an issue (see Prometheus for multiple port annotations issue #3756 for the full discussion on this topic). In an attempt to make an easy-to-use solution, the queue-proxy is extended to handle this use case.

https://github.com/kserve/kserve/blob/master/qpext/README.md
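Conceptually, aggregation means serving the scrapes of several containers from one endpoint. The sketch below illustrates only that idea and is not qpext's implementation. It concatenates Prometheus text-format outputs and checks the assumption that the containers expose disjoint metric family names. The function names and sample metric names are hypothetical.

```python
def metric_families(text: str) -> set:
    """Metric family names declared by '# TYPE <name> <type>' lines."""
    return {line.split()[2] for line in text.splitlines()
            if line.startswith("# TYPE ") and len(line.split()) >= 3}


def aggregate(*expositions: str) -> str:
    """Concatenate text-format scrapes from several containers into one response.

    Assumes (and checks) that the containers expose disjoint metric names;
    otherwise the same family would appear twice in a single scrape.
    """
    seen: set = set()
    for text in expositions:
        families = metric_families(text)
        clash = seen & families
        if clash:
            raise ValueError(f"duplicate metric families: {sorted(clash)}")
        seen |= families
    return "".join(t if t.endswith("\n") else t + "\n" for t in expositions)
```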

qpext_image_patch.yaml

data:
  queue-sidecar-image: kserve/qpext:latest
kubectl patch configmaps -n knative-serving config-deployment --patch-file qpext_image_patch.yaml

NOTE: Delete your InferenceService and redeploy it so that its pods pick up the patched qpext image

Install Grafana

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install grafana grafana/grafana

NOTE: It would be a good idea to install Grafana in its own namespace instead of default

Fetch the admin user’s password from secrets

kubectl get secret --namespace default grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

Port forward Grafana

kubectl port-forward svc/grafana 3000:80

Add a New Data Source in Grafana

Choose Prometheus as the data source type and set the server URL to:

http://prometheus-operated.kfserving-monitoring.svc.cluster.local:9090
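The same data source can be created through Grafana's HTTP API (`POST /api/datasources`) instead of the UI. This is a minimal sketch that assumes the port-forward above exposes Grafana on localhost:3000 and that you authenticate as admin with the password fetched earlier. The helper names are hypothetical.

```python
import base64
import json
import urllib.request

GRAFANA = "http://localhost:3000"  # assumes the port-forward above is running
PROMETHEUS_URL = "http://prometheus-operated.kfserving-monitoring.svc.cluster.local:9090"


def datasource_payload(url: str = PROMETHEUS_URL, name: str = "Prometheus") -> dict:
    # "proxy" access: Grafana's backend (inside the cluster) reaches Prometheus.
    return {"name": name, "type": "prometheus", "url": url,
            "access": "proxy", "isDefault": True}


def create_datasource(password: str, user: str = "admin", base: str = GRAFANA) -> dict:
    """POST the data source definition using basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"{base}/api/datasources",
        data=json.dumps(datasource_payload()).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```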

Import this dashboard:

https://grafana.com/grafana/dashboards/18032-knative-serving-revision-http-requests/

Now add some load to the model

send.py

import json

import requests

# Replace with your own Istio ingress gateway hostname
url = "http://abd5bc101d01a4478baa58570709a6f6-1419869784.ap-south-1.elb.amazonaws.com/v1/models/imagenet-vit:predict"

with open("input.json") as f:
    payload = json.load(f)

headers = {
    "Host": "imagenet-vit.default.emlo.tsai",
    "Content-Type": "application/json",
}

response = requests.post(url, headers=headers, json=payload, timeout=30)

print(response.headers)
print(response.status_code)
print(response.json())

You should start seeing request statistics in the Grafana dashboard

Canary Deployment

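Now we are ready for a canary deployment, observed with Prometheus & Grafana. Update vit-classifier.yaml to enable metric aggregation and scraping, point storageUri at the cat-classifier, and send 30% of the traffic to the new revision: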

vit-classifier.yaml

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "imagenet-vit"
  annotations:
    serving.kserve.io/enable-metric-aggregation: "true"
    serving.kserve.io/enable-prometheus-scraping: "true"
spec:
  predictor:
    canaryTrafficPercent: 30
    serviceAccountName: s3-read-only
    model:
      modelFormat:
        name: pytorch
      # storageUri: s3://tsai-emlo/kserve-ig-2/imagenet-vit/
      storageUri: s3://tsai-emlo/kserve-ig-2/cat-classifier/
      resources:
        limits:
          cpu: 2600m
          memory: 4Gi

❯ kubectl get isvc
NAME           URL                                     READY   PREV   LATEST   PREVROLLEDOUTREVISION          LATESTREADYREVISION            AGE
imagenet-vit   http://imagenet-vit.default.emlo.tsai   True    70     30       imagenet-vit-predictor-00001   imagenet-vit-predictor-00002   73m

Start sending requests

for i in {1..200}; do python send.py ; done

Lots of 4xx errors!

{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '56', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:12:35 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '1'}
404
{'error': 'Model with name imagenet-vit does not exist.'}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:12:35 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '750'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:12:36 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '764'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:12:37 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '751'}
200

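This is because the canary revision does not serve a model named imagenet-vit: the .mar file for cat-classifier still advertises the model name cat-classifier, so requests routed to the canary return 404.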

Time to roll back! Set canaryTrafficPercent to 0 in vit-classifier.yaml and apply it again:

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "imagenet-vit"
  annotations:
    serving.kserve.io/enable-metric-aggregation: "true"
    serving.kserve.io/enable-prometheus-scraping: "true"
spec:
  predictor:
    canaryTrafficPercent: 0
    serviceAccountName: s3-read-only
    model:
      modelFormat:
        name: pytorch
      # storageUri: s3://tsai-emlo/kserve-ig-2/imagenet-vit/
      storageUri: s3://tsai-emlo/kserve-ig-2/cat-classifier/
      resources:
        limits:
          cpu: 2600m
          memory: 4Gi

❯ kubectl get isvc
NAME           URL                                     READY   PREV   LATEST   PREVROLLEDOUTREVISION          LATESTREADYREVISION            AGE
imagenet-vit   http://imagenet-vit.default.emlo.tsai   True    100    0        imagenet-vit-predictor-00001   imagenet-vit-predictor-00002   89m

The previous revision (imagenet-vit-predictor-00001) now receives 100% of the traffic and the canary revision (imagenet-vit-predictor-00002) receives 0%.
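Instead of editing the YAML, the rollback (and, later, the promotion) can be expressed as a patch. This is a minimal sketch that builds `kubectl patch` arguments: a JSON merge patch that sets canaryTrafficPercent to 0, and a JSON Patch that removes the field. The helper names are hypothetical. If you patch the live object this way, keep vit-classifier.yaml in sync, because a later `kubectl apply` of the file wins. The JSON Patch `remove` also fails if the field is already absent.

```python
import json


def rollback_patch() -> dict:
    """JSON merge patch: canary gets 0%, the previous rolled-out revision 100%."""
    return {"spec": {"predictor": {"canaryTrafficPercent": 0}}}


def promote_patch() -> list:
    """JSON Patch (RFC 6902): remove canaryTrafficPercent so the latest revision gets 100%."""
    return [{"op": "remove", "path": "/spec/predictor/canaryTrafficPercent"}]


def kubectl_patch_args(name: str, patch, namespace: str = "default") -> list:
    """argv for `kubectl patch`; a list body is sent as JSON Patch, a dict as a merge patch."""
    patch_type = "json" if isinstance(patch, list) else "merge"
    return ["kubectl", "patch", "isvc", name, "-n", namespace,
            "--type", patch_type, "-p", json.dumps(patch)]


# Example (requires cluster access):
#   import subprocess
#   subprocess.run(kubectl_patch_args("imagenet-vit", rollback_patch()), check=True)
```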

All of our requests now succeed:

{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:18:43 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '751'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:18:44 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '754'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:18:45 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '765'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:18:46 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '754'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:18:47 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '766'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:18:48 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '743'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:18:49 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '781'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:18:50 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '744'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:18:51 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '768'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:18:52 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '755'}
200

To fix this, recreate the cat-classifier .mar file with the model name set to imagenet-vit and deploy it as the canary again.
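The model name is baked into the .mar by `torch-model-archiver --model-name`. This is a minimal sketch that builds that command. The file names in the example (`model.pt`, `handler.py`, `index_to_name.json`) are hypothetical placeholders for the cat-classifier's real artifacts, which this walkthrough does not show. The archiver writes `<model-name>.mar` into the export path, which must already exist. If the file name changes, the marName in model_snapshot has to match it.

```python
from typing import List, Optional


def archiver_args(model_name: str, serialized_file: str, handler: str,
                  version: str = "1.0", export_path: str = "model-store",
                  extra_files: Optional[List[str]] = None) -> List[str]:
    """argv for torch-model-archiver; model_name becomes the name baked into the .mar."""
    args = ["torch-model-archiver",
            "--model-name", model_name,
            "--version", version,
            "--serialized-file", serialized_file,
            "--handler", handler,
            "--export-path", export_path,
            "--force"]  # overwrite an existing <model_name>.mar
    if extra_files:
        args += ["--extra-files", ",".join(extra_files)]
    return args


# Example with hypothetical file names:
#   import subprocess
#   subprocess.run(archiver_args("imagenet-vit", "model.pt", "handler.py",
#                                extra_files=["index_to_name.json"]), check=True)
```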

After the fix, some of the responses come from the canary model and predict Bengal:

{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:40:51 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '745'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:40:52 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '744'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:40:53 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '761'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '69', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:40:55 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '848'}
200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:40:55 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '755'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '69', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:40:57 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '792'}
200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '69', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:40:58 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '837'}
200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '69', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:40:59 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '816'}
200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:41:00 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '743'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:41:01 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '772'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:41:02 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '782'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '69', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:41:03 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '860'}
200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:41:04 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '746'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:41:05 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '778'}
200
{'predictions': [{'class': 'Egyptian cat', 'probability': 0.9374412894248962}]}
{'content-length': '69', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:41:06 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '899'}
200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '75', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:41:07 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '742'}
200

No More Errors! 🪄

To promote the new model, simply remove canaryTrafficPercent from the spec and apply the manifest again:

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "imagenet-vit"
  annotations:
    serving.kserve.io/enable-metric-aggregation: "true"
    serving.kserve.io/enable-prometheus-scraping: "true"
spec:
  predictor:
    serviceAccountName: s3-read-only
    model:
      modelFormat:
        name: pytorch
      # storageUri: s3://tsai-emlo/kserve-ig-2/imagenet-vit/
      storageUri: s3://tsai-emlo/kserve-ig-2/cat-classifier/
      resources:
        limits:
          cpu: 2600m
          memory: 4Gi

The responses now all come from the promoted model:

200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '69', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:44:27 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '779'}
200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '69', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:44:28 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '795'}
200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '69', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:44:29 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '767'}
200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '69', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:44:30 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '778'}
200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '69', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:44:32 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '789'}
200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '69', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:44:33 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '798'}
200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '69', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:44:34 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '781'}
200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '69', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:44:35 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '767'}
200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '69', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:44:36 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '922'}
200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '69', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:44:37 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '775'}
200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '69', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:44:38 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '774'}
200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '69', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:44:39 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '787'}
200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '69', 'content-type': 'application/json', 'date': 'Thu, 23 Nov 2023 19:44:40 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '825'}
200
{'predictions': [{'class': 'Bengal', 'probability': 0.5737159252166748}]}
{'content-length': '69'

Look at the Request Volume by Revision panel: the canary model is now promoted, and the older revision receives no requests.

NOTES: Route Traffic using Tags: https://kserve.github.io/website/0.11/modelserving/v1beta1/rollout/canary-example/#route-traffic-using-a-tag