Model monitoring using TensorFlow Data Validation in TFX

Gopikrishna Yadam
5 min read · May 23, 2021

ML models are increasingly at the core of products and product features. As a result, data science teams are now responsible for ensuring that their models perform as expected for a significant part of the product's lifetime.

A model's performance needs to be monitored in real time to detect any performance issues that would affect the end user's experience. An early-identification mechanism has to be developed to catch issues before they start affecting the bottom line or users start complaining.

During model development, the data science team might have evaluated the model with an accuracy metric, but tracking the same accuracy metric in real time is not possible because:

1. Model intervention: The model you are using might impact the observations you can make, because an action is taken after a prediction.

E.g., if you are predicting customer churn, you would take action to retain the customers you have identified as likely to churn. When you measure the ground-truth label a couple of months later and a customer hasn't churned, you don't know whether that is because the prediction was wrong or because of the action you took.

2. Long latency between a prediction and the ground-truth label becoming available:

Imagine credit card fraud: cases can be reported up to 90 days after the transaction was made. As a result, there is a three-month delay between a prediction being made and the ground truth becoming available. If accuracy is the only metric used to track the performance of a model, issues won't be detected until three months after they start occurring.

For our ML models to keep making reliable predictions over a significant period of time, we need to monitor the data and the possible changes that can affect model performance.

Monitoring data

There are two main reasons why the performance of an AI application can decrease:

1. An issue in the data gathering/processing pipeline:

This means that the data provided to the models is not as expected. This can occur, for example, when an update to an API you are using to collect data includes breaking changes.

E.g., suppose a sensor that was gathering images fails and starts sending dark frames instead of actual pictures.

2. A change in the nature/distribution of the data:

This means that the relationship between input and output data can change over time, which in turn means that the unknown underlying mapping function changes.

E.g., suppose you have created a model that uses several features to predict whether a user is a spammer. At the end of the quarter, you notice that the outcome of the predictions has drastically changed over time. In the best-case scenario, this change could be because the spammers gave up. In the worst case, the change is because the concept of a spammer has evolved: the spammers have come up with behaviour completely different from what was observed at training time. In other words, the concept of a spammer has drifted.

From the above two scenarios, we can say that when the distribution of the data changes between the training set and the test set, it is called Data Shift/Data Drift in the world of Machine Learning, i.e., Ptrain(y, x) ≠ Ptest(y, x). It is broadly classified into three types:

1. Covariate Shift

2. Prior Probability Shift

3. Concept Shift

Covariate Shift: Covariate shift is the change in the distribution of the covariates, that is, the independent variables. Covariate shift appears only in X→Y problems and is defined as the case where Ptrain(y|x) = Ptest(y|x) and Ptrain(x) ≠ Ptest(x). Intuitively, our learned function tries to fit the training data; when the training and test distributions differ, predicting with that function will give us wrong predictions.

Examples where covariate shift is likely to cause problems:

· Face recognition algorithms trained predominantly on younger faces, while the test dataset has a much larger proportion of older faces.

· Classifying images as either cats or dogs while omitting certain species from the training set that appear in the test set.

Covariate shift can also cause a lot of problems when performing cross-validation: cross-validation is almost unbiased without covariate shift, but it is heavily biased under covariate shift.
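To make the definition concrete, here is a minimal sketch (not from this article's own code) that checks a single numeric feature for a change in Ptrain(x) vs. Ptest(x) using a two-sample Kolmogorov–Smirnov test from SciPy. The data and the 0.01 significance threshold are purely illustrative:

```python
# Sketch: detect covariate shift on one numeric feature with a
# two-sample Kolmogorov-Smirnov test. All values here are synthetic.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Training covariate: e.g. ages centred around 30 (younger faces).
x_train = rng.normal(loc=30, scale=5, size=1000)
# Test covariate: ages centred around 55 (older faces) -> Ptest(x) differs.
x_test = rng.normal(loc=55, scale=5, size=1000)

statistic, p_value = ks_2samp(x_train, x_test)
print(f"KS statistic = {statistic:.3f}, p-value = {p_value:.3g}")

# A very small p-value rejects the hypothesis that both samples come
# from the same distribution, i.e. Ptrain(x) != Ptest(x).
if p_value < 0.01:  # illustrative threshold
    print("Covariate shift detected for this feature.")
```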

Prior Probability Shift: Prior probability shift can be thought of as the exact opposite of covariate shift: it refers to a change in the distribution of the class variable Y while the conditional distribution of X given Y remains the same. Prior probability shift appears only in Y→X problems and is defined as the case where Ptrain(x|y) = Ptest(x|y) and Ptrain(y) ≠ Ptest(y).

An intuitive way to think about it might be to consider an unbalanced dataset.

If the training set has balanced class priors (i.e., the probability of an email being spam is 0.5), then we would expect 50% of the training set to be spam emails and 50% non-spam.

If, in reality, 90% of our emails are spam (perhaps not unlikely), then the prior probability of the class variable has changed. This idea relates to data sparsity and biased feature selection, which are factors in causing covariate shift, but instead of influencing our input distribution they influence our output distribution.

This problem only occurs in Y → X problems and is commonly associated with naive Bayes (hence the spam example, since naive Bayes is commonly used to filter spam emails).
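As a rough illustration (again a sketch, not the article's code), prior probability shift can be flagged by simply comparing the class priors P(y) estimated at training time with the labels observed in production. The label counts and the 0.1 alert margin below are made up:

```python
# Sketch: detect prior probability shift by comparing the class
# distribution P(y) at training time with live labels.
from collections import Counter

train_labels = ["spam"] * 500 + ["ham"] * 500   # P(spam) = 0.5 at training
live_labels = ["spam"] * 900 + ["ham"] * 100    # P(spam) = 0.9 in production

def class_priors(labels):
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

p_train = class_priors(train_labels)
p_live = class_priors(live_labels)
print("training priors:", p_train)
print("live priors:    ", p_live)

# Flag any class whose prior moved by more than an illustrative margin.
for cls in p_train:
    if abs(p_train[cls] - p_live.get(cls, 0.0)) > 0.1:
        print(f"Prior probability shift detected for class '{cls}'")
```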

Concept Shift: Concept shift (or concept drift) differs from covariate and prior probability shift in that it relates not to the data distribution or the class distribution but to a change in the relationship between the input and output variables, and it is defined as Ptrain(y|x) ≠ Ptest(y|x). The example discussed in the "Change in the nature/distribution of the data" section above falls under this type.
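The following toy sketch (synthetic data, assuming scikit-learn is available) shows concept shift in action: the input distribution P(x) stays the same, but the labelling rule P(y|x) changes, so a model trained on the old relationship loses accuracy on the new one:

```python
# Sketch: concept shift. P(x) is identical for old and new data, but
# the labelling rule P(y|x) changes, so the trained model degrades.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X_old = rng.uniform(0, 10, size=(1000, 1))
X_new = rng.uniform(0, 10, size=(1000, 1))  # same input distribution

y_old = (X_old[:, 0] > 5).astype(int)  # old concept: spammer if x > 5
y_new = (X_new[:, 0] > 2).astype(int)  # new concept: spammer if x > 2

model = LogisticRegression().fit(X_old, y_old)
print("accuracy on old concept:", accuracy_score(y_old, model.predict(X_old)))
print("accuracy on new concept:", accuracy_score(y_new, model.predict(X_new)))
# Accuracy drops on the new data even though P(x) never changed:
# Ptrain(y|x) != Ptest(y|x).
```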

Implementation of Data Shift detection using TensorFlow Data Validation
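Below is a minimal sketch of what drift detection looks like with TFDV. The CSV file names, the categorical feature "payment_type", and the L-infinity threshold are placeholders for illustration, not the article's actual dataset:

```python
# Sketch: drift detection with TensorFlow Data Validation (TFDV).
import tensorflow_data_validation as tfdv

# 1. Compute statistics over the training data and over a fresh batch
#    of serving/production data.
train_stats = tfdv.generate_statistics_from_csv(data_location="train.csv")
serving_stats = tfdv.generate_statistics_from_csv(data_location="serving.csv")

# 2. Infer a schema from the training statistics.
schema = tfdv.infer_schema(statistics=train_stats)

# 3. Attach a drift comparator to a feature we want to monitor.
#    For categorical features TFDV compares distributions with the
#    L-infinity distance; the threshold here is illustrative.
tfdv.get_feature(schema, "payment_type").drift_comparator.infinity_norm.threshold = 0.01

# 4. Validate the new statistics against the schema, passing the
#    training statistics as the baseline for drift detection.
anomalies = tfdv.validate_statistics(
    statistics=serving_stats,
    schema=schema,
    previous_statistics=train_stats,
)

# 5. In a notebook, render any detected drift/anomalies as a table.
tfdv.display_anomalies(anomalies)

# Optionally, compare the two datasets visually side by side.
tfdv.visualize_statistics(
    lhs_statistics=train_stats, rhs_statistics=serving_stats,
    lhs_name="TRAIN", rhs_name="SERVING",
)
```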

The complete code and dataset can be downloaded here.

This article reflects my understanding and implementation of Data Shift/Data Drift identification using TFDV in TFX, built by exploring multiple resources on Coursera, other Medium articles, cloud.google.com, Git resources, etc.
