deeplearning.ai's Intro to TensorFlow (Week 2)

[ deep-learning  machine-learning  tensorflow  python  coursera  easi  ]

This week’s content got a little more into actual machine learning models, namely simple multilayer-perceptron-style networks – i.e., going from a linear regression to a network with hidden layers and non-identity activation functions. Most courses use MNIST as the starting point; the course creators buck that trend and dive into Fashion MNIST. Very briefly, the lectures mention that implicit biases may be inherent in a data set, and point out that such biases can unknowingly leak into machine learning models and cause downstream issues. However, most of this content was optional: one either explores the provided reference and follows the rabbit hole down via the references therein, or not. The week’s quiz doesn’t even mention it. My 2 cents: take the detour, at least briefly.

Fashion MNIST

Fashion MNIST is cool, not just because it is another publicly available, popular benchmarking data set, but because its designers put in the effort to make it a 100% pain-free drop-in replacement for any algorithm already tested on MNIST. The idea (I think) is that even the simplest machine learning models can perform well on MNIST without much effort, so it’s really not much of a benchmark. Fashion MNIST provides a little more of a challenge, and so a little more of an indication of how a technique/architecture ranks.
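To make the “drop-in” point concrete, here is a minimal sketch using the tf.keras datasets API; swapping MNIST for Fashion MNIST is a one-line change:

import tensorflow as tf

# Both data sets ship with Keras and share the same API and shapes:
# 60,000 training + 10,000 test grayscale 28x28 images, integer labels 0-9.
mnist  = tf.keras.datasets.mnist
fmnist = tf.keras.datasets.fashion_mnist   # <-- the only line that changes

(x_train, y_train), (x_test, y_test) = fmnist.load_data()
print(x_train.shape, y_train.shape)        # (60000, 28, 28) (60000,)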

Fashion MNIST References

  • https://github.com/zalandoresearch/fashion-mnist
  • https://www.tensorflow.org/tutorials/keras/basic_classification
  • https://hanxiao.github.io/2018/09/28/Fashion-MNIST-Year-In-Review/
  • https://github.com/hwalsuklee/tensorflow-generative-model-collections
  • https://www.youtube.com/watch?v=RJudqel8DVA

Identifying Biases in Machine Learning

This is a very interesting subfield of machine learning.

The lectures introduced this concept very quickly and a little artificially, noting that by encoding labels as numbers, there is no language bias (true, but I thought we did that b/c TF needs numerical data :-p).
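For concreteness, the Fashion MNIST labels really are just the integers 0 through 9; the human-readable class names (taken from the Fashion MNIST docs) live outside the data. A quick sketch:

import tensorflow as tf

# Labels are plain ints 0-9; the mapping to clothing categories is documented separately.
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

(_, train_labels), _ = tf.keras.datasets.fashion_mnist.load_data()
print(train_labels[0], class_names[train_labels[0]])  # e.g., 9 Ankle boot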

At any rate, the meat of this topic was provided in a link to a Google Developer page called Machine Learning Fairness, which has an interesting video and a list of references for further reading. This linked to a great Google Developer blog article called Text Embedding Models Contain Bias, which very simply details what bias in machine learning models is, and details why and when it matters.

For example, the authors reference several studies that highlight embarrassing-to-alarming features learned (or not) by ML models:

  • “face classification models may not perform as well for women of color”
    • this bias seems more “embarrassing” than “alarming” in that the training data set should have been better vetted
    • given that it can be upsetting/distressing, one might call it alarming too though
    • at any rate, this example definitely shows how unconscious bias can leak into a training set
  • “classifiers trained to detect rude, disrespectful, or unreasonable comments may be more likely to flag the sentence ‘I am gay’ than ‘I am straight’”
    • this bias is more “alarming” than embarrassing because the model is unable to contextualize the word “gay” and therefore unable to distinguish between someone self-identifying as gay and someone else using the word “gay” as a slur
    • it is alarming because it will flag someone self-identifying as gay as being rude, disrespectful, or unreasonable
  • “speech transcription may have higher error rates for African Americans than White Americans”
    • this has both elements of “embarrassing” (should have used a more robust training set!) and possibly “alarming” too (can be upsetting/distressing)

Actually, after trying to decide where these examples fit on a scale from “embarrassing” to “alarming,” it strikes me this isn’t the right scale to use: they all seem to contain both embarrassing and alarming elements. Also, all examples are “bad for business”, period. For a company like Google, any of these would be bad press. For a small startup, missteps like these might be the difference between gaining traction or going extinct.

Some of these ML bias issues are clearly concerning; however, they are very context dependent. Biases identified in a data set are not necessarily inherently bad and should not be automatically removed. It really depends on how you will be using the data.

For example, in the beginning of the article, looking at the ability of five different models to predict whether a movie review is positive or negative, the authors observe that Model C performs the best – but with a caveat:

“Normally, we’d simply choose Model C. But what if we found that while Model C performs the best overall, it’s also most likely to assign a more positive sentiment to the sentence “The main character is a man” than to the sentence “The main character is a woman”? Would we reconsider?”

If you want to reflect the reality presented in the data set and make the best prediction, you would not reconsider. In a scientific, empirical sense, this bias (preference?) simply exists – at least in the minds of the reviewers represented in the training set. If the context is that you will be betting money on whether or not a reviewer gave a newly released movie a thumbs up or thumbs down based only on the text of their review, you had better use the best model (especially if it’s a reviewer that had representation in your training set). However, even in this case, knowledge of implicit bias in the training set would be helpful: e.g., if you knew all the reviewers in the training set were 20-something-year-old males and the bet was to be made on a 40-something-year-old female’s review, you might not use Model C, or even take that bet.

The authors of the article actually go into this quite a bit, and more eloquently than I just did.

ML Implicit Bias References

Some Code

import tensorflow as tf
from tensorflow import keras

# RAW DATA
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()

# NORMALIZE DATA
training_images, test_images = training_images/255.0, test_images/255.0

# MODEL:  xMRNS = y  
model = keras.Sequential([
  keras.layers.Flatten(input_shape=(28,28)),       # 28x28 input to match F-MNIST images, then flattened to 1D
  keras.layers.Dense(128, activation=tf.nn.relu),  # hidden layer
  keras.layers.Dense(10, activation=tf.nn.softmax) # 10 neurons to match 10 clothing classes
])

# OPTIMIZER METHOD & LOSS FCN
model.compile(optimizer = tf.train.AdamOptimizer(),
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])

# TRAIN MODEL
model.fit(training_images, training_labels, epochs=5)

# EVALUATE MODEL
model.evaluate(test_images, test_labels)

# CLASSIFICATIONS 
#   - vectors representing probability of each FMNIST class for given image
classifications = model.predict(test_images)
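Each row of classifications is a 10-element softmax output, so the predicted class is just the index of the largest probability. A quick sanity check on the first test image (a sketch, assuming numpy as np):

import numpy as np

print(classifications[0])             # 10 probabilities summing to ~1
print(np.argmax(classifications[0]))  # predicted class index
print(test_labels[0])                 # true label, for comparison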

Updating that code to be more flexible w/ a function:

def get_model(
  dense = [128],
  activation = tf.nn.relu,
  optimizer = tf.train.AdamOptimizer(),
  loss = 'sparse_categorical_crossentropy',
  metrics = ['accuracy'],
):

  # DENSE LIST
  #  -- allow a single hidden-layer designation, like 512, as well as a list
  if not isinstance(dense, list): dense = [dense]

  # MODEL:  xMRNS = y  
  model = keras.Sequential()
  model.add(tf.keras.layers.Flatten(input_shape=(28,28)))
  for layer_size in dense: model.add(tf.keras.layers.Dense(layer_size, activation = activation))
  model.add(keras.layers.Dense(10, activation=tf.nn.softmax))

  # OPTIMIZER METHOD & LOSS FCN
  model.compile(
    optimizer = optimizer,
    loss = loss,
    metrics = metrics
  )
  
  return model
  
#---------------------------------------------------------
def train_model(
  model,
  training_images,
  training_labels,
  epochs = 5,
  loss_goal = 0.2,
  acc_goal = 0.9
):
  # Add Error Checks: training_images.shape == (28,28), etc
  
  # Threshold Callbacks
  #   -- we inherit from Keras' built-in Callback abstract class
  #      and override the .on_epoch_end method
  #   -- other methods could be used too (.on_batch_begin and .on_batch_end)
  class myCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs={}):
      if(logs.get('loss') < loss_goal):
        print(f"\n\nReached loss goal so cancelling training!\n\n")
        self.model.stop_training = True
      if(logs.get('acc') > acc_goal):
        print(f"\n\nReached accuracy goal so cancelling training!\n\n")
        self.model.stop_training = True

  model.fit(
    training_images, 
    training_labels, 
    epochs = epochs,
    callbacks = [myCallback()]
  )
  
  return model

#---------------------------------------------------------
import tensorflow as tf
from tensorflow import keras

# GET DATA
fmnist = keras.datasets.fashion_mnist

(x_train, y_train),(x_test, y_test) = fmnist.load_data()
x_train, x_test = x_train/255., x_test/255.

model = get_model(512)
model = train_model(model, x_train, y_train, epochs=10, loss_goal=0.2, acc_goal=0.95)
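And, as with the first version, the trained model can then be scored against the held-out test set (quick sketch):

# EVALUATE MODEL
test_loss, test_acc = model.evaluate(x_test, y_test)
print(test_loss, test_acc)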
Written on March 23, 2019