We'll start off by building a simple autoencoder to compress the MNIST dataset. With autoencoders, we pass input data through an encoder that makes a compressed representation of the input. Then, this representation is passed through a decoder to reconstruct the input data. Generally the encoder and decoder will be built with neural networks, then trained on example data.
In this notebook, we'll build a simple network architecture for the encoder and decoder. Let's get started by importing our libraries and getting the dataset.
%matplotlib inline
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', validation_size=0)
print("mnist.train.images is of type {}".format(type(mnist.train.images)))
print("Its shape is {}".format(mnist.train.images.shape))
print("Normalized? max, min, mean is {} {} {}".format(mnist.train.images.max(),
mnist.train.images.min(), mnist.train.images.mean()))
Below I'm plotting an example image from the MNIST dataset. These are 28x28 grayscale images of handwritten digits.
img = mnist.train.images[2]
print("Image has its nonzero values between 0 and 1", img[img.nonzero()])
plt.imshow(img.reshape((28, 28)), cmap='Greys_r')
We'll train an autoencoder with these images by flattening them into 784-length vectors. The images from this dataset are already normalized so that the values are between 0 and 1. Let's start by building basically the simplest autoencoder, with a single ReLU hidden layer. This layer will serve as the compressed representation. The encoder is then the input layer plus the hidden layer, and the decoder is the hidden layer plus the output layer. Since the images are normalized between 0 and 1, we need to use a sigmoid activation on the output layer to get values matching the input.
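To see why a sigmoid fits here: it squashes any real-valued logit into the interval (0, 1), which matches the normalized pixel range. A quick standalone check (the values below are just illustrative):
logits_demo = np.array([-5.0, 0.0, 5.0])  # arbitrary example logits
print(1 / (1 + np.exp(-logits_demo)))     # ~[0.0067 0.5 0.9933], always in (0, 1)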
Exercise: Build the graph for the autoencoder in the cell below. The input images will be flattened into 784-length vectors. The targets are the same as the inputs. There should be one hidden layer with a ReLU activation and an output layer with a sigmoid activation. Feel free to use TensorFlow's higher-level API, tf.layers. For instance, you would use tf.layers.dense(inputs, units, activation=tf.nn.relu) to create a fully connected layer with a ReLU activation. The loss should be calculated with the cross-entropy loss; there is a convenient TensorFlow function for this, tf.nn.sigmoid_cross_entropy_with_logits (documentation). Note that tf.nn.sigmoid_cross_entropy_with_logits takes the logits, but to get the reconstructed images you'll need to pass the logits through the sigmoid function.
image_size = mnist.train.images.shape[1]
print("The image size is: ", image_size)
print("The type of the images is :", type(mnist.train.images))
# Size of the encoding layer (the hidden layer)
encoding_dim = 32 # feel free to change this value
print("We are encoding it as a size: ", encoding_dim)
# Input and target placeholders
inputs_ = tf.placeholder(tf.float32, [None, image_size], name='inputs')
print("This is our placeholder for the inputs: ", inputs_)
# Target has the same size as the input image, since it is its reconstruction
targets_ = tf.placeholder(tf.float32, [None, image_size], name='targets')
print("Targets are the same size as the original images: ", targets_)
# Output of hidden layer, single fully connected layer here with ReLU activation
# Arguments of tf.layers.dense are the inputs, the output dimension and the activation
encoded = tf.layers.dense(inputs_, encoding_dim, activation=tf.nn.relu)
print("This is the encoding layer: ", encoded)
# Output layer logits, fully connected layer with no activation
logits = tf.layers.dense(encoded, image_size, activation=None)
print("This is the output layer before sigmoid: ", logits)
# Sigmoid output from logits
decoded = tf.nn.sigmoid(logits, name='output')
# flat images of the same size as the original images
print("Decoded is the sigmoid applied to the logits: ", decoded)
# Sigmoid cross-entropy loss
# We pass the logits here, since the loss function applies the sigmoid internally.
# So the loss is NOT calculated with the decoded images, but with the raw logits:
# (from the documentation): let x = logits, z = labels. The logistic loss is
# z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=targets_)
# the loss is computed elementwise:
# (from the documentation) it returns a Tensor of the same shape as logits
# with the componentwise logistic losses.
print("Loss is calculated with the logits and the targets since the loss function applies sigmoid: ", loss)
# Mean of the loss
cost = tf.reduce_mean(loss)
# tf.reduce_mean averages the componentwise losses into a single scalar
# example from the documentation:
# x = tf.constant([[1., 1.], [2., 2.]])
# tf.reduce_mean(x)     # 1.5
# tf.reduce_mean(x, 0)  # [1.5, 1.5]
# tf.reduce_mean(x, 1)  # [1., 2.]
print("Cost is the mean over the losses: ", cost)
# Adam optimizer
opt = tf.train.AdamOptimizer(0.001).minimize(cost)
print('Optimizer: ', opt)
# Create the session
sess = tf.Session()
print("Session: ", sess)
Here I'll write a bit of code to train the network. I'm not too interested in validation here, so I'll just monitor the training loss.
Calling mnist.train.next_batch(batch_size) will return a tuple of (images, labels). We're not concerned with the labels here; we just need the images. Otherwise, this is pretty straightforward training with TensorFlow. We initialize the variables with sess.run(tf.global_variables_initializer()), then run the optimizer and get the loss with batch_cost, _ = sess.run([cost, opt], feed_dict=feed).
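Just to make the tuple structure concrete, here's an optional peek at one small batch (this consumes a few training examples, which is harmless; labels have shape (4,) because we loaded the data with the default one_hot=False):
example_images, example_labels = mnist.train.next_batch(4)
print(example_images.shape)  # (4, 784) -- flattened images
print(example_labels.shape)  # (4,) -- digit labels, which we ignore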
epochs = 20
batch_size = 200
# run the variable initializer before training
sess.run(tf.global_variables_initializer())
for e in range(epochs):
    # loop over the training set one batch at a time
    for ii in range(mnist.train.num_examples//batch_size):
        # grab the next batch of images (labels are ignored)
        batch = mnist.train.next_batch(batch_size)
        # the images serve as both inputs and targets
        feed = {inputs_: batch[0], targets_: batch[0]}
        # run the optimizer and fetch the cost for monitoring
        batch_cost, _ = sess.run([cost, opt], feed_dict=feed)
    print("Epoch: {}/{}...".format(e+1, epochs),
          "Training loss: {:.4f}".format(batch_cost))
Below I've plotted some of the test images along with their reconstructions. For the most part these look pretty good, aside from some blurriness here and there.
# 2 rows, 10 columns
fig, axes = plt.subplots(nrows=2, ncols=10, sharex=True, sharey=True, figsize=(20,4))
# input images
in_imgs = mnist.test.images[:10]
# you can tell the session which output you want
reconstructed, compressed = sess.run([decoded, encoded], feed_dict={inputs_: in_imgs})
# show the images
for images, row in zip([in_imgs, reconstructed], axes):
    for img, ax in zip(images, row):
        ax.imshow(img.reshape((28, 28)), cmap='Greys_r')
        # suppress the axes
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
fig.tight_layout(pad=0.1)
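We also fetched the compressed codes above but haven't looked at them yet. As an optional extra, here's the 32-dimensional bottleneck representation of the first test image:
print("Compressed shape: ", compressed.shape)  # (10, 32): 10 images, 32-dim codes
plt.figure(figsize=(6, 2))
plt.bar(range(compressed.shape[1]), compressed[0])
plt.title("32-dimensional code for the first test image")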
sess.close()
We're dealing with images here, so we can (usually) get better performance using convolutional layers. So, next we'll build a better autoencoder with convolutional layers.
In practice, autoencoders aren't actually better at compression than typical methods like JPEG and MP3. But they are used for noise reduction, which you'll also build.