Shwetabh Biswas

Artificial Intelligence | Machine Learning | NLP | CV | Data Science | Life Long Learner

View the Project on GitHub shwetabh-23/VAE-website

Variational Autoencoders

Description

A Variational Autoencoder (VAE) is a generative model used in unsupervised deep learning. Its primary purpose is to model the underlying distribution of the input data, which makes it possible to generate new samples that resemble the training data.

Key Concepts:

Encoder-Decoder Architecture: A VAE consists of an encoder and a decoder, forming an autoencoder structure. The encoder maps each input to a distribution over a lower-dimensional latent space, and the decoder reconstructs the input from a sample drawn from that distribution.

Latent Space: The latent space is a lower-dimensional representation in which the data is assumed to follow a prior distribution (typically a standard Gaussian). It allows efficient sampling and interpolation between data points.

Variational Inference: Because the true posterior over the latent variables is intractable, VAEs use variational inference to approximate it. This introduces a probabilistic element into the model, making the encoder stochastic.

Reparameterization Trick: To make backpropagation feasible despite this stochasticity, the reparameterization trick is employed: a sample is drawn from a simple fixed distribution (e.g., a standard Gaussian) and transformed with the predicted mean and variance into a sample from the desired distribution (see the sketch below).
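
As a minimal sketch of the last two ideas (assuming a PyTorch implementation; the function names and the mean-squared-error reconstruction term are illustrative choices, not taken from the project code), the sampling step and the resulting training objective look like this:

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    # Draw eps ~ N(0, I) and shift/scale it so that z ~ N(mu, sigma^2),
    # keeping the sampling step differentiable with respect to mu and logvar.
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + eps * std

def vae_loss(x_recon, x, mu, logvar):
    # Reconstruction term: how closely the decoder output matches the input.
    recon = F.mse_loss(x_recon, x, reduction="sum")
    # KL term: pushes the approximate posterior N(mu, sigma^2) towards the
    # standard Gaussian prior N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```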

Workflow and Architecture

The project involves three architectures. The first two are Variational Autoencoders. The first VAE consists of an encoder (three sequential Conv2d layers), a bottleneck (two fully connected layers that predict the mean and variance, followed by the reparameterization step), and a decoder (three sequential ConvTranspose2d layers). The second architecture, called VAE-Variation, is identical except that its decoder uses three Upsample + Conv2d blocks in place of the transposed convolutions. The third architecture is a standard autoencoder.
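
A rough PyTorch sketch of the two VAE variants is shown below. Only the overall layout (three-convolution encoder, fully connected mean/variance bottleneck, three-layer decoder) follows the description above; the channel counts, kernel sizes, strides, and the assumed 32x32 input size are illustrative guesses rather than the project's actual hyperparameters.

```python
import torch
import torch.nn as nn

class ConvVAE(nn.Module):
    """Sketch of the two VAE variants for 32x32 inputs.

    Set upsample_decoder=True for the VAE-Variation decoder
    (Upsample + Conv2d blocks instead of ConvTranspose2d layers).
    """

    def __init__(self, in_channels=3, latent_dim=128, upsample_decoder=False):
        super().__init__()
        # Encoder: three sequential strided Conv2d layers, 32x32 -> 4x4.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        # Bottleneck: two fully connected heads for the mean and log-variance,
        # plus a linear layer mapping a latent sample back to feature maps.
        self.fc_mu = nn.Linear(128 * 4 * 4, latent_dim)
        self.fc_logvar = nn.Linear(128 * 4 * 4, latent_dim)
        self.fc_dec = nn.Linear(latent_dim, 128 * 4 * 4)

        if upsample_decoder:
            # VAE-Variation: three Upsample + Conv2d blocks, 4x4 -> 32x32.
            self.decoder = nn.Sequential(
                nn.Upsample(scale_factor=2), nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
                nn.Upsample(scale_factor=2), nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
                nn.Upsample(scale_factor=2), nn.Conv2d(32, in_channels, 3, padding=1), nn.Sigmoid(),
            )
        else:
            # Base VAE: three sequential ConvTranspose2d layers, 4x4 -> 32x32.
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, in_channels, 4, stride=2, padding=1), nn.Sigmoid(),
            )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + torch.randn_like(std) * std  # reparameterization trick
        return self.decoder(self.fc_dec(z).view(-1, 128, 4, 4)), mu, logvar
```

For example, ConvVAE(in_channels=1, latent_dim=10, upsample_decoder=True) would roughly correspond to the VAE-Variation runs on MNIST, assuming the images are padded or resized to 32x32.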

Understanding Metrics

The reported metrics can be interpreted as follows:
1. PSNR and LPIPS measure model performance by comparing each generated image with its ground truth. A higher PSNR and a lower LPIPS indicate a better reconstruction (a sketch of how they are computed follows this list).
2. The bottleneck-layer distribution indicates how easily the classes can be separated in the latent space, i.e. how well a classifier trained on the latent codes would perform.
3. The reconstructed image is the model output for the corresponding input.
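
As a rough sketch of how these numbers can be obtained (the helper below and the use of the third-party lpips package are assumptions for illustration, not the project's own evaluation code):

```python
import torch

def psnr(x_recon, x, max_val=1.0):
    # Peak signal-to-noise ratio (in dB) between a reconstruction and its
    # ground truth, assuming pixel values in [0, max_val]. Higher is better.
    mse = torch.mean((x_recon - x) ** 2)
    return 10 * torch.log10(max_val ** 2 / mse)

# LPIPS is a learned perceptual metric that compares deep-network features of
# the two images; lower is better. With the `lpips` package it would look like:
#   import lpips
#   lpips_fn = lpips.LPIPS(net="alex")
#   distance = lpips_fn(x_recon, x)   # inputs expected in [-1, 1]
```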

Observations

Evaluation Metrics

RESULTS FOR VAE

MNIST DATA RESULTS

| Size of bottleneck layer | PSNR (dB) | LPIPS | Reconstructed image | Bottleneck layer distribution |
|---|---|---|---|---|
| 2 | 13.392175427879028 | 0.2470882730558514 | (image) | (image) |
| 10 | 15.822724972980408 | 0.1317140911705792 | (image) | (image) |

CIFAR10 DATA RESULTS

| Size of bottleneck layer | PSNR (dB) | LPIPS | Reconstructed image | Bottleneck layer distribution |
|---|---|---|---|---|
| 64 | 17.078560377700143 | 0.40190513152629137 | (image) | (image) |
| 128 | 18.22797081576608 | 0.32085249945521355 | (image) | (image) |
| 256 | 19.16980126602382 | 0.2600733716972172 | (image) | (image) |

RESULTS FOR VAE VARIATION

MNIST DATA RESULTS

| Size of bottleneck layer | PSNR (dB) | LPIPS | Reconstructed image | Bottleneck layer distribution |
|---|---|---|---|---|
| 2 | 12.861776072285114 | 0.2801305861212313 | (image) | (image) |
| 10 | 18.255316526550935 | 0.15646151930559427 | (image) | (image) |

CIFAR10 DATA RESULTS

| Size of bottleneck layer | PSNR (dB) | LPIPS | Reconstructed image | Bottleneck layer distribution |
|---|---|---|---|---|
| 64 | 14.979190799345862 | 0.4430371061898768 | (image) | (image) |
| 128 | 16.356243858018694 | 0.3574161999858916 | (image) | (image) |
| 256 | 16.965382100831732 | 0.3165968810208142 | (image) | (image) |

Inference and Conclusions

The major conclusions that can be drawn from the observations are:
1. Increasing the bottleneck layer size improves performance: PSNR rises and LPIPS falls for both VAE variants.
2. The bottleneck-layer distributions do not support any firm conclusion about classification accuracy.
3. The visual quality of the reconstructed images also improves as the bottleneck layer grows.