To get a more intuitive understanding of how VAE and CVAE work, we examine a minimal example that applies them. I found a simple yet convincing one: using a VAE and a CVAE to learn to generate handwritten digits from the MNIST dataset, implemented in PyTorch. Let’s delve into it.

When the condition (label) is discrete, as in this example, we can convert it to a one-hot vector and obtain the conditional distribution p(z | y) by simply concatenating the original z vector with the one-hot form of y. When the condition is more complicated, as in models such as AgentFormer, we can instead use neural networks to represent p(z | y): two networks map the condition to a mean and a covariance, and z is sampled from the Gaussian distribution defined by those output parameters.
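To make the two options concrete, here is a minimal PyTorch sketch. It is not taken from the referenced repository. The names `concat_condition` and `ConditionalPrior`, the latent size of 2, the hidden size of 32, and the 16-dimensional condition vector are all assumptions chosen for illustration. The learned prior also assumes a diagonal covariance, predicted as a log-variance, which is a common simplification rather than something the text above prescribes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 10  # MNIST digits 0-9
LATENT_DIM = 2    # assumed latent size, for illustration only


def concat_condition(z: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Option 1: append a one-hot encoding of integer labels y to z."""
    y_onehot = F.one_hot(y, num_classes=NUM_CLASSES).to(z.dtype)
    return torch.cat([z, y_onehot], dim=-1)


class ConditionalPrior(nn.Module):
    """Option 2 (hypothetical): two small networks map a condition vector
    to the mean and log-variance of a diagonal Gaussian over z."""

    def __init__(self, cond_dim: int, latent_dim: int, hidden_dim: int = 32):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(cond_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )
        self.log_var_net = nn.Sequential(
            nn.Linear(cond_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )

    def forward(self, cond: torch.Tensor):
        mean = self.mean_net(cond)
        log_var = self.log_var_net(cond)
        # Reparameterized sample: z = mean + std * eps, with eps ~ N(0, I)
        std = torch.exp(0.5 * log_var)
        z = mean + std * torch.randn_like(std)
        return z, mean, log_var


# Option 1: a batch of 4 latent vectors and their digit labels
z = torch.randn(4, LATENT_DIM)
y = torch.tensor([0, 3, 7, 9])
decoder_input = concat_condition(z, y)  # shape (4, LATENT_DIM + NUM_CLASSES)

# Option 2: a batch of 4 assumed 16-dimensional condition vectors
prior = ConditionalPrior(cond_dim=16, latent_dim=LATENT_DIM)
z_cond, mean, log_var = prior(torch.randn(4, 16))  # each of shape (4, LATENT_DIM)
```

With the one-hot approach, the decoder's first layer simply takes a wider input. With the learned prior, the condition shapes where z is sampled from, so in practice the condition vector would come from an encoder over the conditioning data rather than from random noise as in this toy usage.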

Reference

  1. GitHub repository: VAE-CVAE-MNIST https://github.com/timbmg/VAE-CVAE-MNIST.git