Topic Models - Latent Dirichlet Allocation

Graphical Models

  • Nodes are random variables
  • Edges denote possible dependence
  • Observed variables are shaded
  • Plates denote replicated structure
  • The structure of the graph defines the pattern of conditional dependence among the random variables
  • E.g., a graph in which $y$ is the parent of each $x_{n}$, with a plate over $n = 1, \ldots, N$, corresponds to

$$p(y, x_{1}, \ldots, x_{N}) = p(y)\prod_{n=1}^{N} p(x_{n} \mid y)$$
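
As a concrete illustration of this factorization, the sketch below evaluates the joint for a small discrete model. The tables `p_y` and `p_x_given_y` hold made-up values chosen only for illustration; they are not taken from the slides.

```python
from itertools import product

# Hypothetical probability tables (assumed values): y and every x_n are binary.
p_y = [0.6, 0.4]                  # p(y)
p_x_given_y = [[0.9, 0.1],        # p(x | y = 0)
               [0.2, 0.8]]        # p(x | y = 1)

def joint(y, xs):
    """Return p(y, x_1, ..., x_N) = p(y) * prod_n p(x_n | y)."""
    prob = p_y[y]
    for x in xs:
        prob *= p_x_given_y[y][x]
    return prob

# Summing the joint over every configuration should give 1.
N = 3
total = sum(joint(y, xs)
            for y in range(2)
            for xs in product(range(2), repeat=N))
print(total)
```

Because every `x_n` shares the same conditional table, the plate amounts to a loop over `n` in the code.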

Generative Model

  • Each document is a random mixture of corpus-wide topics
  • Each word is drawn from one of those topics
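
The two bullets above can be sketched as a sampling procedure. This is a minimal illustration, not a reference implementation: the symmetric Dirichlet priors, the hyperparameter values `alpha` and `eta`, the fixed document length, and all function names are assumptions introduced here.

```python
import random

def sample_dirichlet(conc, rng):
    """Draw from Dirichlet(conc) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(c, 1.0) for c in conc]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(num_docs, doc_len, num_topics, vocab_size,
                    alpha=0.5, eta=0.5, seed=0):
    """Sample topics, per-document topic mixtures, and documents (word ids)."""
    rng = random.Random(seed)
    # Corpus-wide topics: each topic is a distribution over the vocabulary.
    topics = [sample_dirichlet([eta] * vocab_size, rng)
              for _ in range(num_topics)]
    mixtures, docs = [], []
    for _ in range(num_docs):
        # Each document gets its own random mixture over the shared topics.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_len):
            # Pick a topic for this word, then draw the word from that topic.
            z = rng.choices(range(num_topics), weights=theta)[0]
            w = rng.choices(range(vocab_size), weights=topics[z])[0]
            words.append(w)
        mixtures.append(theta)
        docs.append(words)
    return topics, mixtures, docs

topics, mixtures, docs = generate_corpus(
    num_docs=5, doc_len=20, num_topics=3, vocab_size=10)
```

Each entry of `docs` is a list of integer word ids; mapping ids to actual words would require a vocabulary, which is left out here.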

The Posterior Distribution

  • In reality, we only observe the documents
  • The goal of LDA is to infer the underlying topic structure
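
The slides do not say which inference method is intended. One common choice is collapsed Gibbs sampling, sketched below under assumed symmetric Dirichlet hyperparameters `alpha` and `eta`. The function name, the toy corpus, and the number of sweeps are illustrative assumptions.

```python
import random

def gibbs_lda(docs, num_topics, vocab_size,
              alpha=0.5, eta=0.1, sweeps=50, seed=0):
    """Collapsed Gibbs sampling for LDA on documents given as word-id lists."""
    rng = random.Random(seed)
    ndk = [[0] * num_topics for _ in docs]               # doc-topic counts
    nkw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    nk = [0] * num_topics                                # tokens per topic

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][n]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Conditional of this token's topic given all the others.
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + eta)
                           / (nk[j] + vocab_size * eta)
                           for j in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][n] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Point estimates computed from the final sample's counts.
    theta = [[(ndk[d][k] + alpha) / (len(doc) + num_topics * alpha)
              for k in range(num_topics)]
             for d, doc in enumerate(docs)]
    beta = [[(nkw[k][w] + eta) / (nk[k] + vocab_size * eta)
             for w in range(vocab_size)]
            for k in range(num_topics)]
    return z, theta, beta

toy_docs = [[0, 0, 1, 1, 0], [2, 3, 3, 2, 2], [0, 1, 2, 3, 0]]
z, theta, beta = gibbs_lda(toy_docs, num_topics=2, vocab_size=4)
```

Only the word ids in `toy_docs` are observed; the assignments `z`, the document mixtures `theta`, and the topics `beta` are all estimated. A single final sample is a rough estimate, and averaging over several samples would usually be more stable.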

Latent Dirichlet allocation

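$$p(\beta_{1:K}, \theta_{1:D}, z_{1:D}, w_{1:D}) = \prod_{k=1}^{K} p(\beta_{k} \mid \eta) \prod_{d=1}^{D} \left( p(\theta_{d} \mid \alpha) \prod_{n=1}^{N} p(z_{d,n} \mid \theta_{d})\, p(w_{d,n} \mid z_{d,n}, \beta_{1:K}) \right)$$

where $\beta_{k} \sim \mathrm{Dir}(\eta)$, $\theta_{d} \sim \mathrm{Dir}(\alpha)$, $z_{d,n} \sim \mathrm{Mult}(\theta_{d})$, $w_{d,n} \sim \mathrm{Mult}(\beta_{z_{d,n}})$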
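
To make the joint distribution concrete, the sketch below evaluates its logarithm for a tiny hand-built example. It assumes the standard LDA formulation with symmetric Dirichlet priors on the topics and on the per-document topic proportions. The hyperparameters `alpha` and `eta`, the function names, and all numeric values are illustrative assumptions.

```python
import math

def log_dirichlet(x, conc):
    """Log density of a symmetric Dirichlet(conc) at probability vector x."""
    k = len(x)
    log_norm = math.lgamma(k * conc) - k * math.lgamma(conc)
    return log_norm + sum((conc - 1.0) * math.log(xi) for xi in x)

def log_joint(beta, theta, z, docs, alpha, eta):
    """Log of the LDA joint over topics, proportions, assignments, and words."""
    total = sum(log_dirichlet(b, eta) for b in beta)   # topic priors
    for d, doc in enumerate(docs):
        total += log_dirichlet(theta[d], alpha)        # document proportions
        for n, w in enumerate(doc):
            k = z[d][n]
            total += math.log(theta[d][k])             # topic assignment
            total += math.log(beta[k][w])              # word given its topic
    return total

# Toy values (assumed): 2 topics, 3 vocabulary words, 2 documents.
beta = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
theta = [[0.9, 0.1], [0.5, 0.5]]
docs = [[0, 0, 1], [2, 0]]
z = [[0, 0, 0], [1, 0]]
value = log_joint(beta, theta, z, docs, alpha=1.0, eta=1.0)
print(value)
```

With `alpha = eta = 1.0` both Dirichlet densities are uniform, so the result depends only on the normalizing constants and the assignment and word terms.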