
Topic Models - Latent Dirichlet Allocation

Graphical Models

  • Nodes are random variables
  • Edges denote possible dependence
  • Observed variables are shaded
  • Plates denote replicated structure
  • The structure of the graph defines the pattern of conditional dependence among the ensemble of random variables
  • E.g., the graph in which y is the parent of each xn corresponds to

$$p(y, x_1, \ldots, x_N) = p(y)\prod_{n=1}^{N} p(x_n \mid y)$$
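
As a concrete check of this factorization, here is a minimal sketch; the distributions and sizes below are made-up toy values, not from the source:

```python
import numpy as np

rng = np.random.default_rng(0)
N, V = 4, 3                                       # number of x's, values per x
p_y = np.array([0.6, 0.4])                        # p(y) over two states
p_x_given_y = rng.dirichlet(np.ones(V), size=2)   # row y gives p(x_n | y)

def joint_prob(y, xs):
    # p(y, x_1, ..., x_N) = p(y) * prod_n p(x_n | y)
    return p_y[y] * np.prod(p_x_given_y[y, xs])

xs = rng.integers(0, V, size=N)                   # one observed configuration
print(joint_prob(0, xs))
```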

Generative Model

  • Each document is a random mixture of corpus-wide topics
  • Each word is drawn from one of those topics
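
A toy Python sketch of this generative story; the topic count, vocabulary size, and Dirichlet hyperparameters below are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, D, N = 3, 10, 5, 20        # topics, vocab size, docs, words per doc
alpha, eta = 0.5, 0.1            # Dirichlet hyperparameters (toy values)

betas = rng.dirichlet(np.full(V, eta), size=K)    # corpus-wide topics beta_k
docs = []
for d in range(D):
    theta = rng.dirichlet(np.full(K, alpha))      # this document's topic mixture
    z = rng.choice(K, size=N, p=theta)            # topic assignment for each word
    w = [rng.choice(V, p=betas[zn]) for zn in z]  # each word drawn from its topic
    docs.append(w)
print(docs[0])                                    # word ids of the first document
```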

The Posterior Distribution

  • In reality, we only observe the documents
  • The goal of LDA is to infer the underlying topic structure
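
Exact posterior inference in LDA is intractable, so practical implementations approximate it (e.g., with variational inference or Gibbs sampling). As one hedged illustration, here is how topics can be fit with scikit-learn's variational LDA on a tiny corpus invented for this example:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [                               # toy corpus, invented for illustration
    "gene dna genome sequencing gene",
    "stock market trading prices stock",
    "dna gene expression market",
]
X = CountVectorizer().fit_transform(docs)          # document-term count matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
print(lda.transform(X))     # inferred per-document topic proportions (theta_d)
print(lda.components_)      # unnormalized topic-word weights (related to beta_k)
```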

Latent Dirichlet allocation

The joint distribution of the hidden and observed variables is

$$\left(\prod_{k=1}^{K} p(\beta_k \mid \eta)\right)\left(\prod_{d=1}^{D} p(\theta_d \mid \alpha)\prod_{n=1}^{N} p(z_{d,n} \mid \theta_d)\,p(w_{d,n} \mid z_{d,n}, \beta_{1:K})\right)$$

where $\beta_k \sim \mathrm{Dir}(\eta)$, $\theta_d \sim \mathrm{Dir}(\alpha)$, $z_{d,n} \sim \mathrm{Multinomial}(\theta_d)$, and $w_{d,n} \sim \mathrm{Multinomial}(\beta_{z_{d,n}})$.
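
To make the factorization concrete, the sketch below samples one configuration from the model and evaluates the log of the joint density above; the dimensions and hyperparameters are assumed toy values, and a single document is used (D = 1):

```python
import numpy as np
from scipy.stats import dirichlet

rng = np.random.default_rng(0)
K, V, N = 3, 10, 20
alpha = np.full(K, 0.5)          # toy hyperparameters
eta = np.ones(V)

betas = rng.dirichlet(eta, size=K)                # beta_k ~ Dir(eta)
theta = rng.dirichlet(alpha)                      # theta_d ~ Dir(alpha)
z = rng.choice(K, size=N, p=theta)                # z_{d,n} ~ Mult(theta_d)
w = np.array([rng.choice(V, p=betas[zn]) for zn in z])  # w_{d,n} ~ Mult(beta_{z_{d,n}})

log_joint = (
    sum(dirichlet.logpdf(b, eta) for b in betas)  # sum_k log p(beta_k | eta)
    + dirichlet.logpdf(theta, alpha)              # log p(theta_d | alpha)
    + np.log(theta[z]).sum()                      # sum_n log p(z_{d,n} | theta_d)
    + np.log(betas[z, w]).sum()                   # sum_n log p(w_{d,n} | z_{d,n}, beta)
)
print(log_joint)
```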