Hi Shaswata, N should be the total number of unique words in the corpus, i.e. the vocabulary size. The derivation is for illustration purposes only, assuming we are calculating the perplexity of the entire corpus. In practice, we usually don’t assume that all words have the same probability 1/N.
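
To make the uniform case concrete, here is a minimal sketch, assuming perplexity is defined as the exponential of the average negative log-probability per token. The vocabulary size `N`, the token count `T`, and the helper `perplexity` are made-up values and names for illustration, not taken from the original article.

```python
import math

def perplexity(token_probs):
    """Exponential of the average negative log-probability per token."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

N = 1000  # hypothetical vocabulary size (number of unique words)
T = 50    # hypothetical number of tokens being scored

# Uniform assumption: every token gets probability 1/N,
# so the perplexity comes out equal to N (up to floating-point error).
uniform = perplexity([1.0 / N] * T)

# A model that assigns each observed token probability 0.1 instead
# gives a much lower perplexity than the uniform baseline.
confident = perplexity([0.1] * T)

print(uniform, confident)
```

Under the uniform assumption the result does not depend on `T`, only on `N`, which is why N has to be the number of unique words rather than the corpus length.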

I’m an Engineering Manager at Scale AI and this is my notepad for Applied Math / CS / Deep Learning topics. Follow me on Twitter for more!
