Hi Salahuddin, thank you!
Are you asking about N? N should be the total number of unique words in the corpus. The derivation is for illustration purposes only, assuming we are calculating the perplexity of the whole corpus. Usually, we don't assume all words have the same probability 1/N.
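
In case it helps, here is a minimal Python sketch of that point. The `perplexity` helper, the value of N, and the probabilities below are all made up for illustration and are not taken from the post. It assumes perplexity is computed as the exponential of the average negative log-probability per word.

```python
import math

def perplexity(probs):
    """Perplexity of a word sequence, given the model's probability for each word."""
    avg_neg_log = -sum(math.log(p) for p in probs) / len(probs)
    return math.exp(avg_neg_log)

N = 5  # hypothetical number of unique words in the corpus

# If every word has probability 1/N, the perplexity comes out as N
# (up to floating-point rounding), whatever the sequence length.
uniform = perplexity([1 / N] * 12)

# Hypothetical non-uniform probabilities from a model that fits the text better.
skewed = perplexity([0.5, 0.5, 0.1, 0.5, 0.2, 0.5])

print(uniform, skewed)
```

With the uniform 1/N assumption the result is just N, which is why the derivation uses it as the simple case. With these particular non-uniform probabilities the perplexity is lower.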