I know that in NLP it is a challenge to determine the topic of a sentence or even a paragraph. However, I am trying to determine the likely title of something like a Wikipedia article (of course without using other methods). My only thought so far is to find the most frequent words. For the article on New York City, these were the top results:
[('new', 429), ('city', 380), ('york', 361), ("'s", 177), ('manhattan', 90), ('world', 84), ('united', 78), ('states', 74), ('===', 70), ('island', 68), ('largest', 66), ('park', 64), ('also', 56), ('area', 52), ('american', 49)]
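For context, the counts above came from something like the following sketch. In my real script I use nltk's tokenizer and stopword list; here I've stubbed those with a regex and a tiny hand-made stopword set so the idea is self-contained (the stopword set and the `top_words` name are just placeholders):

```python
import re
from collections import Counter

def top_words(text, n=15, stopwords=None):
    """Lowercase, tokenize on runs of letters/apostrophes, drop stopwords,
    and return the n most frequent remaining tokens with their counts."""
    # Tiny stand-in stopword list; nltk.corpus.stopwords would be used in practice.
    stopwords = stopwords or {"the", "a", "an", "of", "in", "and", "is", "to"}
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords).most_common(n)

# Toy snippet; the full article text is passed in instead.
sample = "New York City is the largest city in the United States. New York"
print(top_words(sample, n=3))  # [('new', 2), ('york', 2), ('city', 2)]
```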
From this I can see some sort of statistical significance in the sharp drop from 361 to 177. Regardless, I am neither a statistics nor an NLP expert (in fact, I'm a complete noob at both), so: is this a viable way of determining the topic of a longer body of text? If so, what math am I looking for to calculate this? If not, is there some other way in NLP to determine the topic or title of a larger body of text? For reference, I am using nltk and Python 3.
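The cutoff idea I have in mind could be sketched as follows: keep the words that come before the single sharpest relative drop in frequency. (This is just my naive heuristic, not an established method; the function name is made up.)

```python
def cutoff_by_largest_drop(freqs):
    """Given (word, count) pairs sorted by count descending, keep the
    words that appear before the largest relative drop in count."""
    counts = [c for _, c in freqs]
    # Ratio of each count to the one before it; the smallest ratio
    # marks the sharpest drop (e.g. 177/361 in my data above).
    ratios = [counts[i + 1] / counts[i] for i in range(len(counts) - 1)]
    cut = ratios.index(min(ratios)) + 1
    return [w for w, _ in freqs[:cut]]

freqs = [('new', 429), ('city', 380), ('york', 361), ("'s", 177), ('manhattan', 90)]
print(cutoff_by_largest_drop(freqs))  # ['new', 'city', 'york']
```

On my New York City counts this keeps exactly `new`, `city`, `york`, but I have no idea whether this gap heuristic is statistically principled, which is part of what I'm asking.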