I need to extract information from Wikipedia, but I have no idea how to proceed. What I have to do is the following:

Given a word 'w', how can I count the number of times 'w' appears in the whole English Wikipedia? Is there a list already available online? If not, how could I do such a thing? I am new to coding and I'm trying to run some experiments on NLP-related tasks.

Wai Ha Lee
Alfred

1 Answer

First, download the Wikipedia dump (in XML format, for example).
If you are using a Unix-based OS (e.g. Linux or Mac OS X), you can use grep to count the occurrences of the word in the dump file. See here.

Python can also be used to count the occurrences of a specified string in a file.
See here.
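
As a rough illustration of the Python route, here is a minimal sketch. It assumes the dump is a plain or bz2-compressed UTF-8 text file and that you want case-insensitive whole-word matches; the function names count_word and count_word_in_dump and the path in the usage note are my own placeholders, not part of any library. It reads the file line by line, so the dump never has to fit in memory.

```python
import bz2
import re
from typing import Iterable


def count_word(lines: Iterable[str], word: str, ignore_case: bool = True) -> int:
    """Count whole-word occurrences of `word` across an iterable of text lines."""
    flags = re.IGNORECASE if ignore_case else 0
    # \b keeps "cat" from matching inside "concatenate"; re.escape guards
    # against regex metacharacters in the search word.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", flags)
    return sum(len(pattern.findall(line)) for line in lines)


def count_word_in_dump(path: str, word: str, ignore_case: bool = True) -> int:
    """Stream a dump file (hypothetical path) and count occurrences of `word`."""
    # Assumption: a ".bz2" suffix means the file is bz2-compressed.
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, mode="rt", encoding="utf-8", errors="replace") as handle:
        return count_word(handle, word, ignore_case)
```

You would call it as count_word_in_dump("dump.xml.bz2", "w"), where "dump.xml.bz2" stands for wherever you saved the dump. Keep in mind that the raw XML dump also contains wiki markup and XML tags, so the result counts matches in that markup too, not only in article text.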

DBaker