4

Pretty simple: I'm just looking for a simple way to extract word frequencies from a given website, or a section of a website.

I am also interested in calculating the average distance between two given words throughout a website, with distance measured in words.

I am asking because, quite frankly, I haven't been able to find much information on how to approach such a task. I don't have any experience with web spidering or scraping of any kind.

Thanks (I asked this question earlier, but it wasn't well formed)

jab
  • Maybe you can get some ideas by searching for 'python str_word_count'. (str_word_count is a PHP function that returns the number of words in a string.) – Ivan Chau May 15 '13 at 05:45

1 Answer

1

You could try Scrapy. It is quite a powerful tool for scraping websites, but it may require some knowledge of regular expressions and XPath. Try following its tutorial.
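For a single page, a rough sketch using only the Python standard library may be enough to get started before reaching for a full crawler. Everything named here (`TextExtractor`, `html_to_words`, `word_frequencies`, `average_distance`, `fetch_words`) is a made-up helper for illustration, not part of Scrapy. The question doesn't pin down what "average distance" means, so this assumes one possible reading: for each occurrence of the first word, take the distance in words to the nearest occurrence of the second word, then average those.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_words(html):
    """Return the lowercased words of an HTML string, in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Assumption: a "word" is a run of ASCII letters, digits, or apostrophes.
    return re.findall(r"[a-z0-9']+", text)


def word_frequencies(words):
    """Map each word to the number of times it occurs."""
    return Counter(words)


def average_distance(words, a, b):
    """Average distance (in words) from each `a` to its nearest `b`.

    Returns None if either word never occurs. This is only one possible
    definition of "average distance"; adjust it to whatever you need.
    """
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    if not pos_a or not pos_b:
        return None
    nearest = [min(abs(i - j) for j in pos_b) for i in pos_a]
    return sum(nearest) / len(nearest)


def fetch_words(url):
    """Download one page and return its words (no crawling, no retries)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_words(html)
```

You would call something like `words = fetch_words(some_url)` and then `word_frequencies(words).most_common(20)` or `average_distance(words, "foo", "bar")`. To cover a whole site rather than one page, you would still need something that follows links, which is where a crawler like Scrapy comes in.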

KostasT