Separate the words from an HTML table and save them in a txt file

Question

I have a problem parsing words from an HTML table. I need to separate the words (the "lemma" column) from the other content:

The original version of the page in Russian - http://hsu.su/st2

English (Google Translate) - http://hsu.su/155

I have heard of PHP Simple HTML DOM Parser http://simplehtmldom.sourceforge.net/ but I cannot figure out how to solve this problem with it.

see http://stackoverflow.com/questions/3577641/best-methods-to-parse-html — Gordon, Jan 07 '12 at 14:58

score 1 · Accepted Answer · answered Jan 07 '12 at 16:44

<?php
    // Load the Simple HTML DOM library (adjust the path to your copy).
    include_once('simplehtmldom/simple_html_dom.php');
    $html = file_get_html('http://dict.ruslang.ru/freq.php?act=show&dic=freq_news_comp&title=%D1%EB%EE%E2%E0%F0%FC%20%E7%ED%E0%F7%E8%EC%EE%E9%20%E3%E0%E7%E5%F2%ED%EE-%ED%EE%E2%EE%F1%F2%ED%EE%E9%20%EB%E5%EA%F1%E8%EA%E8');
    if (!$html) die("can't load page");

    $myFile = "file.txt";
    $fh = fopen($myFile, 'w') or die("can't open file");

    // Take the second table on the page (the index is zero-based).
    $table = $html->find('table', 1);
    foreach ($table->find('td') as $td) {
        // Write each cell's text on its own line.
        fwrite($fh, $td->plaintext . PHP_EOL);
    }

    fclose($fh);
    ?>

Download simplehtmldom from the link you provided.

Copy it into the same folder as the script.

Make sure the path in the include refers to the right file.

Create file.txt in the same folder.

Then run the code.

The output will contain extra '&nbsp;' entities, which you can remove with PHP string functions.
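
One possible way to do that cleanup is sketched below. The helper name `clean_cell` is hypothetical and not part of the original answer; it assumes the cell text is UTF-8 and uses only built-in PHP functions (`html_entity_decode`, `str_replace`, `preg_replace`, `trim`).

```php
<?php
// Hypothetical helper (not from the original answer): tidy the text of one cell.
function clean_cell(string $text): string
{
    // Turn entities such as &nbsp; into real characters (UTF-8 output).
    $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // &nbsp; decodes to U+00A0, which trim() does not strip by default,
    // so map it to an ordinary space first.
    $decoded = str_replace("\u{00A0}", ' ', $decoded);
    // Collapse runs of whitespace into one space and trim both ends.
    return trim(preg_replace('/\s+/u', ' ', $decoded));
}

// Example call with a made-up cell value.
echo clean_cell('&nbsp;слово&nbsp;'), PHP_EOL;
```

In the loop above, the call would wrap the cell text, for example `fwrite($fh, clean_cell($td->plaintext) . PHP_EOL);`. The `"\u{00A0}"` escape needs PHP 7 or later.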

**Rajat Singhal**, I sincerely thank you for your invaluable help! — user1103744, Jan 08 '12 at 11:49

score -1 · Answer 2 · answered Jan 07 '12 at 15:05

Check out the PHP function strip_tags().

Jeremy Harris

`strip_tags` will remove the tags. This would leave the OP still with the problem of how to get the data from the - now unstructured - text. – Gordon Jan 07 '12 at 15:31
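
To illustrate the point in this comment, here is a minimal sketch comparing `strip_tags` with a DOM-based approach. The sample table, the helper name `column_words`, the output file name `lemmas.txt`, and the assumption that the lemma sits in the second column are all made up for the example; the real page layout may differ. It uses PHP's bundled DOM extension rather than Simple HTML DOM.

```php
<?php
// Made-up sample table; the column layout is an assumption, not the real page.
$html = '<table>'
      . '<tr><td>1</td><td>word</td><td>noun</td></tr>'
      . '<tr><td>2</td><td>run</td><td>verb</td></tr>'
      . '</table>';

// strip_tags() drops the markup, so the cell boundaries are lost.
$flat = strip_tags($html);   // "1wordnoun2runverb"

// Hypothetical helper: text of one column (0-based index) from every row.
function column_words(string $html, int $column): array
{
    $doc = new DOMDocument();
    // @ silences warnings from sloppy markup; the prolog hints at UTF-8 input.
    @$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    $words = [];
    foreach ($doc->getElementsByTagName('tr') as $row) {
        $cells = $row->getElementsByTagName('td');
        // Skip rows that are too short (for example header rows using <th>).
        if ($cells->length > $column) {
            $words[] = trim($cells->item($column)->textContent);
        }
    }
    return $words;
}

// Assumes the lemma is the second column (index 1).
$lemmas = column_words($html, 1);
// Save one word per line.
file_put_contents('lemmas.txt', implode(PHP_EOL, $lemmas) . PHP_EOL);
```

Because the rows and cells are walked explicitly, only the chosen column ends up in the file, which `strip_tags` alone cannot do.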
