
I have spent over two hours trying to get this to work. I want to extract the values between `":"` and `","eng_data&`.

The text is here: http://fdguirhgeruih.x10.mx/html.txt

The output should be a list of over 300 IDs, but I only get one when I run the script: http://fdguirhgeruih.x10.mx/extract.php

    <?php

    // First, open the file. Change this to your filename.
    $file = "http://fdguirhgeruih.x10.mx/html.txt";
    $word1 = '&quot;:&quot;';
    $word2 = '&quot;,&quot;eng_data&';

    $contents = file_get_contents($file);

    // strpos() returns only the first occurrence of each delimiter,
    // so this extracts a single span (which still begins with $word1).
    $start = strpos($contents, $word1);
    $between = substr($contents, $start, strpos($contents, $word2) - $start);

    echo $between;

    ?>

2 Answers


This looks like a standard XML file.
Use SimpleXML to parse it instead of a regexp.

Itay Moav -Malimovka

The content is HTML, not XML as the first answer suggested. Use the Simple HTML DOM Parser.
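
As a rough illustration of the DOM approach, here is a minimal sketch using PHP's native `DOMDocument`, which ships with PHP, rather than the Simple HTML DOM Parser. The sample markup, the `data-info` attribute, and the `id` key are assumptions made up for this example; the real html.txt may be structured differently, so the tag and attribute names would need adjusting.

```php
<?php
// Hypothetical sample standing in for html.txt; the real markup may differ.
$html = <<<'HTML'
<div>
  <img src="a.png" data-info="{&quot;id&quot;:&quot;101&quot;,&quot;eng_data&quot;:&quot;x&quot;}">
  <img src="b.png" data-info="{&quot;id&quot;:&quot;102&quot;,&quot;eng_data&quot;:&quot;y&quot;}">
</div>
HTML;

// Collect libxml parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$ids = [];
// Visit every <img>, not just the first one.
foreach ($doc->getElementsByTagName('img') as $img) {
    // getAttribute() returns the value with &quot; already decoded to ".
    $info = json_decode($img->getAttribute('data-info'), true);
    if (is_array($info) && isset($info['id'])) {
        $ids[] = $info['id'];
    }
}

print_r($ids);
```

Because the loop visits every matching element, it collects all of the values rather than stopping at the first, which is the behaviour the `strpos()` version in the question is missing.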

davidethell
  • +1, but the native PHP DOM library would be a better option. I've seen a lot of [negative reviews](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html-with-php/3577662#3577662) of Simple HTML DOM Parser. – Phil Oct 30 '11 at 23:50
  • Yes, native DOM could be better than the Simple HTML DOM Parser. I don't know how well it is maintained as I haven't needed it in a while. – davidethell Oct 30 '11 at 23:54
  • @itay not always true. XHTML is XML but if you look at his source document it has many invalid tags as far as XML is concerned. For example, the img tags have no closure as is required in valid XML. – davidethell Oct 31 '11 at 02:21
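
The point in the last comment about unclosed img tags can be checked with a small sketch. The fragment below is a made-up example, not taken from html.txt: it feeds the same markup to an XML parser (`simplexml_load_string`) and to the HTML parser (`DOMDocument::loadHTML`) and records whether each one accepts it.

```php
<?php
// Hypothetical fragment with an unclosed <img>, as described in the comment above.
$markup = '<div><img src="a.png"><p>text</p></div>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// An XML parser requires every element to be closed, so parsing fails here.
$xml = simplexml_load_string($markup);
$xmlOk = ($xml !== false);
libxml_clear_errors();

// The HTML parser treats <img> as a void element and accepts the fragment.
$doc = new DOMDocument();
$htmlOk = $doc->loadHTML($markup);
libxml_clear_errors();

$imgCount = $doc->getElementsByTagName('img')->length;

var_dump($xmlOk, $htmlOk, $imgCount);
```

With markup like this, the XML route should report a failure while the HTML route still finds the img element, which is why an HTML-aware parser is the safer choice for tag soup.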