I'm having problems with this code: PHP's 'substr' function is misbehaving, and I just don't get it. Here's a quick introduction to what I'm trying to achieve. I have a massive XML document of email subscribers exported from Joomla. I'm trying to import it into Mailchimp, but Mailchimp has rules for the syntax of imported email lists. At the moment the data looks like this:

<subscriber>
    <subscriber_id>615</subscriber_id>
    <name><![CDATA[NAME OF SUBSCRIBER]]></name>
    <email>THE_EMAIL@SOMETHING.COM</email>
    <confirmed>1</confirmed>
    <subscribe_date>THE DATE</subscribe_date>
</subscriber>

I want to make a simple PHP-script that takes all those emails and outputs them like this:

  [THE_EMAIL@SOMETHING.COM] [NAME OF SUBSCRIBER]
  [THE_EMAIL@SOMETHING.COM] [NAME OF SUBSCRIBER]
  [THE_EMAIL@SOMETHING.COM] [NAME OF SUBSCRIBER]
  [THE_EMAIL@SOMETHING.COM] [NAME OF SUBSCRIBER]

If I can do that, then I can just copy paste it into Mailchimp.

Now here's my PHP-script, so far:

$fileName = file_get_contents('emails.txt');

foreach(preg_split("/((\r?\n)|(\r\n?))/", $fileName) as $line){

  if(strpos($line, '<name><![CDATA[')){
      $name = strpos($line, '<name><![CDATA[');
      $nameEnd = strpos($line, ']]></name>', $name);
      $nameLength = $nameEnd-$name;
      echo "<br />";
      echo " " . strlen(substr($line, $name, $nameLength));
      echo " " . gettype(substr($line, $name, $nameLength));
      echo " " . substr($line, $name, $nameLength);


  }
  if(strpos($line, '<email>')){
    $var1 = strpos($line, '<email>');
    $var2 = strpos($line, '</email>', $var1);
    $length = $var2-$var1;
    echo substr($line, $var1, $length);

  }
} 

The first if-statement works as it should. It identifies whether there's an '<email>'-tag on the line, and if there is, it finds the end tag and outputs the email with substr.

The second if-statement is annoying me. It should do the same thing as the first if-statement, but it doesn't. The length is the correct length (I've checked). The type is the correct type (I've checked). But when I try to echo it, nothing happens. The script still runs, but it doesn't write anything.

I've played around with it quite a lot and seem to have tried everything - but I can't figure it out.

j0k
Zeth
  • Is there a specific reason you are using this cumbersome string fiddling approach instead of http://php.net/simplexml? – mario Oct 22 '12 at 19:50
  • Welcome to Stack Overflow! Please refrain from parsing HTML with RegEx as it will [drive you į̷̷͚̤̤̖̱̦͍͗̒̈̅̄̎n̨͖͓̹͍͎͔͈̝̲͐ͪ͛̃̄͛ṣ̷̵̞̦ͤ̅̉̋ͪ͑͛ͥ͜a̷̘͖̮͔͎͛̇̏̒͆̆͘n͇͔̤̼͙̩͖̭ͤ͋̉͌͟eͥ͒͆ͧͨ̽͞҉̹͍̳̻͢](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Use an [HTML parser](http://stackoverflow.com/questions/292926/robust-mature-html-parser-for-php) instead. – Madara's Ghost Oct 22 '12 at 20:01

2 Answers

Warning
This function may return Boolean FALSE, but may also return a non-Boolean value which evaluates to FALSE. Please read the section on Booleans for more information. Use the === operator for testing the return value of this function.

You should be using if(strpos($line,'...') !== false) {
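To make the strict comparison concrete, here is a minimal sketch. The helper name extractBetween is hypothetical (it is not from the question), and it assumes the opening and closing markers sit on the same line. It also skips past the opening marker before calling substr, so the tag itself is not part of the result. If the output is viewed in a browser, which is an assumption on my part, echoing raw markup such as '<name><![CDATA[' may be hidden by the browser, so the example escapes it with htmlspecialchars.

```php
<?php
// Hypothetical helper (not from the original post): returns the text between
// $open and $close on one line, or null if either marker is missing.
function extractBetween(string $line, string $open, string $close): ?string
{
    $start = strpos($line, $open);
    if ($start === false) {       // strict test: position 0 is a valid match
        return null;
    }
    $start += strlen($open);      // skip past the opening marker itself
    $end = strpos($line, $close, $start);
    if ($end === false) {
        return null;
    }
    return substr($line, $start, $end - $start);
}

// The tag starts at position 0 here, which a loose if(strpos(...)) would miss.
$line = '<name><![CDATA[NAME OF SUBSCRIBER]]></name>';
$name = extractBetween($line, '<name><![CDATA[', ']]></name>');

// htmlspecialchars() keeps any leftover markup visible in a browser.
echo htmlspecialchars((string) $name), "\n";
```

With the sample line above this prints NAME OF SUBSCRIBER, even though the tag is at position 0.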

That aside, your file seems to be XML, so you should use an XML parser lest you fall under the pony he comes.

DOMDocument is a good one. You could do something like this:

$dom = new DOMDocument();
$dom->load("emails.txt");
$subs = $dom->getElementsByTagName('subscriber');
$count = $subs->length;
for( $i=0; $i<$count; $i++) {
    $sub = $subs->item($i);
    echo $sub->getElementsByTagName('email')->item(0)->nodeValue;
    echo " ";
    echo $sub->getElementsByTagName('name')->item(0)->nodeValue;
    echo "\n";
}

This will output the names and emails in the format you described.

Community
Niet the Dark Absol
  • I think simplexml might be a better option, DOMDocument is good for working with HTML and the DOM, but basic XML-parsing might be better in SimpleXML. At least it's less code as per my example below. – FilmJ Oct 22 '12 at 20:21

So there are a few things wrong with this, including the strpos check: strpos returns 0 when it finds the tag at the very beginning of the line, and the if treats that 0 as false, which doesn't appear to be what you intend.

Also, if the XML is not formatted exactly as you have it, with each opening and closing tag on the same line, then your logic will fail as well.
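To illustrate the layout problem, here is a small sketch in which the email element is spread over several lines. A line-by-line strpos search would not find the opening and closing tag on the same line, while a parser does not care. The sample data, the subscribers root element and the function name formatSubscribers are all invented for this example.

```php
<?php
// Hypothetical helper: returns one "[email] [name]" string per subscriber.
function formatSubscribers(string $xml): array
{
    $doc = simplexml_load_string($xml);
    $lines = [];
    foreach ($doc->subscriber as $sub) {
        // trim() removes the whitespace around a value that spans lines
        $lines[] = '[' . trim((string) $sub->email) . '] ['
                 . trim((string) $sub->name) . ']';
    }
    return $lines;
}

// Invented sample: <email> opens and closes on different lines.
$xml = <<<XML
<subscribers>
  <subscriber>
    <name><![CDATA[Jane Doe]]></name>
    <email>
      jane@example.com
    </email>
  </subscriber>
</subscribers>
XML;

echo implode("\n", formatSubscribers($xml)), "\n";
```

For the sample above this prints [jane@example.com] [Jane Doe].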

It's not a good idea to re-invent XML processing for this reason...

Here, as others have proposed, is a better solution to the problem*.

$xml = simplexml_load_file('emails.txt');
foreach( $xml->subscriber as $sub ) 
{
  // Note that SimpleXML is aware of CDATA, and only outputs the text
  $output = '[' . $sub->email . '] [' . $sub->name . ']';
  echo $output . "\n"; // print one "[email] [name]" line per subscriber
}

*This assumes that your XML is valid, i.e. the "subscriber" blocks are contained in a single parent element at the top level. You can of course use the SimpleXML documentation to adjust this for your use case.
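If the export turns out not to have a single parent element, loading will fail. The sketch below shows one way to detect that and work around it. The helper name loadSubscribers and the subscribers wrapper element are assumptions made up for this example, and the wrapping trick only works when the data has no XML declaration of its own at the top.

```php
<?php
// Hypothetical helper: returns the parsed document, or null if the XML is
// not well-formed. Parser messages are logged instead of raised as warnings.
function loadSubscribers(string $xml): ?SimpleXMLElement
{
    $previous = libxml_use_internal_errors(true); // collect errors quietly
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        foreach (libxml_get_errors() as $error) {
            error_log(trim($error->message));
        }
        libxml_clear_errors();
    }
    libxml_use_internal_errors($previous);        // restore earlier setting
    return $doc === false ? null : $doc;
}

// Two top-level <subscriber> blocks with no shared parent: not well-formed.
$fragment = '<subscriber><email>a@example.com</email></subscriber>'
          . '<subscriber><email>b@example.com</email></subscriber>';

$doc = loadSubscribers($fragment);
if ($doc === null) {
    // Assumed workaround: wrap the fragments in an invented root element.
    $doc = loadSubscribers('<subscribers>' . $fragment . '</subscribers>');
}

echo count($doc->subscriber), "\n"; // number of subscriber blocks found
```

Here the first load fails, the wrapped version succeeds, and the script prints 2.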

FilmJ