
I have some PHP code that accepts an uploaded file from an HTML form then reads through it using regex to look for specific lines (in the case below, those with "Track Number" followed by an integer).

The file is an XML file that looks like this normally...

<key>Disc Number</key><integer>2</integer>
<key>Disc Count</key><integer>2</integer>
<key>Track Number</key><integer>1</integer>

But when PHP reads it in it gets rid of the XML tags for some reason, leaving me with just...

Disc Number2
Disc Count2
Track Number1

The file has to be XML, and I don't want to use SimpleXML because that's a whole other headache. The regex matches the integers like I want it to (I can print them out "0","1","2"...) but of course they're returned as strings in $matches, and it seems I'm unable to make use of these strings. I need to check whether the integer is between 0 and 9, but I am unable to do this no matter what I try.

Using intval() or (int) to first convert the matches to integers always returns 0, even though the given string appears to contain only digits. Using in_array to compare the value against an array of 0-9 as strings always returns false as well, for some reason. Here's the trouble code...

$myFile = file($myFileTmp, FILE_IGNORE_NEW_LINES);
$numLines = count($myFile) - 1;
$matches = array(); 
$nums = array('0','1','2','3','4','5','6','7','8','9');
for ($i=0; $i < $numLines; $i++) {
    $line = trim($myFile[$i]);
    $numberMatch = preg_match('/Track Number(.*)/', $line, $matches); // if I try matching integers specifically it doesn't return a match at all, only if I do it like this - it gives me the track number I want but I can't do anything with it
    if ($numberMatch == 1 and ctype_space($matches[1]) == False) {
       $number = trim($matches[1]); // string containing an integer only
       echo(intval($number)); // conversion doesn't work - returns 0 regardless
       if (in_array($number,$nums)===True) { // searching in array doesn't work - returns FALSE regardless
          $number = "0" . $number;
       }
    }
}

I've tried type checking, double quotes, single quotes, trimming whitespace, UTF8 encoding, === operator, regex matching numbers specifically with (\d+) (which doesn't return a match at all)...what else could it possibly be? When I try these things with regular strings it works fine, but the regex is messing everything up here. I'm about to give up on this app entirely, please save me.

Ugh
  • You really should be using an XML parser for this. See http://stackoverflow.com/q/8577060/800608 – mzedeler Dec 26 '15 at 21:28
  • Have you tried a simple cast : `$num = "56"; $int = (int)$num;` ? – Mxsky Dec 26 '15 at 21:46
  • Three things: matching an integer can be done with `\d+`, so your regex would come down to `Track Number(\d+)`. Second: why should your xml be stripped somewhere? What does really happen? Why is SimpleXML not an option? And third, what is in the `$matches` array? Can you edit your post and give the output of `print_r($matches);` ? – Jan Dec 26 '15 at 21:47
  • "But when PHP reads it in it gets rid of the XML tags for some reason" This sounds like an XML (DOM) parser was used but the document content was retrieved using the Text property instead of the OuterXml property. – Martin Maat Dec 26 '15 at 21:52

2 Answers


Why is SimpleXML not an option? Consider the following code:

$str = "<container><key>Disc Number</key><integer>2</integer>
<key>Disc Count</key><integer>2</integer>
<key>Track Number</key><integer>1</integer></container>";
$xml = simplexml_load_string($str);

foreach ($xml->key as $k) {
    // do sth. here with it
}
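
As a rough sketch of what the loop body could do, the snippet below pairs each `<key>` with the `<integer>` at the same position and casts the value to an int. The function name `readKeyIntegerPairs` and the `<container>` wrapper are hypothetical, and the code assumes every key is followed by exactly one `<integer>` in the same order, which holds for the sample data but may not hold for a full file.

```php
<?php
// Sample data from the question, wrapped in a hypothetical <container> root.
$str = "<container><key>Disc Number</key><integer>2</integer>
<key>Disc Count</key><integer>2</integer>
<key>Track Number</key><integer>1</integer></container>";

// Hypothetical helper: maps each key name to its integer value.
// Assumes the n-th <key> belongs to the n-th <integer>.
function readKeyIntegerPairs(string $xmlString): array
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        return []; // the input was not well-formed XML
    }

    $pairs = [];
    $i = 0;
    foreach ($xml->key as $k) {
        // Cast the element objects to plain PHP scalars.
        $pairs[(string) $k] = (int) $xml->integer[$i];
        $i++;
    }
    return $pairs;
}

$pairs = readKeyIntegerPairs($str);
echo $pairs['Track Number'], PHP_EOL; // prints 1
```

Because the values are cast with `(int)`, comparisons such as `$pairs['Track Number'] < 10` work on real integers rather than strings.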
Jan

You should read "RegEx match open tags except XHTML self-contained tags". While it doesn't exactly match your use case, it gives good reasons to use something other than plain regex matching for markup like this.

Assuming that files only contain a single Track Number you can simplify what you're doing a lot. See the following:

test.xml

<key>Disc Number</key><integer>2</integer>
<key>Disc Count</key><integer>2</integer>
<key>Track Number</key><integer>1</integer>

test.php

<?php

$contents = file_get_contents('test.xml');
$result = preg_match_all("/<key>Track Number<\/key><integer>(\d+)<\/integer>/", $contents, $matches); // \d+ so that multi-digit track numbers also match
if ($result > 0) {
    print_r($matches);
    $trackNumber = (int) $matches[1][0];
    print gettype($trackNumber) . " - " . $trackNumber;
}

Result

$ php -f test.php
Array
(
    [0] => Array
        (
            [0] => <key>Track Number</key><integer>1</integer>
        )

    [1] => Array
        (
            [0] => 1
        )

)
integer - 1

As you can see, there is no need to iterate through the file line by line when using preg_match_all. The matching here is very specific, so you don't have to do extra checks for whitespace or validate that it's a number, which your current code is doing against a string value.
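
Building on that, here is a minimal sketch of the zero-padding step the question was aiming for. The helper name `paddedTrackNumber` is hypothetical, and it assumes the tags are present in the raw file contents exactly as in test.xml (optionally with whitespace between the key and the integer).

```php
<?php
// Hypothetical helper: finds the first track number in plist-style XML text
// and returns it left-padded to two digits, or null when none is found.
function paddedTrackNumber(string $contents): ?string
{
    $pattern = '/<key>Track Number<\/key>\s*<integer>(\d+)<\/integer>/';
    if (preg_match($pattern, $contents, $matches) !== 1) {
        return null;
    }

    // $matches[1] holds only digits, so the cast is safe.
    $trackNumber = (int) $matches[1];

    // %02d pads values below 10 with a leading zero.
    return sprintf('%02d', $trackNumber);
}

$sample = '<key>Track Number</key><integer>1</integer>';
echo paddedTrackNumber($sample), PHP_EOL; // prints 01
```

If the output is being viewed in a browser, wrapping debug output in `htmlspecialchars()` shows the tags instead of letting the browser render them, which makes it easier to see what a capture group such as `(.*)` actually contains.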

cynicaljoy