
This code works on Linux but fails to match on Windows:

if ( preg_match ( "~<meta name='date' content='(.*)'>\n<meta name='time' content='(.*)'>\n<meta name='venue' content='(.*)'>\n~", file_get_contents($filename), $matches) ) 
...

I guess the line-ending encoding is the problem. How should I modify the pattern so that it is independent of line endings?

PeeHaa
ChrisJJ
  • xml/html + regex = [BAD](http://stackoverflow.com/a/1732454/118068). Use DOM instead. It'll also free you from worrying about linebreak characters. – Marc B Jan 07 '12 at 03:54
  • Thanks, but DOM cannot give precisely same match behaviour and I cannot afford change. If 100% back-compatibility were not required, yes I would use DOM. – ChrisJJ Jan 07 '12 at 04:27
  • `//meta[@name='time']` isn't accurate enough? – Marc B Jan 07 '12 at 17:15

1 Answer

Windows line endings are:

   "\r\n"

The simplest solution is:

if (preg_match ("~<meta name='date' content='(.*)'>\n<meta name='time' content='(.*)'>\n<meta name='venue' content='(.*)'>\n~", file_get_contents($filename), $matches)
  ||
  preg_match("~<meta name='date' content='(.*)'>\r\n<meta name='time' content='(.*)'>\r\n<meta name='venue' content='(.*)'>\r\n~", file_get_contents($filename), $matches)) 

The more robust solution is probably to make the carriage return optional:

if (preg_match("~<meta name='date' content='(.*)'>[\r]?\n<meta name='time' content='(.*)'>[\r]?\n<meta name='venue' content='(.*)'>[\r]?\n~", file_get_contents($filename), $matches))
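
As a minimal sketch of the same idea, the optional-carriage-return pattern can be wrapped in a function and checked against both line-ending styles. The helper name `match_meta` and the sample values are hypothetical and only there for illustration; the pattern itself is the one shown above.

```php
<?php
// Hypothetical helper: returns [date, time, venue], or null if the block is not found.
function match_meta(string $html): ?array
{
    // "\r?\n" accepts both Unix (LF) and Windows (CR LF) line endings.
    // The capture groups are kept as (.*) so matching behaves as in the original pattern.
    $pattern = "~<meta name='date' content='(.*)'>\r?\n"
             . "<meta name='time' content='(.*)'>\r?\n"
             . "<meta name='venue' content='(.*)'>\r?\n~";

    if (preg_match($pattern, $html, $m)) {
        return [$m[1], $m[2], $m[3]];
    }
    return null;
}

// Made-up sample input, once with LF and once with CR LF line endings.
$unix = "<meta name='date' content='2012-01-07'>\n"
      . "<meta name='time' content='19:30'>\n"
      . "<meta name='venue' content='Town Hall'>\n";
$windows = str_replace("\n", "\r\n", $unix);

var_dump(match_meta($unix) === match_meta($windows));
```

Reading the file once into a variable and passing that to the function also avoids calling `file_get_contents()` twice, as the two-pattern version does.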

That said, you probably really should use another method for dealing with HTML & XML. There are parsers built specifically for that.

e.g. http://docs.php.net/manual/en/domdocument.loadhtml.php or http://php.net/manual/en/book.xml.php
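
For comparison, a rough sketch of the parser route using `DOMDocument` and `DOMXPath`, assuming the DOM extension is available. The helper name `read_meta` and the sample markup are hypothetical, and this will not reproduce the regex's exact matching behaviour (for instance, it does not require the three tags to be adjacent).

```php
<?php
// Hypothetical helper: returns the content attribute of each named <meta> tag,
// or null for a name that is not present.
function read_meta(string $html, array $names): array
{
    // Collect libxml warnings internally instead of emitting them for imperfect markup.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($names as $name) {
        $node = $xpath->query("//meta[@name='$name']")->item(0);
        $result[$name] = $node instanceof DOMElement
            ? $node->getAttribute('content')
            : null;
    }
    return $result;
}

// Made-up sample input with Windows line endings; the parser does not care.
$html = "<html><head>\r\n"
      . "<meta name='date' content='2012-01-07'>\r\n"
      . "<meta name='time' content='19:30'>\r\n"
      . "<meta name='venue' content='Town Hall'>\r\n"
      . "</head><body></body></html>";

print_r(read_meta($html, ['date', 'time', 'venue']));
```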

As a side note, I haven't properly tested either regex pattern, but as far as I recall they work. Regex is not something I use much.

EDIT: Seems to work fine?

$file = "iorahgjajgasjgasjgasjgjaagaspokadsfgals<meta name='date' content='(.*)'>\n<meta name='time' content='(.*)'>\n<meta name='venue' content='(.*)'>\niorahgjajgasjgasjgasjgjaagaspokadsfgals";

if (preg_match("~<meta name='date' content='(.*)'>\n<meta name='time' content='(.*)'>\n<meta name='venue' content='(.*)'>\n~", $file, $matches)
  || preg_match("~<meta name='date' content='(.*)'>\r\n<meta name='time' content='(.*)'>\r\n<meta name='venue' content='(.*)'>\r\n~", $file, $matches)) {
  echo "Success";
}
else { 
  echo "Fail";
}

$file = "iorahgjajgasjgasjgasjgjaagaspokadsfgals<meta name='date' content='(.*)'>\r\n<meta name='time' content='(.*)'>\n<meta name='venue' content='(.*)'>\r\niorahgjajgasjgasjgasjgjaagaspokadsfgals";

if (preg_match ("~<meta name='date' content='(.*)'>[\r]?\n<meta name='time' content='(.*)'>[\r]?\n<meta name='venue' content='(.*)'>[\r]?\n~", $file, $matches)) {
  echo "Success";
}
else { 
  echo "Fail";
}    
apaderno
ReadWriteCode