
Using PHP and a regular expression, I need to grab the value between <rdf:li xml:lang="x-default"> and </rdf:li>.

The string I need the value from will contain this line:

<rdf:li xml:lang="x-default">Yuengling Americas Oldest Brewery Huffmans Pub &amp; Grub 60x30 5</rdf:li>

I need to get "Yuengling Americas Oldest Brewery Huffmans Pub &amp; Grub 60x30 5" into a PHP variable. I'm not good with regex; could someone help me extract this value?

$str = '<rdf:li xml:lang="x-default">Yuengling Americas Oldest Brewery Huffmans Pub &amp; Grub 60x30 5</rdf:li>';

My string comes from reading the contents of an .AI (Adobe Illustrator) file:

%PDF-1.5
%âãÏÓ
1 0 obj
<</Metadata 2 0 R/OCProperties<</D<</ON[7 0 R]/Order 8 0 R/RBGroups[]>>/OCGs[7 0 R]>>/Pages 3 0 R/Type/Catalog>>
endobj
2 0 obj
<</Length 67315/Subtype/XML/Type/Metadata>>stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.3-c011 66.145661, 2012/02/06-14:56:27        ">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/">
         <dc:format>application/pdf</dc:format>
         <dc:title>
            <rdf:Alt>
               <rdf:li xml:lang="x-default">Yuengling Americas Oldest Brewery Huffmans Pub &amp; Grub 60x30 5</rdf:li>
            </rdf:Alt>
         </dc:title>
      </rdf:Description>
      <rdf:Description rdf:about=""
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:xmpGImg="http://ns.adobe.com/xap/1.0/g/img/">
         <xmp:MetadataDate>2014-04-01T16:13-05:00</xmp:MetadataDate>
         <xmp:ModifyDate>2014-04-01T16:13-05:00</xmp:ModifyDate>
         <xmp:CreateDate>2014-04-01T16:13-05:00</xmp:CreateDate>
         <xmp:CreatorTool>Adobe Illustrator CS6 (Windows)</xmp:CreatorTool>
         <xmp:Thumbnails>
            <rdf:Alt>
               <rdf:li rdf:parseType="Resource">....
Andy Lester
JasonDavis
  • PHP has a [number of tools](http://www.php.net/manual/en/refs.xml.php) to make working with XML easier. There's no need to resort to regular expressions. See http://stackoverflow.com/a/1732454/1715579 – p.s.w.g Apr 29 '14 at 18:11
  • Don't use a regex here. Use an XML parser. – gen_Eric Apr 29 '14 at 18:12
  • @RocketHazmat My string contains other items that are not XML; do you know if that would cause a problem? – JasonDavis Apr 29 '14 at 18:13
  • @jasondavis: Where is this string coming from? What does the full string look like? You might be able to just parse the other stuff as text. – gen_Eric Apr 29 '14 at 18:14
  • @RocketHazmat I am using cURL to get the contents of .AI files. I will add the start of the output to my question above, as it's a lot of text. – JasonDavis Apr 29 '14 at 18:16
  • So, you want to read the metadata of an Illustrator file (basically a PDF)? There might be a library for this. Regex still isn't the answer. – gen_Eric Apr 29 '14 at 18:18
  • Maybe this can help: http://www.pdfparser.org/ – gen_Eric Apr 29 '14 at 18:20
  • FYI, added an explanation to my solution. – zx81 Apr 29 '14 at 20:11
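Following the commenters' advice to use an XML parser, here is a minimal sketch of how that could work even though the .AI file also contains non-XML PDF data: cut out the XMP packet first, then query it with PHP's DOM extension. It assumes the metadata is stored uncompressed as a single <x:xmpmeta>...</x:xmpmeta> block, as in the dump in the question, and needs PHP 7.1+ with the DOM extension. The function name extractXmpTitle and the trimmed sample string are hypothetical, made up for illustration.

```php
<?php
// Hypothetical helper: pull the XMP packet out of the raw .AI/PDF bytes and
// read dc:title with an XML parser instead of a regex.
// Assumes one uncompressed <x:xmpmeta>...</x:xmpmeta> block.
function extractXmpTitle(string $fileContents): ?string
{
    $closeTag = '</x:xmpmeta>';
    $start = strpos($fileContents, '<x:xmpmeta');
    if ($start === false) {
        return null; // no readable XMP packet
    }
    $end = strpos($fileContents, $closeTag, $start);
    if ($end === false) {
        return null;
    }
    // Keep only the XML part; the surrounding PDF syntax is not XML.
    $xml = substr($fileContents, $start, $end - $start + strlen($closeTag));

    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#');
    $xpath->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');

    // The xml: prefix is built in, so xml:lang needs no registration.
    $nodes = $xpath->query('//dc:title/rdf:Alt/rdf:li[@xml:lang="x-default"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // textContent decodes entities, so &amp; comes back as a plain &.
    return $nodes->item(0)->textContent;
}

// Trimmed-down sample modelled on the dump in the question.
$sample = "%PDF-1.5\n2 0 obj\n<</Subtype/XML/Type/Metadata>>stream\n"
    . '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    . '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    . '<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    . '<dc:title><rdf:Alt><rdf:li xml:lang="x-default">'
    . 'Yuengling Americas Oldest Brewery Huffmans Pub &amp; Grub 60x30 5'
    . '</rdf:li></rdf:Alt></dc:title>'
    . '</rdf:Description></rdf:RDF></x:xmpmeta>'
    . "\nendstream\nendobj\n";

$title = extractXmpTitle($sample);
echo $title, "\n";
```

Because the XPath query targets dc:title specifically, the other rdf:li elements in the file (such as the thumbnail entries) cannot be picked up by mistake.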

1 Answer


Jason, all reservations aside, since you asked for a regex solution, here's a simple regex that matches what you want:

<rdf:li xml:lang="x-default">\K[^<]+(?=</rdf:li>)

How to use it:

$str = '<rdf:li xml:lang="x-default">Yuengling Americas Oldest Brewery Huffmans Pub &amp; Grub 60x30 5</rdf:li>';

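// \K drops the opening tag from the match; the lookahead requires </rdf:li> to follow.
$regex = '~<rdf:li xml:lang="x-default">\K[^<]+(?=</rdf:li>)~';

if (preg_match($regex, $str, $m)) {
    $myvariable = $m[0]; // only the text between the tags
    echo $myvariable . "<br />";
}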

The output, as displayed in a browser (which renders &amp; as &):

Yuengling Americas Oldest Brewery Huffmans Pub & Grub 60x30 5
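Note that the string stored in the variable still contains the XML entity &amp;; it only looks like "&" because the browser decodes it. If plain text is needed, one option is a sketch like the following, which decodes the match with html_entity_decode() using the ENT_XML1 flag. The variable name $title is just for illustration.

```php
<?php
$str = '<rdf:li xml:lang="x-default">Yuengling Americas Oldest Brewery Huffmans Pub &amp; Grub 60x30 5</rdf:li>';
$regex = '~<rdf:li xml:lang="x-default">\K[^<]+(?=</rdf:li>)~';

$title = null;
if (preg_match($regex, $str, $m)) {
    // $m[0] is the raw XML text, so it still contains "&amp;".
    // Decode XML entities so $title holds plain text ("&amp;" -> "&").
    $title = html_entity_decode($m[0], ENT_QUOTES | ENT_XML1, 'UTF-8');
}
echo $title, "\n";
```

If the decoded value is later echoed into an HTML page, it should be escaped again with htmlspecialchars().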

How does it work?

You'll note that we start by matching the entire opening tag (the left delimiter). The \K then tells the engine to drop everything matched so far from the returned match. Next, [^<]+ matches one or more characters that are not <, which consumes the text you want. Finally, the lookahead (?=</rdf:li>) checks that the closing tag follows the matched text without including it in the match.
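For comparison, the same text can be extracted with an ordinary capture group instead of \K and a lookahead. This is an alternative technique, not the one used above; the sketch below simply runs both patterns on the sample string so the difference in what ends up in the match array is visible.

```php
<?php
$str = '<rdf:li xml:lang="x-default">Yuengling Americas Oldest Brewery Huffmans Pub &amp; Grub 60x30 5</rdf:li>';

// Pattern from the answer: \K discards the opening tag from the reported
// match, and (?=...) checks for the closing tag without consuming it.
preg_match('~<rdf:li xml:lang="x-default">\K[^<]+(?=</rdf:li>)~', $str, $kMatch);

// Alternative: match both tags and capture only the text in group 1.
preg_match('~<rdf:li xml:lang="x-default">([^<]+)</rdf:li>~', $str, $groupMatch);

echo $kMatch[0], "\n";     // whole match is just the text
echo $groupMatch[0], "\n"; // whole match still includes both tags
echo $groupMatch[1], "\n"; // group 1 is just the text
```

Both approaches yield the same value; the \K version lets you read it from $m[0], while the capture-group version needs $m[1].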

zx81