0

I have written a regex in PHP to find a tag and its attributes in a line of HTML. It works, but it only captures the first attribute instead of repeating. The following pattern gets me the first attribute and its value.

'@<barcode(\s([a-z]+)="([^"]+)").*/>@m'

So I added a plus to make the group repeat, but that doesn't work either.

'@<barcode(\s([a-z]+)="([^"]+)")+.*/>@m'

After adding the plus, it only captures the last attribute and value.

I just need all the attributes and values in an array, so I am wondering what I am doing wrong. Here is the kind of HTML I am searching through. Not every attribute is always present, so I have to take that into account.

<barcode type="C128B" height="10" fontsize="0.4" code="testcode" align="L"/>
<barcode type="Hello"/>
<barcode type="Hello" code="balls"/>
<barcode type="C128B" height="10" fontsize="0.7" code="test" align="L"/>

I have an example on regex101 that shows the problem: https://regex101.com/r/jMdA6S/1

Our current application works, but only by repeating the attribute pattern once per attribute:

'@<barcode ([a-z]+)="(.*)" ([a-z]+)="(.*)" ([a-z]+)="(.*)" ([a-z]+)="(.*)" ([a-z]+)="(.*)".*/>@m'

This means that every time I add a new attribute, I have to add another block to the regex. I am trying to avoid this, as we sometimes need to add a new attribute to support different features.

Thomas Williams
  • @Amessihel, not sure what you mean 0:-) Although being a bit pedantic - they are actually asking for it in HTML and not XML - although https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 does spring to mind. – Nigel Ren Sep 22 '19 at 16:20
  • Yes this is not XML it is html – Thomas Williams Sep 22 '19 at 20:29

4 Answers

1

You need to put /g at the end of your regex like so:

<barcode(\s([a-z]+)="([^"]+)").*/g>
user3783243
AnirudL
  • The `g` modifier doesn't exist in PHP. https://www.php.net/manual/en/reference.pcre.pattern.modifiers.php – user3783243 Sep 22 '19 at 16:31
  • Actually I use @m, but when I pasted it on stack overflow I forgot to paste the full line as it came straight from the regex101 page I was working on. You can see this in the link I attached – Thomas Williams Sep 22 '19 at 20:34
1

It is good practice to parse HTML with a dedicated parsing tool. For your question, you can either parse while reading the file (the SAX approach), or load the whole file at once and then access its content (the DOM approach).

Here is one way to do what you need. I like the SAX approach when I don't need to keep the whole content (this is largely based on the XML Element Structure Example on the official PHP website):

<?php
$file = "data.html"; // your file
$depth = array();

function startElement($parser, $tagname, $attrs)
{
    // For each tag encountered
    //   - $tagname contains the name
    //   - $attrs is an associative array name -> value of the attributes

    // Add your own code below to deal with it:
    echo "<pre>\n";
    echo "Tags : $tagname\n";
    echo "Attributes:\n";
    print_r($attrs);
    echo "</pre>\n";
}

// Create the parser
$xml_parser = xml_parser_create();

// Set the element handlers for the parser (we only need the start element
// handler, so the end element handler is set to FALSE)
xml_set_element_handler($xml_parser, "startElement", FALSE);

// Open your file
if (!($fp = fopen($file, "r"))) {
    die("Oops.");
}

// Loop reading and parsing the file
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die("Oops.");
    }
}

// Done. Free your parser.
xml_parser_free($xml_parser);
?>
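The DOM approach mentioned above is not shown in this answer, so here is a minimal sketch of what it might look like. It assumes PHP's `dom` extension is available; the function name `barcodeAttributesDom` and the sample input are made up for illustration. Because `barcode` is not a standard HTML tag, libxml may emit warnings, which the sketch collects internally and then discards.

```php
<?php
// Hypothetical sample input; in practice this would be the contents of your file.
$html = '<barcode type="C128B" height="10" fontsize="0.4" code="testcode" align="L"/>
<barcode type="Hello"/>
<barcode type="Hello" code="balls"/>';

// Hypothetical helper: returns one name => value array per <barcode> element.
function barcodeAttributesDom(string $html): array
{
    // Collect libxml warnings (e.g. about the unknown tag) instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($doc->getElementsByTagName('barcode') as $node) {
        $attrs = [];
        foreach ($node->attributes as $attr) {
            $attrs[$attr->name] = $attr->value;
        }
        $result[] = $attrs;
    }
    return $result;
}

print_r(barcodeAttributesDom($html));
```

Compared with the SAX version, this keeps the whole document in memory, which is usually fine for small files.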
Amessihel
  • I am not parsing XML I am doing html. This is not for online html it is for stuff that isn't online so I don't really need a full blown parser. The line we currently use is the one at the end of my post, but I wanted to know why it wasn't repeating the pattern when I add the + – Thomas Williams Sep 22 '19 at 20:32
1

Well, even though there were some good answers, nobody was able to tell me whether there is a way to do this in one regex, which is what my question was. So I have had to give in and do it in two regexes. I was trying to avoid two regexes, as I thought the plus was supposed to repeat the middle part.

The first regex finds the tags, and a getAttributes function then extracts the attributes and puts them into a flat array for me to process. This does not really answer my question of how to do it in one regex, but I will post what I got working in case it helps anyone else.

Both Amessihel and Maciej Król gave good advice, and I would probably take it if this were a new project. However, I have gone with the following code.
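For what it's worth, the plus does repeat the middle part during matching. The catch is that in PCRE a capture group inside a quantifier keeps only the text from its last iteration, so each repetition overwrites the previous capture. A small sketch, using the pattern and the first sample line from the question:

```php
<?php
// Sample line taken from the question.
$line = '<barcode type="C128B" height="10" fontsize="0.4" code="testcode" align="L"/>';

// The quantified group matches all five attributes in turn, but the capture
// slots 1-3 are overwritten on every iteration of the +.
preg_match('@<barcode(\s([a-z]+)="([^"]+)")+.*/>@m', $line, $m);

// Only the final iteration survives in the result array.
echo $m[2] . ' => ' . $m[3] . "\n"; // prints: align => L
```

This is why the result array never grows with the number of attributes: the number of capture slots is fixed by the pattern, not by how often a group repeats.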

<?php
$str = '<barcode type="C128B" height="10" fontsize="0.4" code="pdfbarcode_content" align="L"/>
<barcode href="Hello"/>
<barcode href="Hello" type="balls"/>
<barcode type="C128B" height="10" fontsize="0.4"/>
<barcode type="C128B" height="10" fontsize="0.4" code="test" align="L"/>';

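// Returns a flat array [name1, value1, name2, value2, ...] for one tag's attribute string.
function getAttributes($attr){
    // preg_match_all already finds every name="value" pair,
    // so no repetition quantifier is needed around the pattern.
    preg_match_all('@([a-z]+)="([^"]+)"@', $attr, $matches, PREG_SET_ORDER);
    $rArray = [];
    foreach ($matches as $line) {
        $rArray[] = $line[1]; // attribute name
        $rArray[] = $line[2]; // attribute value
    }
    return $rArray;
}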
function barcode($file){
    return preg_replace_callback(
        '@<barcode(.*)/>@m',
        function($matches) {
            echo '<pre>'.print_r($matches[1],1).'</pre>';
            echo '<pre>'.print_r(getAttributes($matches[1]),1).'</pre>';
            echo "-----------------------";
            //Here is where I process the array
            return '';
    },
    $file);
}
barcode($str);
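On the open question of a single pattern: one possible approach, which none of the answers here propose, is PCRE's `\G` anchor, which matches only at the position where the previous match ended. Combined with `preg_match_all`, each attribute becomes its own match, chained to the `<barcode` that started the tag. The following is a sketch under that assumption; the function name `barcodeAttributes` is hypothetical, and a tag with no attributes at all produces no entry.

```php
<?php
// Hypothetical helper: collects the attributes of every <barcode .../> tag
// with a single preg_match_all call.
function barcodeAttributes(string $html): array
{
    // (<barcode)\b  starts a new tag and fills group 1.
    // \G(?!\A)      continues exactly where the previous match ended (but not at
    //               the very start of the input), picking up the next attribute.
    $pattern = '~(?:(<barcode)\b|\G(?!\A))\s+([a-z]+)="([^"]+)"~';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $tags = [];
    foreach ($matches as $m) {
        if ($m[1] !== '') {
            $tags[] = []; // group 1 is non-empty only on a tag's first attribute
        }
        $tags[count($tags) - 1][$m[2]] = $m[3];
    }
    return $tags;
}

$sample = '<barcode type="C128B" height="10" fontsize="0.4" code="testcode" align="L"/>
<barcode type="Hello"/>
<other type="ignored"/>
<barcode type="Hello" code="balls"/>';

print_r(barcodeAttributes($sample));
```

The `<other>` tag is skipped because a match can only begin at `<barcode` or directly after a previous attribute match.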
Thomas Williams
0

You probably need to write a small parser for this if you want to match an unlimited number of XML elements and access their key/value pairs (using regex).

I've prepared a working example for you.

   $offset = 0;

   $lines = '
       <barcode type="C128B" height="10" fontsize="0.4" code="testcode" align="L"/>
       <barcode type="Hello"/>
       <barcode type="Hello" code="balls"/>
       <barcode type="C128B" height="10" fontsize="0.7" code="test" align="L"/>
   ';

   while (preg_match('/<(\S*)[\s]*(.*)[\s]*\/>/', $lines, $line_matches, PREG_OFFSET_CAPTURE, $offset))
   {
       // Set offset to the next line
       $offset = $line_matches[0][1] + strlen($line_matches[0][0]);

       // Get the line name
       $name = $line_matches[1][0];

       // Get the line content
       $line_content = $line_matches[2][0];

       if(preg_match_all('/([a-z]+)="([^"]+)"/', $line_content, $key_values_matches))
       {
           // Access all matched keys
           $keys = $key_values_matches[1];

           // Access all matched values
           $values = $key_values_matches[2];

           foreach ($keys as $index => $key) {
               // Access matched value for key
               $value = $values[$index];

               // Do something with your match
               echo "Found match in \"{$name}\" for key \"{$key}\" with value \"{$value}\"\n";
           }
       }

   }
Maciej Król
  • Thanks for posting your code, but the regex should be sufficient for what we are doing. What I wanted to know was, why isn't it repeating correctly? I know regex can do repeated patterns. If I remove the barcode part it repeats ok, but then I only want it to function on barcode tags – Thomas Williams Sep 22 '19 at 20:37