I have this string:

<form action="../?x=3O1*qY*E-dEItGGem1mH3VN5Nm6cO0hiQkOl0nSasIQqTDPzbSUbCI3UYWGGhwZ0" id="id8" method="post">

I would like to get just the string inside the action attribute, as follows:

../?x=3O1*qY*E-dEItGGem1mH3VN5Nm6cO0hiQkOl0nSasIQqTDPzbSUbCI3UYWGGhwZ0

I have tried many regexes, but they did not work.

preg_match('|<form action="../?x=(.+?)" id="id8" method="post">|', $output, $matches);
Raphaël Vigée
Tadeáš Jílek

3 Answers

Does the string include the closing tag and the other tags needed to make it proper HTML? If so, try loading it into a DOMDocument and operating on it from there, something like this:

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html); // $html holds the markup to parse
$forms = $dom->getElementsByTagName('form'); // find all <form> elements
foreach ($forms as $form) {
    echo $form->getAttribute('action');
}
Mikel Bitson

Take a look at this post: Get substring between two strings PHP

For your particular case, I would suggest the following:

function get_string_between($string, $start, $end){
    $string = ' ' . $string;
    $ini = strpos($string, $start);
    if ($ini == 0) return '';
    $ini += strlen($start);
    $len = strpos($string, $end, $ini) - $ini;
    return substr($string, $ini, $len);
}

$fullstring = '<form action="../?x=3O1*qY*E-dEItGGem1mH3VN5Nm6cO0hiQkOl0nSasIQqTDPzbSUbCI3UYWGGhwZ0" id="id8" method="post">';
$parsed = get_string_between($fullstring, 'action="', '"');

echo $parsed; // result

You can also use a DOM parser:

$html = '<form action="../?x=3O1*qY*E-dEItGGem1mH3VN5Nm6cO0hiQkOl0nSasIQqTDPzbSUbCI3UYWGGhwZ0" id="id8" method="post">';
$d = new DOMDocument();
$d->loadHTML($html);
$forms = $d->getElementsByTagName('form'); // every <form> in the document
foreach ($forms as $f) {
    echo $f->getAttribute('action');
}

EDIT: As suggested by Mikel Bitson, the DOM parser method is cleaner and will work if there is more than one form.

Raphaël Vigée
  • This answer is solid as far as finding a string between two characters goes. That said, what happens if there is more than one form element in the HTML? Loading this into a DOMDocument will allow you to operate on it as HTML elements. EDIT: Now I see you've added the DOM option. – Mikel Bitson Nov 03 '15 at 18:01
  • Edited! Thanks for the suggestion! – Raphaël Vigée Nov 03 '15 at 18:05

Firstly, if you're parsing HTML, you can use the built-in DOM parser, as suggested in Mikel Bitson's answer.

The reason your |<form action="../?x=(.+?)" id="id8" method="post">| regex doesn't work is mainly that the first ? needs to be escaped: unescaped, it makes the preceding / optional instead of matching a literal question mark. You should also escape the periods, which otherwise match any character, if you really want to match two literal periods.

That would give you something like |<form action="\.\./\?x=(.+?)" id="id8" method="post">|

or, if you simply want the entire URL, |<form action="([^"]+)" id="id8" method="post">|
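
As a quick check, here is a minimal sketch of the second pattern applied to the string from the question. The variable name $action is only illustrative, and it assumes the attributes appear in exactly this order:

```php
<?php
$output = '<form action="../?x=3O1*qY*E-dEItGGem1mH3VN5Nm6cO0hiQkOl0nSasIQqTDPzbSUbCI3UYWGGhwZ0" id="id8" method="post">';

// '|' is the pattern delimiter; [^"]+ captures everything up to the closing quote.
$action = null;
if (preg_match('|<form action="([^"]+)" id="id8" method="post">|', $output, $matches)) {
    $action = $matches[1]; // first capture group: the action attribute value
    echo $action, PHP_EOL;
}
```

If the attribute order or spacing can vary, the pattern will stop matching, which is one more reason to prefer the DOM approach for real HTML.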

Paul Dixon