
Let's say I have this HTML. I can't run a regex on it to find the value (at least I think so) because it spans multiple lines, but I need to match something like name="hf" value="(.*?)".

HTML Response

  <input type="hidden"
         name="hf"
         value="123">

When I try $response = str_replace('\r\n', $response, ''); or $response = str_replace('\n', $response, ''); the result is that $response becomes an empty string. What are my options?

Stan
  • Use a proper HTML parser. I am partial to [DOMDocument](http://php.net/manual/class.domdocument.php) for HTML. – Michael Berkowski Mar 07 '12 at 13:39
  • [How to parse and process HTML with PHP?](http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-with-php) – stema Mar 07 '12 at 13:40

3 Answers


Ok, for a start you're passing parameters to str_replace in the wrong order.

str_replace($search, $replace, $subject)

Your subject is '', and you are replacing '\n' (which doesn't exist) with your response. So the result is nothing.

Second, '\n' in single quotes is not a newline; it is a literal backslash followed by the letter n. You need to use double quotes for the escape sequence to be processed.

$response = str_replace("\n", '', $response);

This fixes your original code.
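
To make both points concrete, here is a small sketch. The $response string is a made-up stand-in for the real response body, and stripping "\r" as well as "\n" is an assumption about the line endings the server sends.

```php
<?php
// Hypothetical stand-in for the real response body (CRLF line endings assumed).
$response = "<input type=\"hidden\"\r\n       name=\"hf\"\r\n       value=\"123\">";

// Single quotes: '\n' is a backslash followed by the letter n (two characters).
var_dump(strlen('\n')); // int(2)

// Double quotes: "\n" is a single line-feed character.
var_dump(strlen("\n")); // int(1)

// Argument order is (search, replace, subject).
// Passing an array as the search removes both CR and LF in one call.
$flat = str_replace(array("\r", "\n"), '', $response);

var_dump(strpos($flat, "\n")); // bool(false)
```

Note that $flat still contains runs of spaces between the attributes, so a pattern with a single literal space between name="hf" and value= would still not match it.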

Lastly you should be using DOMDocument to process HTML anyway, not regex. Get into the (good) habit of doing it properly, and it will save you time and trouble in the long run.

The question How to parse and process HTML with PHP? is very comprehensive on the subject.

Grabbing the href attribute of an A element also provides some nice code examples.

Leigh

Regular expressions have modifiers you can use: "m" makes ^ and $ match at the start and end of every line (multi-line mode), and "s" makes the dot match newlines as well, so the string is treated as a single line. The second one may be the better option for you, like:

preg_match('/name="hf" value="(.*?)"/s',$htmlString,$matches);
enygma
  • This will not help the OP. `s` makes the dot match newline characters too, but since there are no newlines in the "value" part, there is still no match. You would need to allow newlines between "hf" and "value", but the pattern only allows a single space ... – stema Mar 07 '12 at 13:48
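
The point in the comment above can be sketched like this: replacing the literal space with \s+ lets the pattern span the line break and the indentation. This is only an illustration, not code from the answer; $htmlString is a stand-in for the real response, and the pattern assumes the attributes appear in this order with double quotes.

```php
<?php
// Stand-in for the HTML from the question.
$htmlString = "<input type=\"hidden\"\n         name=\"hf\"\n         value=\"123\">";

// \s+ matches any run of whitespace, including newlines,
// so no "s" or "m" modifier is needed here.
if (preg_match('/name="hf"\s+value="(.*?)"/', $htmlString, $matches)) {
    echo $matches[1], "\n"; // 123
}
```

A pattern like this breaks as soon as the attribute order or quoting changes, which is why the other answers suggest a DOM parser.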

For this kind of HTML parsing, it is highly recommended to use a DOM parser rather than error-prone regex.

Here is DOM-based code that you can use to extract your input item's value:

$html = <<< EOF
<input type="hidden"
 name="hf"
 value="123">
EOF;
$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML($html); // loads your html
echo $doc->saveHTML(); // debug output: the document as parsed
$xpath = new DOMXPath($doc);
// returns a list of all inputs with name='hf'
$nodelist = $xpath->query("//input[@name='hf']");

for($i=0; $i < $nodelist->length; $i++) {
    $node = $nodelist->item($i);
    $value = $node->attributes->getNamedItem('value')->nodeValue;
    var_dump($value); // prints "123"
}
anubhava