
I'm working on a project that has to use an API provided by an online marketing platform to sign up users and perform other typical (and here irrelevant) actions.

Today they sent me the API documentation, and I'm confused because it literally says "... it returns an HTML that returns a JSON...". It shows an example like this:

<html> 
<head> 
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> 
    <meta name="XX:pagetag" content="" /> 
    <meta name="XX:flowtag" content="" /> 
</head> 
<body leftmargin='0' rightmargin='0' topmargin='0' bottommargin='0' bgcolor='#FFFFFF'> 
    <RESULT>[{ &quot;userid&quot;: 1, &quot;status&quot;: &quot;OK&quot; }]</RESULT> 
</body> 
</html> 

So I have two questions:

  1. Why would anyone build an API responding this way? It really doesn't make sense to me. In what way is this better than returning just the JSON with proper JSON headers? Maybe I've been missing something...

  2. Using PHP, what is the best way to parse this response and get the actual JSON? I've read multiple answers and comments here on SO discouraging the use of regular expressions to parse HTML, but so far it's the only way I can think of. I've tried:

    // This will be a CURL call:
    $response = file_get_contents('./example-response.html');
    
    $regexp = '/<RESULT>(.*)<\/RESULT>/';
    $matches = [];
    preg_match($regexp, $response, $matches);
    print_r($matches);
    

By the way, this code doesn't work on my MAMP server, but it does work on https://www.functions-online.com/preg_match.html

Any suggestions would be much appreciated. Thank you!

Jordi Nebot

1 Answer

  1. You are totally right, this seems plain crazy.
  2. You can use PHP's SimpleXMLElement and its xpath() method to parse the HTML. Don't forget to unescape those HTML entities too :)
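
A minimal sketch of the SimpleXML approach, assuming the response is well-formed XML like the example in the question and that PHP 7.3+ is available (for `JSON_THROW_ON_ERROR`). The helper name `extractResultJson` is hypothetical, and the response is inlined here only for illustration; in practice it would come from the cURL call.

```php
<?php
// Example response copied from the question; a real call would fetch it via cURL.
$response = <<<'RESPONSE'
<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta name="XX:pagetag" content="" />
    <meta name="XX:flowtag" content="" />
</head>
<body leftmargin='0' rightmargin='0' topmargin='0' bottommargin='0' bgcolor='#FFFFFF'>
    <RESULT>[{ &quot;userid&quot;: 1, &quot;status&quot;: &quot;OK&quot; }]</RESULT>
</body>
</html>
RESPONSE;

/**
 * Hypothetical helper: pull the JSON payload out of the <RESULT> element.
 * Assumes the response parses as XML; throws if it does not.
 */
function extractResultJson(string $response): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($response);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        throw new RuntimeException('Response is not well-formed XML.');
    }

    // Element names are case-sensitive in XML, so match the tag exactly.
    $nodes = $xml->xpath('//RESULT');
    if (empty($nodes)) {
        throw new RuntimeException('No <RESULT> element found.');
    }

    // Casting to string yields the element's text content.
    $json = trim((string) $nodes[0]);

    // Decode to associative arrays; invalid JSON raises a JsonException.
    return json_decode($json, true, 512, JSON_THROW_ON_ERROR);
}

$result = extractResultJson($response);
print_r($result);
```

Since `&quot;` is one of XML's predefined entities, the XML parser decodes it itself, so an explicit `html_entity_decode()` call should only be needed when the raw text is extracted another way, for example with a regular expression. If the real responses turn out not to be well-formed XML, `DOMDocument::loadHTML()` is a more forgiving alternative.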
Fivetide