I hope you can help me solve a little problem.

I saw this post but still got some errors: How to convert HTML to JSON using PHP?

I created a PHP file which fetches a post from WordPress in this format:

<h1><img src="category1.jpg" />Category 1</h1>
  <ul>
    <li>some text.<strong><em>AUTHOR 1</em></strong></li>
    <li>some other text.<strong><em>AUTHOR 2</em></strong></li>
    <li>some other other text.<strong><em>AUTHOR 3</em></strong></li>
  </ul>
<h1><img src="category2.jpg" />Category 2</h1>
  <ul>
    <li>some new text.<strong><em>AUTHOR 4</em></strong></li>
    <li>some other new text.<strong><em>AUTHOR 5</em></strong></li>
    <li>some other other new text.<strong><em>AUTHOR 6</em></strong></li>
  </ul>

What I am trying to achieve is JSON that looks like this:

[
  {
    "category": "Category 1",
    "content": [
      {"text": "some text.", "author": "AUTHOR 1"},
      {"text": "some other text.", "author": "AUTHOR 2"},
      {"text": "some other other text.", "author": "AUTHOR 3"}
    ]
  },
  {
    "category": "Category 2",
    "content": [
      {"text": "some new text.", "author": "AUTHOR 4"},
      {"text": "some other new text.", "author": "AUTHOR 5"},
      {"text": "some other other new text.", "author": "AUTHOR 6"}
    ]
  }
]

I need to use it in Angular modules afterwards.

Is there any solution for this? Any function?

Many thanks!

Dima Gimburg
  • There's nothing in the markup to suggest that a given node contains the author or category information. The only thing we _could_ do is _assume_ that the contents of an h1 tag is a category name, and that the value of `li > strong > em` is the author, but such assumptions are extremely risky. Also: show us what you've tried! – Elias Van Ootegem Oct 10 '14 at 13:41
  • Thank you very much! Let's say I can live with those assumptions. Are you saying there is no existing PHP function that produces something like the result I want? I have tried json_encode() on the very long string, and it seems I get the same long string as output. I have also tried what is suggested in the answers here: http://stackoverflow.com/questions/23062537/how-to-convert-html-to-json-using-php which got me a small step further, but I still got "invalid argument supplied for foreach()" errors. – Dima Gimburg Oct 10 '14 at 13:53
  • I'd suggest looking into Regular Expressions to match the various patterns you are after. This would be a little fiddly, and involve various loops but should be doable. – Novocaine Oct 10 '14 at 13:57
  • Yeah, thanks! I thought I could somehow avoid using it, but if I have to, I'll use it. – Dima Gimburg Oct 10 '14 at 14:01
  • @Novocaine: ***Don't use regex on a DOM.*** Parse the markup, extract the nodes you need, and use their values when constructing an array: nice and easy _and_ reliable. – Elias Van Ootegem Oct 10 '14 at 14:06
  • @DimaGimburg: No, don't. Regex + markup = evil. Look into `DOMDocument` and/or `SimpleXMLElement` (on php.net). Parse the DOM, and set to work. – Elias Van Ootegem Oct 10 '14 at 14:08
  • @EliasVanOotegem what's so wrong with regex on markup? Is it because it's not particularly reliable, or is that only one reason? – Novocaine Oct 10 '14 at 14:10
  • It's a simple matter of grammar hierarchy: regex is (as the name suggests) restricted to _regular grammars_, and markup is too complex for that. Markup languages are context-free grammars, so regex isn't the right tool for the job. – Elias Van Ootegem Oct 10 '14 at 14:13
  • @EliasVanOotegem Thank you very much! I think DOMDocument is what I was looking for. – Dima Gimburg Oct 10 '14 at 14:16
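
On the json_encode() attempt mentioned in the comments above: the function serialises whatever PHP value it receives and does not parse HTML, which would explain getting the same long string back. A small sketch, with a made-up one-line `$html` value standing in for the real post:

```php
<?php
// json_encode() serialises the value it is given; it does not parse HTML.
// $html here is a shortened, made-up stand-in for the fetched post.
$html = '<h1>Category 1</h1>';

// A string goes in, so a JSON string literal (quoted and escaped) comes out.
$asString = json_encode($html);

// A nested array goes in, so a JSON object comes out.
$asObject = json_encode(['category' => 'Category 1', 'content' => []]);

echo $asString, PHP_EOL;
echo $asObject, PHP_EOL;
```

So the HTML has to be turned into a nested PHP array first; json_encode() is only the last step.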

1 Answer

Take a look at the documentation for SimpleXML.

Once you've loaded the document into SimpleXML, you can iterate over its elements with foreach and access child elements as properties or by array index. After extracting the information you need into a PHP array, use json_encode() to build your JSON. Below is an easy way to import the HTML code:

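$doc = new DOMDocument();

// $html should contain the HTML code from the page
if (!$doc->loadHTML($html)) {
    throw new RuntimeException('Could not parse the HTML.');
}

// Only import into SimpleXML once the HTML has loaded successfully
$page = simplexml_import_dom($doc);

// do something with $page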
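
As a rough sketch of what the "do something" step could look like for the markup in the question: the function name `categoriesFromSimpleXml` is made up for illustration, and the code relies on the assumptions discussed in the comments (every h1 is a category, the ul right after it holds its items, and `li > strong > em` is the author). It is one possible approach, not a definitive implementation.

```php
<?php
// Sketch only. Assumptions: <h1> = category, the next <ul> sibling = its items,
// <li><strong><em> = author. The function name is hypothetical.
function categoriesFromSimpleXml(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        throw new RuntimeException('Could not parse the HTML.');
    }

    $page = simplexml_import_dom($doc);
    $result = [];

    foreach ($page->xpath('//h1') as $h1) {
        $content = [];
        // Items of the first <ul> that follows this heading.
        foreach ($h1->xpath('following-sibling::ul[1]/li') as $li) {
            $content[] = [
                // Casting to string yields the element's own text, not its children's.
                'text'   => trim((string) $li),
                'author' => trim((string) $li->strong->em),
            ];
        }
        $result[] = ['category' => trim((string) $h1), 'content' => $content];
    }

    return $result;
}
```

Passing the result to `json_encode(categoriesFromSimpleXml($html), JSON_PRETTY_PRINT)` should then give an array of category objects in the shape the question asks for.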

Alternatively, there are quite a few libraries and frameworks that do this for you. See this post for more info.

frost287
  • @DimaGimburg: SimpleXML isn't always installed, mind you; `DOMDocument` is still the extension with the widest support (last time I checked). Besides, it's really easy to do what you need to do here: [Fully working example](https://eval.in/204181) – Elias Van Ootegem Oct 10 '14 at 14:22
  • Thanks @EliasVanOotegem, this is exactly what I was talking about. Of course I'll adapt it to work my way ;) – Dima Gimburg Oct 10 '14 at 14:26
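
For setups without SimpleXML, the same idea can be sketched with `DOMDocument` and `DOMXPath` alone. This is not the linked eval.in example, just a minimal illustration under the same assumptions as in the comments (h1 = category, the following ul = its items, `li > strong > em` = author); the function name `categoriesFromDom` is made up.

```php
<?php
// Sketch only. Assumptions: <h1> = category, the next <ul> sibling = its items,
// <li><strong><em> = author. The function name is hypothetical.
function categoriesFromDom(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];

    foreach ($xpath->query('//h1') as $h1) {
        $content = [];
        foreach ($xpath->query('following-sibling::ul[1]/li', $h1) as $li) {
            // Collect only the <li>'s direct text nodes, so the author is left out.
            $text = '';
            foreach ($li->childNodes as $child) {
                if ($child->nodeType === XML_TEXT_NODE) {
                    $text .= $child->nodeValue;
                }
            }
            $content[] = [
                'text'   => trim($text),
                'author' => trim($xpath->evaluate('string(./strong/em)', $li)),
            ];
        }
        $result[] = ['category' => trim($h1->textContent), 'content' => $content];
    }

    return $result;
}
```

As before, `json_encode(categoriesFromDom($html))` would be the final step; if the real posts deviate from the assumed structure, the XPath expressions are the part to adjust.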