Sorry for my bad English. I have some text like the following:

<title class="a" />
<li name="a1" src="a11" />
<li name="a2" src="a21" />
<li name="a3" src="a31" />
<title class="b" />
<li name="b1" src="b11" />
<li name="b2" src="b21" />
<title class="c" />
<li name="c1" src="c11" />
<li name="c2" src="c21" />
<li name="c3" src="c31" />
<li name="c4" src="c41" />
<li name="c5" src="c51" />

I want to get every title's class name, plus the name and src values of the li elements that follow it (the number of li elements per title varies).

Thanks in advance.

MGR

First, grab each title together with the li lines that follow it:

/<title.*?class=(['"])(.*?)\1 \/>(?:\n<li.*(?:name=(["']).*?\3.*|src=(['"]).*\4.*){2})+/g
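To see what this regex captures, here is a minimal sketch that runs it over a shortened copy of the sample. It uses `String.prototype.matchAll` instead of an `exec` loop, and the variable names are illustrative. Capture group 2 holds the class name; the trailing `+` means a title only matches if at least one li line follows it.

```javascript
// Sketch: the title regex from this answer, applied with matchAll.
const titleRegex = /<title.*?class=(['"])(.*?)\1 \/>(?:\n<li.*(?:name=(["']).*?\3.*|src=(['"]).*\4.*){2})+/g;

// Shortened copy of the sample text (illustrative).
const sample = [
  '<title class="a" />',
  '<li name="a1" src="a11" />',
  '<li name="a2" src="a21" />',
  '<title class="b" />',
  '<li name="b1" src="b11" />',
].join("\n");

// matchAll requires the g flag and yields one match per title block;
// group 2 is the class attribute value.
const titleClasses = [...sample.matchAll(titleRegex)].map((m) => m[2]);
console.log(JSON.stringify(titleClasses)); // ["a","b"]
```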

Then grab the individual li elements from within each of those matches:

/(?:\n<li.*name=(["'])(.*?)\1.*src=(['"])(.*?)\3.*?)/g
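A similar sketch for this second regex, applied to one title block shaped like the full match (`match[0]`) that the first regex returns. Groups 2 and 4 hold the `name` and `src` values; the block contents and variable names are illustrative.

```javascript
// Sketch: the li regex from this answer, applied to a single title block.
const liRegex = /(?:\n<li.*name=(["'])(.*?)\1.*src=(['"])(.*?)\3.*?)/g;

// One title block, shaped like the full match of the title regex.
const block = [
  '<title class="c" />',
  '<li name="c1" src="c11" />',
  '<li name="c2" src="c21" />',
  '<li name="c3" src="c31" />',
].join("\n");

// Each match starts at the newline before an <li>; keep [name, src].
const pairs = [...block.matchAll(liRegex)].map((m) => [m[2], m[4]]);
console.log(JSON.stringify(pairs)); // [["c1","c11"],["c2","c21"],["c3","c31"]]
```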

Please note that parsing HTML with regex is generally considered bad practice, so you probably shouldn't rely on this for anything beyond simple, predictable input like the sample above.

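// Matches a <title> line plus the <li> lines after it; group 2 is the class.
let titleAndListElementsRegex = /<title.*?class=(['"])(.*?)\1 \/>(?:\n<li.*(?:name=(["']).*?\3.*|src=(['"]).*\4.*){2})+/g,
  // Matches one <li> line; group 2 is the name, group 4 is the src.
  listElementRegex = /(?:\n<li.*name=(["'])(.*?)\1.*src=(['"])(.*?)\3.*?)/g,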
  page = `<title class="a" />
<li name="a1" src="a11" />
<li name="a2" src="a21" />
<li name="a3" src="a31" />
<title class="b" />
<li name="b1" src="b11" />
<li name="b2" src="b21" />
<title class="c" />
<li name="c1" src="c11" />
<li name="c2" src="c21" />
<li name="c3" src="c31" />
<li name="c4" src="c41" />
<li name="c5" src="c51" />`,
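  pageJson = {},
  // Current title-block match and current li match, declared explicitly
  // so the code also runs in strict mode.
  titleMatch,
  listItemMatch;

do {
  // With the g flag, exec() resumes from lastIndex and returns null when done.
  titleMatch = titleAndListElementsRegex.exec(page);
  if (titleMatch) {
    pageJson[titleMatch[2]] = {};
    do {
      // Search only inside the current title block. After the final null,
      // lastIndex resets to 0, so the regex can be reused for the next block.
      listItemMatch = listElementRegex.exec(titleMatch[0]);

      if (listItemMatch) {
        pageJson[titleMatch[2]][listItemMatch[2]] = listItemMatch[4];
      }
    } while (listItemMatch);
  }
} while (titleMatch);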

console.log(pageJson)
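Since regex is a poor fit for HTML, here is one alternative sketched under the assumption that every tag sits on its own line, as in the sample. It is a line-by-line scanner, not a real HTML parser (it still uses a small regex to read each attribute): it remembers the most recent title and attaches each following li to it. `parseTitles` is a hypothetical helper name.

```javascript
// Sketch: line-by-line scanner, assuming one tag per line as in the sample.
// parseTitles is a hypothetical helper name, not an existing API.
function parseTitles(text) {
  // Read an attribute value quoted with either ' or ".
  const attr = (line, name) => {
    const m = line.match(new RegExp(`\\b${name}=(["'])(.*?)\\1`));
    return m ? m[2] : undefined;
  };

  const result = {};
  let current; // class name of the most recent <title>, if any

  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("<title ")) {
      current = attr(line, "class");
      if (current !== undefined) result[current] = {};
    } else if (line.startsWith("<li ") && current !== undefined) {
      // Any number of <li> lines may follow a title; each one is
      // attached to the most recent title seen.
      result[current][attr(line, "name")] = attr(line, "src");
    }
  }
  return result;
}

const text = `<title class="a" />
<li name="a1" src="a11" />
<li name="a2" src="a21" />
<title class="b" />
<li name="b1" src="b11" />`;

console.log(JSON.stringify(parseTitles(text)));
// {"a":{"a1":"a11","a2":"a21"},"b":{"b1":"b11"}}
```

Because the scanner keeps state between lines instead of matching whole blocks, a title followed by zero li lines still shows up (as an empty object), whereas the title regex above needs at least one li after each title.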
KyleFairns