I have access to a large amount of HTML inside a single string:

const { body_html } = this.props.product_page;

I am attempting to update this HTML using only string parsing. Specifically, I want to find the first closing </div> tag that comes after a specific substring:

product.description.

The challenge is the dynamic nature of product_page: there will be an unknown number of characters between the end of the substring product.description and the first closing </div> that follows it.

How can I inject <div>Hello, world!</div> immediately after the first closing </div> that follows the product.description variable?

EDIT: I know it's poor practice to modify HTML in this fashion, but due to technical constraints, these are the conditions I have to satisfy. Also, this is not pure HTML but Liquid template code (Liquid is a Ruby-based templating language). Finally, I never asked for regex specifically. Wouldn't indexOf with substrings be enough (or is that technically the same thing)?

ilrein
  • Using regex here sounds like a good way to get burned, or at least to have bugs down the road. Instead, I recommend using an HTML parser. – Tim Biegeleisen Jul 18 '17 at 05:41
  • [Obligatory link.](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#1732454) By far, the best way to do this is to parse the HTML properly: With an HTML parser. There's one built into the browser, after all. – T.J. Crowder Jul 18 '17 at 05:41

3 Answers

Obligatory link. By far, the best way to do this is to parse the HTML properly: With an HTML parser. There's one built into the browser, after all. If you try to do this with simplistic string processing, the odds are it will bite you.

Can't indexOf with substrings be enough (or is that technically the same thing)?

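Not quite. Officially, the end tag for a div can be </div> or </div > (where that space can be any amount of whitespace, including newlines, tabs, etc.), and an exact indexOf("</div>") search only finds the first form. Whitespace between the / and div is a different matter: browsers do not treat </ div> as an end tag at all, because the HTML parser turns it into a bogus comment.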
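To see the problem concretely, here is a quick check (the html string is made up for illustration): an exact indexOf search misses a closing tag written as </div >, while a pattern that allows trailing whitespace inside the tag finds it.

```javascript
// A made-up snippet whose closing tag has a space before the ">".
var html = "<div>product.description</div >";

// Exact substring search: no literal "</div>" exists, so this is -1.
console.log(html.indexOf("</div>"));

// Regex that tolerates whitespace before ">": finds the tag at index 24.
console.log(/<\/div\s*>/.exec(html).index);
```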

So you'll want a regular expression to find the end tag. Something like:

var str = "testing product.description }}\n</div\n\t >";
var match = /product\.description[\s\S]*?<\/\s*div\s*>/.exec(str);
console.log("Original string: " + str);
if (match) {
  // match[0] runs from "product.description" through the ">" of the closing tag,
  // so this index is just past the closing </div>
  var index = match.index + match[0].length;
  console.log("It's at index " + index);
  str = str.substring(0, index) +
        "<div>Hello, world!</div>" +
        str.substring(index);
  console.log("New string: " + str);
} else {
  console.log("Not found");
}

That regex allows for whitespace in the closing </div> tag, and match.index + match[0].length is the position just past that tag, so you can insert the new string right after it.

One slightly tricky bit is the [\s\S]*? part, which is basically .*? (match as few characters as possible, of any kind), except that it also matches newlines, which . doesn't. ([\s\S] means "any whitespace or non-whitespace character".)
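If your runtime supports ES2018, the same search can be written with the s (dotAll) flag, which makes . match newlines so [\s\S] isn't needed. The sketch below also adds the i flag, since HTML tag names are case-insensitive (</DIV> is a valid end tag), and escapes the marker so the . in product.description only matches a literal dot. insertAfterClosingDiv is a hypothetical helper name; it inserts after the closing tag and returns the input unchanged when nothing matches.

```javascript
// Hypothetical helper: insert `snippet` right after the first closing </div>
// that follows `marker`. Assumes an ES2018+ runtime for the "s" flag.
function insertAfterClosingDiv(html, marker, snippet) {
  // Escape regex metacharacters so "product.description" matches literally.
  var escaped = marker.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // "s": "." also matches newlines; "i": </DIV> counts as a closing div too.
  var re = new RegExp(escaped + ".*?<\\/\\s*div\\s*>", "is");
  var match = re.exec(html);
  if (!match) {
    return html; // marker or closing tag not found: leave the input unchanged
  }
  var index = match.index + match[0].length; // just past the closing tag
  return html.slice(0, index) + snippet + html.slice(index);
}

var template = "{{ product.description }}\n</DIV\n >\n<p>more</p>";
console.log(insertAfterClosingDiv(template, "product.description", "<div>Hello, world!</div>"));
```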

T.J. Crowder
  • How would you extend your answer to support handlebars, e.g. `{{ product.description }}`? – ilrein Jul 18 '17 at 19:39
  • @ilrein: The above works just fine if `product.description` is in `{{ }}` – T.J. Crowder Jul 19 '17 at 06:57
  • Ultimately settled with `/(product\.description[^<]*)<\/\s*div\s*>/` (the regex failed with a newline; it finally works!) – ilrein Jul 19 '17 at 14:32
  • I only modified it because I was unable to get matches with newlines. Try it out (HTML string: https://pastebin.com/raw/enH4wKpd) – ilrein Jul 19 '17 at 16:53
  • Final iteration I think: `/(product\.description[\S\s]*?<\/\s*div\s*>)/` – ilrein Jul 19 '17 at 17:06
  • @ilrein: Ah, a newline between `description` and `</div>`. Fixed above. `.*?` won't match newlines; `[\s\S]*?` will. – T.J. Crowder Jul 19 '17 at 18:18

Here is a small idea to work from.

Let's say you have the following HTML string:

var myStr = "<div><span>Hello</span><div>SearchString<div>Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</div></div></div>";

and your search string is:

var searchString = "SearchString";

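Now find the index of the search string using myStr.indexOf(), take the substring from there to the end, and find the nearest occurrence of '</div>' in it. That position is relative to the substring, so add the starting index back, plus the length of '</div>' to land just past the tag:

var start = myStr.indexOf(searchString);
var insertAt = start + myStr.substring(start).indexOf('</div>') + '</div>'.length;

Now you have the index where you have to insert your string. Insert it there and you are good to go.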
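An alternative that skips the substring entirely passes the search string's position as indexOf's optional second argument (fromIndex), which returns an absolute position directly. A small sketch (the page string is made up for illustration) confirming that both routes land on the same closing tag; either result still points at the start of </div>, so add '</div>'.length to insert after it.

```javascript
// Made-up page: the first "</div>" (index 11) comes before the search string.
var page = "<div>intro</div><div>SearchString text</div><div>after</div>";
var start = page.indexOf("SearchString"); // 21

// Substring route: the result (17) is relative to the substring...
var relative = page.substring(start).indexOf("</div>");
// ...so the starting offset has to be added back to get 38.
var absolute = start + relative;

// fromIndex route: search starts at `start`, result is already absolute.
console.log(absolute === page.indexOf("</div>", start)); // true
```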

Strikers

First of all, this is not good practice. You should definitely try an HTML parser.

But just to answer your question, here is sample code:

var mySearchStr = "<div> Test String 1 </div><div> myString is this </div><div> New string should be before this </div>";

var searchStrIndex = mySearchStr.indexOf("myString");

var closingDivIndex = mySearchStr.indexOf("</div>", searchStrIndex + 1); // first </div> after the first occurrence of the search string

var firstPart = mySearchStr.substring(0, closingDivIndex + 6);  // 6 is the length of </div>

var secondPart = mySearchStr.substring(closingDivIndex + 6);

var finalString = firstPart + "<div> My New content </div>" + secondPart;

alert(finalString);

There may be better ways to do this with regex, but I am not an expert there.
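One caveat with the inline version: if the search string is missing, indexOf returns -1, so searchStrIndex + 1 is 0 and the new content is inserted after the very first </div> in the document. Below is a sketch of the same indexOf approach wrapped in a function that guards against that. insertAfterFirstClosingDiv is a hypothetical name, and like the original it only recognizes the exact text </div> (not variants such as </div >).

```javascript
// Hypothetical helper using only indexOf: insert `snippet` right after the
// first exact "</div>" that follows `marker`.
function insertAfterFirstClosingDiv(html, marker, snippet) {
  var CLOSE_TAG = "</div>";
  var markerIndex = html.indexOf(marker);
  if (markerIndex === -1) return html; // marker not present: no change
  var closeIndex = html.indexOf(CLOSE_TAG, markerIndex + marker.length);
  if (closeIndex === -1) return html; // no closing tag after the marker
  var insertAt = closeIndex + CLOSE_TAG.length; // just past "</div>"
  return html.slice(0, insertAt) + snippet + html.slice(insertAt);
}

var result = insertAfterFirstClosingDiv(
  "<div> Test String 1 </div><div> myString is this </div><div> New string should be before this </div>",
  "myString",
  "<div> My New content </div>"
);
console.log(result);
```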

Akhil