
Angular/JS Application

I have this: input.replace(/&lt;|&gt;|&quot;|&amp;|&apos;/gm, ...), where the replacement needs to depend on the matched value.

So I want to search for all of those strings, but replace each one based on which one matched: if &quot; matches, replace it with ", and if &gt; matches, replace it with >.

I basically want to avoid this:

input.replace(/&lt;/gm, '<')

input.replace(/&gt;/gm, '>')

input.replace(/&quot;/gm, '"')

I think it has something to do with capturing groups, but I'm not a regex person.

Maybe the only answer is: inputString.replace(/&lt;/gm, '<').replace(/&gt;/gm, '>').replace(/&quot;/gm, '"').replace(/&apos;/gm, '\'').replace(/&amp;/gm, '&');

RooksStrife
  • Your question is unclear. What do you mean by "< if #1"? What does "#1" refer to? – bgfvdu3w Jun 29 '22 at 17:44
  • @bgfvdu3w I want to search for all those strings, but replace each one based on which one matched. So if &quot; matches, replace it with ", and if &gt; matches, replace it with >. – RooksStrife Jun 29 '22 at 17:45
  • Does this answer your question? [Unescape HTML entities in JavaScript?](https://stackoverflow.com/questions/1912501/unescape-html-entities-in-javascript) – bgfvdu3w Jun 29 '22 at 17:52
  • @bgfvdu3w No, I need to specifically match certain strings and replace them with corresponding values. I updated the question; it might help. – RooksStrife Jun 29 '22 at 17:56
  • Well, the strings you are trying to replace look like they are HTML entities. You're effectively trying to unescape/decode them. That's what I linked a solution to. Do you have other strings that are not HTML entities which need to be replaced too? – bgfvdu3w Jun 29 '22 at 17:58
  • @bgfvdu3w They will only be the ones from above. But I want to understand how I could do it with capturing groups (I think that's what it is called), not just how to decode them. – RooksStrife Jun 29 '22 at 18:03

1 Answer

What's commonly done is to simply chain the replacements, executing one after another as in your example:

input.replace(/&lt;/g, "<").replace(/&gt;/g, ">").replace(/&quot;/g, '"').replace(/&apos;/g, "'").replace(/&amp;/g, "&")

The downside of this is that it doesn't scale well: each replace operation runs in linear time, so for m replacements on a string of length n, the time complexity is O(n * m). If you were to support all 2,000+ named HTML entities, this would quickly blow up and performance would degrade severely. On top of that, each of the m steps creates an intermediate string of up to length n, for O(n * m) garbage data in total.
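A minimal sketch of the chained approach, assuming only the five entities from the question. Order matters here: each replace scans the output of the previous one, so if &amp; is decoded first, a double-escaped string such as &amp;lt; becomes &lt; and is then decoded a second time to <. The two function names are hypothetical, chosen for the comparison.

```javascript
// Chained decoding with &amp; replaced LAST, so text that was escaped twice
// (e.g. "&amp;lt;") is decoded only one level, to "&lt;".
function decodeChained(input) {
  return input
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// The same chain with &amp; FIRST: later steps rescan the "&" it produced,
// so "&amp;lt;" is decoded twice.
function decodeChainedWrongOrder(input) {
  return input
    .replace(/&amp;/g, "&")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'");
}

console.log(decodeChained("&amp;lt;b&amp;gt;"));          // &lt;b&gt;
console.log(decodeChainedWrongOrder("&amp;lt;b&amp;gt;")); // <b>
```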

The proper way is to build a lookup table (a hash table; in JS, a plain object or a Map) that maps every entity name to its replacement with O(1) expected access:

const namedEntities = {lt: "<", gt: ">", quot: '"', amp: "&", apos: "'"}
return input.replace(/&(lt|gt|quot|amp|apos);/g, (_, match) => namedEntities[match])

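This passes a replacement function to String.prototype.replace. For each match, the function receives the whole match as its first argument and the first capturing group, (lt|gt|quot|amp|apos), i.e. the entity name, as its second; whatever it returns is used as the replacement. No intermediate strings are created, and, assuming an ideal regex implementation, the time complexity is O(n).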
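To make the capturing-group part concrete, here is a small sketch that optionally logs what String.prototype.replace hands to the callback for each match (full match, group 1, then the match offset). The function name decodeEntities and its log flag are hypothetical. It also shows a side benefit of the single pass: replaced text is never rescanned, so &amp;lt; decodes to &lt; regardless of the order of the alternatives.

```javascript
const namedEntities = { lt: "<", gt: ">", quot: '"', amp: "&", apos: "'" };

// For each match, replace() calls the callback with
// (fullMatch, group1, ..., offset, wholeString). Group 1 is the entity name.
function decodeEntities(input, log = false) {
  return input.replace(/&(lt|gt|quot|amp|apos);/g, (fullMatch, name, offset) => {
    if (log) console.log(`${fullMatch} at index ${offset} -> group 1 is "${name}"`);
    return namedEntities[name]; // the returned string replaces fullMatch
  });
}

console.log(decodeEntities("a &lt; b &amp;&amp; c &gt; d", true)); // a < b && c > d
// The "&" produced from &amp; is not scanned again, so this decodes one level:
console.log(decodeEntities("&amp;lt;")); // &lt;
```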

If you want to religiously follow DRY, you might want to build the RegEx from the keys:

const regex = new RegExp("&(" + Object.keys(namedEntities).join("|") + ");", "g")
return input.replace(regex, (_, match) => namedEntities[match])
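A self-contained sketch of building the regex from the table's keys. The escapeRegExp helper is hypothetical (the pattern follows the one commonly given in MDN's regular expressions guide). Entity names are plain letters and digits, so escaping is only a safeguard in case the table ever holds keys containing regex metacharacters. Reusing one global regex with replace is fine, because replace resets its lastIndex before matching.

```javascript
const namedEntities = { lt: "<", gt: ">", quot: '"', amp: "&", apos: "'" };

// Hypothetical helper: escape regex metacharacters so each key matches literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build "&(key1|key2|...);" from the table so the two cannot drift apart.
function buildEntityRegex(table) {
  const alternation = Object.keys(table).map(escapeRegExp).join("|");
  return new RegExp("&(" + alternation + ");", "g");
}

const entityRegex = buildEntityRegex(namedEntities);
console.log(entityRegex.source); // &(lt|gt|quot|amp|apos);

function decodeEntities(input) {
  return input.replace(entityRegex, (_, name) => namedEntities[name]);
}

console.log(decodeEntities("&quot;x&quot; &gt; &apos;y&apos;")); // "x" > 'y'
```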

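Alternatively, consider a more general regex that matches any alphanumeric entity name, using the dictionary to check whether the entity is known and leaving it unchanged otherwise. Restricting the name to letters and digits stops a stray & from matching everything up to the next ;, and Object.hasOwn ignores inherited keys such as constructor:

return input.replace(/&([a-zA-Z0-9]+);/g, (entity, name) => Object.hasOwn(namedEntities, name) ? namedEntities[name] : entity)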
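A runnable sketch of the "general regex plus lookup" variant, assuming the same five-entry table. It contrasts a lazy &(.+?); pattern (the hypothetical loose function), where a bare & runs on to the next ; and swallows the following &lt;, with an alphanumeric-only name. Object.hasOwn is ES2022; on older runtimes Object.prototype.hasOwnProperty.call(namedEntities, name) is the equivalent check. A plain namedEntities[name] || entity would instead return the inherited Object constructor for &constructor;.

```javascript
const namedEntities = { lt: "<", gt: ">", quot: '"', amp: "&", apos: "'" };

// Match any alphanumeric entity name; unknown names are left untouched.
// Object.hasOwn only accepts the table's own keys, not inherited ones.
function decodeKnownEntities(input) {
  return input.replace(/&([a-zA-Z0-9]+);/g, (entity, name) =>
    Object.hasOwn(namedEntities, name) ? namedEntities[name] : entity
  );
}

// For comparison: ".+?" starting at a bare "&" extends to the next ";",
// so " Jerry &lt" becomes one unknown "name" and &lt; is never decoded.
const loose = (input) =>
  input.replace(/&(.+?);/g, (entity, name) =>
    Object.hasOwn(namedEntities, name) ? namedEntities[name] : entity
  );

console.log(decodeKnownEntities("&lt;p&gt; &copy; &constructor;")); // <p> &copy; &constructor;
console.log(decodeKnownEntities("Tom & Jerry &lt;3")); // Tom & Jerry <3
console.log(loose("Tom & Jerry &lt;3"));               // Tom & Jerry &lt;3
```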
Luatic