
I have hundreds of strings in which I conditionally need to add or remove something. For context, these are stringified SVGs, and I want to add or remove attributes on certain nodes. Since I don't have the luxury of a DOM at any stage of this process, and both the input and the desired output are strings, I want to do this with a string replacement using RegEx.

The logic is: any substring that looks like an SVG element (path, circle, etc.) and doesn't have a stroke attribute needs to have one added with a value of none.

Example input:

<svg>
  <path d="..." fill="black" />
  <path d="..." stroke="black" />
  <circle cx="50" cy="50" r="50" fill="black" />
</svg>

Desired output:

<svg>
  <path d="..." fill="black" stroke="none" />
  <path d="..." stroke="black" />
  <circle cx="50" cy="50" r="50" fill="black" stroke="none" />
</svg>

Note that the first <path> and the <circle> have stroke="none" added, while the other path remains unchanged.

What I've tried so far:

const matcher = /(?<=path)((?!stroke=).)*(?=\/>)/g
const newSvg = oldSvg.replace(matcher, ' $1 stroke=\"none\" ')

Problems

  • I think this is really close, but the output is wrong: <path " stroke="none" /> (all attributes are removed and a " character is added).
  • This only matches path elements. I tried something like (?<=(path|circle)), but it seems not to be valid.

What am I doing wrong?

Nix
  • Why does this need to be done with regex? You are using JS already, it seems it would be much easier to examine the content as a document programmatically. – VLAZ Jan 19 '23 at 15:17
  • While it is JavaScript, there is no DOM. It's a Node script which reads the file content (as a string) and feeds it into SVGR (as a string). SVGR then outputs a React component, and I bundle that as a package. The alteration needs to happen before it is fed into SVGR, and at that point it is a string. – Nix Jan 19 '23 at 15:49
  • See https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags and then decide what you're going to use instead of regex – Robert Longson Jan 19 '23 at 15:50
  • You could also try a [nodeJS DOM parser: "How do I parse a HTML page with Node.js"](https://stackoverflow.com/questions/7372972/how-do-i-parse-a-html-page-with-node-js) Other options described [here: "Trying to use the DOMParser with node js"](https://stackoverflow.com/questions/11398419/trying-to-use-the-domparser-with-node-js) – herrstrietzel Jan 19 '23 at 17:29

1 Answer


It is safer to use a DOM parser in Node.js; see "How do I parse a HTML page with Node.js" (https://stackoverflow.com/questions/7372972/how-do-i-parse-a-html-page-with-node-js).

You can however use nested regex replaces. Here is a solution to your question that supports the shortened tag format <tag attrs />, as well as the close tag format <tag attrs></tag>. Tweak regex2 to support additional tags.

const input = `<svg>
  <path d="..." fill="black" />
  <path d="..." stroke="black" />
  <circle cx="50" cy="50" r="50" fill="black" />
</svg>
<div>Other stuff</div>
<svg>
  <path d="..." fill="black" stroke="green"></path>
  <path d="..."></path>
  <circle cx="50" cy="50" r="50" fill="black"></circle>
</svg>`;

const regex1 = /(<svg>)(.*?)(<\/svg>)/gis;
const regex2 = /<(path|circle)( .*?)( *\/>|> *<\/\1>)/gis;
const regex3 = /\bstroke=["']/i;
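// Outer replace: visit each <svg>...</svg> block.
let result = input.replace(regex1, (svg, open, body, close) => {
  // Inner replace: visit each path/circle tag inside the block.
  return open + body.replace(regex2, (tag, name, attrs, end) => {
    if(!regex3.test(attrs)) {
      // No stroke attribute yet: append one before the tag's closing part.
      return '<' + name + attrs + ' stroke="none"' + end;
    } else {
      // Already has a stroke: leave the tag untouched.
      return tag;
    }
  }) + close;
});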
console.log(result);

Output:

<svg>
  <path d="..." fill="black" stroke="none" />
  <path d="..." stroke="black" />
  <circle cx="50" cy="50" r="50" fill="black" stroke="none" />
</svg>
<div>Other stuff</div>
<svg>
  <path d="..." fill="black" stroke="green"></path>
  <path d="..." stroke="none"></path>
  <circle cx="50" cy="50" r="50" fill="black" stroke="none"></circle>
</svg>
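
As for the attempt in the question, a likely explanation for the lost attributes is that the capture group sits inside the repetition: in ((?!stroke=).)* the group is overwritten on every iteration, so $1 holds only the last character matched. Wrapping the whole repetition in the group keeps all attributes. A lookbehind such as (?<=(path|circle)) is valid in JavaScript, but it adds a capture group and so shifts $1; a non-capturing (?:...) avoids that. The sketch below illustrates both points for self-closing tags only, assuming no > appears inside an attribute value. The names sample, attempt, fixed, broken and repaired are made up for this illustration.

```javascript
const sample = '<path d="..." fill="black"/>';

// Pattern from the question: the group is repeated, so $1 is only
// the last character it matched (here the closing quote).
const attempt = /(?<=path)((?!stroke=).)*(?=\/>)/g;
const broken = sample.replace(attempt, ' $1 stroke="none" ');
console.log(broken); // <path " stroke="none" />

// The group now wraps the whole repetition, and the tag names are
// matched with a non-capturing group so $1 still means "attributes".
const fixed = /(?<=<(?:path|circle))((?:(?!stroke=)[^>])*?)(?=\s*\/>)/g;
const repaired = sample.replace(fixed, '$1 stroke="none"');
console.log(repaired); // <path d="..." fill="black" stroke="none"/>
```

A tag that already contains stroke= produces no match at all, so it is returned unchanged.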

Explanation of regex1:

  • (<svg>) -- capture group 1 with opening tag
  • (.*?) -- capture group 2: non-greedy scan until:
  • (<\/svg>) -- capture group 3 with closing tag
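
Note that regex1 matches only a bare <svg> opening tag. Real files usually carry attributes such as xmlns or viewBox, and those blocks would be skipped. Below is a minimal sketch of a variant that accepts attributes, assuming no > appears inside an attribute value. It also uses [^>]*? rather than .*? for the tag attributes so that one match cannot run across several tags. The names svgBlock, shape and addStroke are made up for this illustration.

```javascript
// Accept attributes on the opening <svg ...> tag.
const svgBlock = /(<svg\b[^>]*>)(.*?)(<\/svg>)/gis;
// Attributes may not contain '>', so a match stays inside one tag.
const shape = /<(path|circle)\b([^>]*?)(\s*\/>|>\s*<\/\1>)/gi;

function addStroke(text) {
  return text.replace(svgBlock, (whole, open, body, close) =>
    open + body.replace(shape, (tag, name, attrs, end) =>
      // Keep tags that already have a stroke; extend the others.
      /\bstroke=["']/i.test(attrs) ? tag : `<${name}${attrs} stroke="none"${end}`
    ) + close
  );
}

console.log(addStroke('<svg viewBox="0 0 10 10"><path d="..." /></svg>'));
// <svg viewBox="0 0 10 10"><path d="..." stroke="none" /></svg>
```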

Explanation of regex2:

  • < -- literal <
  • (path|circle) -- capture group 1 with tag name
  • ( .*?) -- capture group 2: tag attributes, i.e. non-greedy scan until:
  • ( *\/>|> *<\/\1>) -- capture group 3: either self closing tag /> or > and closing tag </tag>
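
To see how the three groups of regex2 divide a tag, the sketch below lists them for both tag styles. The helper name describe is made up for this illustration.

```javascript
const regex2 = /<(path|circle)( .*?)( *\/>|> *<\/\1>)/gis;

// Return [tagName, attributes, closing] for every tag that regex2 matches.
function describe(text) {
  return [...text.matchAll(regex2)].map(m => [m[1], m[2], m[3]]);
}

const parts = describe('<path d="..." /><circle r="5"></circle>');
console.log(parts[0]); // 'path', ' d="..."', ' />'
console.log(parts[1]); // 'circle', ' r="5"', '></circle>'

// The backreference \1 requires the closing tag to repeat the opening
// tag name, so a mismatched pair is not matched.
console.log(describe('<path d="..."></circle>').length); // 0
```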

Explanation of regex3 test:

  • \b -- word boundary
  • stroke= -- literal text
  • ["'] -- double or single quote character
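
Requiring = directly after stroke keeps related attributes such as stroke-width from counting as a stroke. The checks below sketch what regex3 accepts and rejects. One caveat, which is an observation about the pattern and not a case from the question: \b also matches right after a hyphen, so a hypothetical attribute like data-stroke would be treated as a stroke.

```javascript
const regex3 = /\bstroke=["']/i;

console.log(regex3.test(' d="..." stroke="black"'));   // true
console.log(regex3.test(" d='...' stroke='black'"));   // true
console.log(regex3.test(' d="..." stroke-width="2"')); // false: '-' follows, not '='
console.log(regex3.test(' d="..." data-stroke="x"'));  // true: \b matches after '-'
```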
Peter Thoeny