Could you please give me some advice? I'm replacing the <chemform> tag in my wiki, which is no longer used. The strings are usually simple, like these:

<chemform>CH3COO-</chemform>
<chemform>Ba2+</chemform>
<chemform>H2CO3</chemform>

I need them to be replaced by these:

CH<sub>3</sub>COO<sup>-</sup>
Ba<sub>2</sub><sup>+</sup>
H<sub>2</sub>CO<sub>3</sub>

So far I came up with this regexp for the RegExr tool:

match: <chemform\b[^>]*>(\D*?)([0-9]*)(\D*?)(\D*?)([0-9]*)(\D*?)([-+]*?)</chemform>

replace: $1<sub>$2</sub>$3$4<sub>$5</sub>$6<sup>$7</sup>

I know the code is horrible, but so far it's been working for me, except that it leaves empty tags like <sub></sub> in the output:

<sub></sub>CH<sub>3</sub>COO<sup>-</sup>
<sub></sub>Ba<sub>2</sub><sup>+</sup>
H<sub>2</sub>CO<sub>3</sub><sup></sup>

How can I get rid of these without running a second search-and-replace? Thanks a lot!

Sidd

1 Answer

You could use Notepad++, which supports conditional replacements (the details are in an earlier post by Wiktor Stribiżew).

Use the following patterns:

  • match: ([A-Za-z]+(?=[-+\d]))(?<sub>\d+)?(?<sup>[-+])?(?=[-+\w]*</chemform>)
  • replace: $1(?{sub}<sub>$+{sub}</sub>)(?{sup}<sup>$+{sup}</sup>)

Given your input sample, I get:

<chemform>CH<sub>3</sub>COO<sup>-</sup></chemform>  
<chemform>Ba<sub>2</sub><sup>+</sup></chemform>  
<chemform>H<sub>2</sub>CO<sub>3</sub></chemform>
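
If you'd rather script this than use Notepad++, here is a rough Python sketch of the same idea, assuming the formulas stay as simple as your samples. Python's `re` module has no conditional replacement syntax, so the sketch uses a replacement function to skip the empty tags instead. The names `format_formula` and `replace_chemform` are made up for illustration, and unlike the Notepad++ replacement above, this one also drops the <chemform> wrapper, since that is what the question asks for.

```python
import re

# One run of letters that is followed by a digit or a sign, then an
# optional count and an optional charge sign (same idea as the
# Notepad++ pattern above, with plain numbered groups).
TOKEN = re.compile(r"([A-Za-z]+)(?=[-+\d])(\d+)?([-+])?")

# Assumption: each <chemform>...</chemform> element sits on a single line.
CHEMFORM = re.compile(r"<chemform\b[^>]*>(.*?)</chemform>")


def format_formula(formula: str) -> str:
    """Wrap counts in <sub> and charge signs in <sup>, never emitting empty tags."""
    def repl(m: re.Match) -> str:
        letters, count, charge = m.group(1), m.group(2), m.group(3)
        out = letters
        if count:   # group did not participate -> None -> no <sub> tag
            out += f"<sub>{count}</sub>"
        if charge:  # same for the charge sign
            out += f"<sup>{charge}</sup>"
        return out
    return TOKEN.sub(repl, formula)


def replace_chemform(text: str) -> str:
    """Replace every <chemform> element with its formatted content."""
    return CHEMFORM.sub(lambda m: format_formula(m.group(1)), text)


if __name__ == "__main__":
    for sample in ("<chemform>CH3COO-</chemform>",
                   "<chemform>Ba2+</chemform>",
                   "<chemform>H2CO3</chemform>"):
        print(replace_chemform(sample))
```

The optional groups come back as `None` when they don't match, so the `if` checks play the role of the `(?{sub}...)` conditionals in the Notepad++ replacement.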

PJProudhon