3

I'm trying to add a special character around certain values in a string, but I don't know how.

Here is the original input:

chemical <- "200mL of Ac2O3, 3.5mml of AgBF4, 10.0ml of AgBr, 100ml of AgCl3Cu2"

And I want:

"200mL of Ac~2~O~3~, 3.5mml of AgBF~4~, 10.0ml of AgBr, 100ml of AgCl~3~Cu~2~"

Basically, I want to add a "~" before and after every number that appears in a chemical formula in the original data.

I was trying to use gsub, but I am not sure how to tell R to find just the numbers inside a chemical formula and then do the insertion.

Does anyone have a thought on this? Thank you!

Jilber Urbina
Connie

4 Answers

4
gsub("(?<=[A-Za-z])([0-9])","~\\1~",chemical,perl = T)
[1] "200mL of Ac~2~O~3~, 3.5mml of AgBF~4~, 10.0ml of AgBr, 100ml of AgCl~3~Cu~2~"

Here you need the positive lookbehind syntax (?<=...) to require that each digit is preceded by a letter, upper or lower case: [A-Za-z]. The parentheses around the digit create a capture group, which you reference in the replacement as \1; the backslash itself has to be escaped in an R string, which gives ~\\1~. The perl = T argument (short for perl = TRUE) is needed because lookbehind is only supported by the Perl-compatible regex engine.
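As a small illustration of the lookbehind being zero-width, the sketch below lists what the pattern actually matches in the question's string. Only base R functions are used; the variable name `matched` is just for this example.

```r
chemical <- "200mL of Ac2O3, 3.5mml of AgBF4, 10.0ml of AgBr, 100ml of AgCl3Cu2"

# The lookbehind checks for a preceding letter without consuming it,
# so each match consists of the digit only.
matched <- regmatches(
  chemical,
  gregexpr("(?<=[A-Za-z])[0-9]", chemical, perl = TRUE)
)[[1]]

print(matched)
```

Because the letter is never part of the match, the replacement only has to re-insert the digit, which is why a single capture group is enough here.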

denis
2

Similar to @denis's answer, but without using the perl syntax:

gsub("([A-Za-z])([0-9]+)","\\1~\\2~",chemical)

(Corrected per @Wiktor's comment.)
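A quick way to check that the two approaches agree on the question's input is to run both and compare. This is only a sketch on the sample string, not a proof that they agree on every input; the variable names are made up for the example.

```r
chemical <- "200mL of Ac2O3, 3.5mml of AgBF4, 10.0ml of AgBr, 100ml of AgCl3Cu2"

# Lookbehind version: the letter is checked but not consumed.
with_lookbehind <- gsub("(?<=[A-Za-z])([0-9])", "~\\1~", chemical, perl = TRUE)

# Two-group version: the letter is consumed, so it is put back via \\1.
with_groups <- gsub("([A-Za-z])([0-9]+)", "\\1~\\2~", chemical)

print(with_groups)
print(identical(with_lookbehind, with_groups))
```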

GordonShumway
2

This works for the example. Whether it holds up on more varied input might be an issue:

gsub("([^ [:digit:].])([[:digit:]])", "\\1~\\2~", chemical)
#[1] "200mL of Ac~2~O~3~, 3.5mml of AgBF~4~, 10.0ml of AgBr, 100ml of AgCl~3~Cu~2~"

The logic is to match a {non-digit, non-space, non-decimal-point} character followed by a digit, and to put a tilde on each side of the digit. If the "number" could ever exceed 9, you would want to put a quantifier after the digit class: "[[:digit:]]{1,30}" perhaps.
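The sketch below tries the quantified form on two made-up inputs (neither comes from the question): a formula with two-digit subscripts, and a quantity written with a thousands separator, which shows the kind of "more varied case" where the negated class can misfire. It uses `+` instead of `{1,30}` as the quantifier.

```r
pattern <- "([^ [:digit:].])([[:digit:]]+)"

# Hypothetical formula with two-digit subscripts: each run of digits is wrapped once.
sucrose <- gsub(pattern, "\\1~\\2~", "C12H22O11")
print(sucrose)

# Hypothetical quantity with a comma: the comma is not a space, digit, or dot,
# so the digits after it are wrapped even though they are not a subscript.
with_comma <- gsub(pattern, "\\1~\\2~", "1,000ml of AgBr")
print(with_comma)
```

If inputs like the second one are possible, restricting the first group to letters, as in the other answers, avoids the false match.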

IRTFM
1

An option is to strsplit on spaces and then wrap the digits in ~ (replacement ~\\1~) only in words that start with a letter:

chemical <- "200mL of Ac2O3, 3.5mml of AgBF4, 10.0ml of AgBr, 100ml of AgCl3Cu2"


a <- strsplit(chemical, split = " ")[[1]]

paste0(ifelse(grepl("^[a-zA-Z].*", a),gsub("(\\d)","~\\1~", a),a),collapse = " ")
#[1] "200mL of Ac~2~O~3~, 3.5mml of AgBF~4~, 10.0ml of AgBr, 100ml of AgCl~3~Cu~2~"
MKR
  • that's a complicated way to do positive lookback, but it's clever – denis Apr 26 '18 at 21:45
  • @denis I know. You were quick to provide a solution with forward lookup, so I had to change my solution. Yours is a smart answer. – MKR Apr 26 '18 at 21:48