I have seen in manuals how to use grep to match either a pattern or an exact string. However, I cannot figure out how to do both at the same time. I have a LaTeX file in which I want to find the following pattern:

\caption[SOME WORDS]

and replace it with:

\caption[\textit{SOME WORDS}]

I have tried with:

texfile <- sub('\\caption[','\\caption[\\textit ', texfile, fixed=TRUE)

but I do not know how to tell grep that there should be some text after the square bracket, and then a closed square bracket.

oguz ismail
gicanzo

1 Answer

You can use

texfile <- "\\caption[SOME WORDS]" ## -> \caption[\textit{SOME WORDS}]
texfile <- gsub('(\\\\caption\\[)([^][]*)]','\\1\\\\textit{\\2}]', texfile)
cat(texfile)
## -> \caption[\textit{SOME WORDS}]

See the R demo online.

Details:

  • (\\caption\[) - Group 1 (\1 in the replacement pattern): a \caption[ string
  • ([^][]*) - Group 2 (\2 in the replacement pattern): any zero or more chars other than [ and ]
  • ] - a ] char.
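As a rough illustration of how the two groups behave on more than one line, here is a minimal sketch. The `tex_lines` vector is hypothetical sample data standing in for the contents of a LaTeX file; it is not taken from the question.

```r
# Hypothetical sample lines (assumed for illustration only)
tex_lines <- c(
  "\\caption[Short title]{A longer caption}",
  "\\caption{No optional argument here}",
  "\\section[Not a caption]{Heading}"
)

# Same pattern as above: group 1 is the literal \caption[,
# group 2 is everything up to the next bracket
pattern     <- "(\\\\caption\\[)([^][]*)]"
replacement <- "\\1\\\\textit{\\2}]"

result <- gsub(pattern, replacement, tex_lines)
writeLines(result)
```

Only the first line contains `\caption[` followed by a closing `]`, so only that line should change; the other two should come back untouched.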

Another solution based on a PCRE regex:

gsub('\\Q\\caption[\\E\\K([^][]*)]','\\\\textit{\\1}]', texfile, perl=TRUE)

See this R demo online. Details:

  • \Q - start "quoting", i.e. treating the patterns to the right as literal text
  • \caption[ - a literal fixed string
  • \E - stop quoting the pattern
  • \K - omit text matched so far
  • ([^][]*) - Group 1 (\1): any zero or more non-bracket chars
  • ] - a ] char.
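
A small sketch comparing the two approaches on the sample string from the question. The variable names `via_groups` and `via_pcre` are made up for this example.

```r
x <- "\\caption[SOME WORDS]"

# Default regex engine: keep the prefix via capture group 1
via_groups <- gsub("(\\\\caption\\[)([^][]*)]", "\\1\\\\textit{\\2}]", x)

# PCRE engine: \Q...\E quotes the prefix literally and \K drops it from
# the match, so the replacement only has to rebuild the bracket content
via_pcre <- gsub("\\Q\\caption[\\E\\K([^][]*)]", "\\\\textit{\\1}]", x, perl = TRUE)

cat(via_groups, "\n")
cat(via_pcre, "\n")
identical(via_groups, via_pcre)
```

Both calls are expected to produce the same string. The PCRE variant avoids escaping the backslash and `[` for the regex engine, at the cost of requiring `perl = TRUE`.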
Wiktor Stribiżew
  • Thanks a lot! I still struggle to understand some parts of it. For example, in your first solution, why are you adding all those extra \\? Why two more before caption, and two more before the square bracket? Would you have a reference where I can learn more about all this? Everything I've found so far gives only a very basic introduction to grep. – gicanzo Jun 09 '22 at 08:47
  • @gicanzo SO contains plenty of information about this, e.g. see [R - gsub replacing backslashes](https://stackoverflow.com/a/27492072/3832970). In a regex, a backslash is a special char, and to match it as a literal char, you need to use a double backslash. Since text is encoded within string literals in R, and string literals support string escape sequences (like `\n` to encode a line feed char, or `\t` to define a tab char), the backslash is encoded with ``\\``. Thus, to match a backslash with a regex, in an R string literal, you need 4 backslashes. – Wiktor Stribiżew Jun 09 '22 at 08:51
  • @gicanzo As for `\[`, see [How do I deal with special characters like \^$.?*|+()[{ in my regex?](https://stackoverflow.com/a/27721009/3832970). `[` is a special regex metacharacter, just like a backslash, so to match it as a literal `[` char you need to escape it with a backslash. – Wiktor Stribiżew Jun 09 '22 at 08:52
  • That was my understanding, but I could not understand why the double `\\` before `]`. Your second link clarifies! Thanks a lot, again. – gicanzo Jun 09 '22 at 09:16
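
The two escaping layers discussed in these comments can be checked directly in an R session. This is only a sketch: the variable names are made up, and the last step assumes R 4.0.0 or later, where raw string literals are available.

```r
# One backslash character, written as an R string literal
one <- "\\"
nchar(one)                                    # 1

# The regex \\ (two characters) matches one literal backslash;
# as an R string literal it takes four backslashes
rx_backslash <- "\\\\"
nchar(rx_backslash)                           # 2
grepl(rx_backslash, "\\caption[SOME WORDS]")  # TRUE

# Regex \\caption\[ : escaped backslash, "caption", escaped [
rx_caption <- "\\\\caption\\["
cat(rx_caption, "\n")  # prints the pattern as the regex engine receives it

# With fixed = TRUE there is no regex layer, only the string-literal one
grepl("\\caption[", "\\caption[SOME WORDS]", fixed = TRUE)  # TRUE

# R >= 4.0.0 only: a raw string removes the string-literal layer instead
rx_raw <- r"(\\caption\[)"
identical(rx_raw, rx_caption)                 # TRUE
```

Printing a pattern with `cat()` before using it is a cheap way to see what the regex engine will actually receive.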