
I am a beginner at regex, and I am trying to solve the following exercise:

Given the file below, I would like to match only the names.

@misc{diaz2006automatic,
  title={AUTOMATIC ROCKING DEVICE},
  author={Diaz, Navarro David and Gines, Rodriguez Noe},
  year={2006},
}

@article{gentsch1992identification,
  author={GenTSCH, JoN R and Glass, RI and Woods, P and Gouvea, V and Gorziglia} 

I have created this regex: (?<=author=\{).*[a-z](?=\}), but I am not able to exclude the word "and" from the list of names.

Could you please give me some advice? Thank you very much.

FDv
  • It will be easier to remove `and` after you collect your matches, using your programming language's string functions. – Wiktor Stribiżew Nov 11 '18 at 20:39
  • Unfortunately, I cannot use any programming language because this exercise is regex-only. I started some online training and got stuck on this. I am not asking for a solution, just for a hint :) – FDv Nov 11 '18 at 21:03
  • The [a-z] character class isn't adding anything except to say the last character before the closing brace has to be a letter. Is that on purpose? – ScottBro Nov 11 '18 at 21:13
  • Yes. I wanted to restrict the match to the names between "author={" and the closing } brace. – FDv Nov 11 '18 at 21:20
  • What is the regex library/flavor? – Wiktor Stribiżew Nov 11 '18 at 21:29
  • What are the expected matches? Try `(?:author=\{|\G(?!\A)\s*,?\s*(?:and\s+)?)\K[a-zA-Z]+`, see [demo](https://regex101.com/r/jsZkeO/1). – Wiktor Stribiżew Nov 12 '18 at 22:13

1 Answer


The only way I found to do it in "one command" is the following GNU sed command:

sed -nr -e '/^\s*author=/ s/(\s*author=\{|\}|\s+and\s+|,)//pg'

Result:

Diaz Navarro DavidGines Rodriguez Noe
GenTSCH JoN RGlass RIWoods PGouvea VGorziglia 

Here, by "one command" I mean one expression (everything between the single quotes). Of course, this point of view can be disputed.

Explanation:

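  • -n: don't print lines automatically; only lines printed explicitly (here by the p flag) are output.
  • -r: use extended regular expressions (-E is an equivalent spelling that BSD/macOS sed also accepts).
  • -e: the sed script to run:
    • /^\s*author=/ [...]: apply the following command only to lines that match this regexp.
    • s/(\s*author=\{|\}|\s+and\s+|,)//pg: the substitution itself, which deletes everything we don't want. The alternatives, separated by a pipe |, can be viewed as "separators" for easier understanding. The g flag replaces every occurrence on the line, and the p flag prints the line when a substitution was made.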
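Because the alternation deletes each " and " outright, neighboring names run together in the result (DavidGines). Below is a sketch of a variant, assuming GNU sed (it relies on -r, \s and \n in the replacement), that first strips the author={ prefix and everything from the } onward, then turns each " and " into a newline so every name lands on its own line. `sample_bib` is a hypothetical helper that prints the sample:

```shell
#!/bin/sh
# Hypothetical helper: prints the sample entries from the question.
sample_bib() {
  cat <<'EOF'
@misc{diaz2006automatic,
  title={AUTOMATIC ROCKING DEVICE},
  author={Diaz, Navarro David and Gines, Rodriguez Noe},
  year={2006},
}

@article{gentsch1992identification,
  author={GenTSCH, JoN R and Glass, RI and Woods, P and Gouvea, V and Gorziglia}
EOF
}

# On author lines only:
#   1. drop the leading "author={"
#   2. drop the closing "}" and anything after it
#   3. replace each " and " with a newline (GNU sed extension)
#   4. delete the commas, as the original command does
#   5. print the result
names=$(sample_bib | sed -nr '/^\s*author=/ { s/^\s*author=\{//; s/\}.*$//; s/\s+and\s+/\n/g; s/,//g; p; }')
printf '%s\n' "$names"
```

This is still a single sed script, but it uses several s commands grouped in { }, so whether it counts as "one command" is just as debatable.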
Amessihel