0

Let's say I have a file that looks like this:

#tata toto
tata titi
tata tutu titi
#tata titi
tata toto #ZZZ
tata toto   #ZZZ
#tata toto  #ZZZ
tata titi   #YYY
#tata titi #YYY
tata titi toto

And I want to match every line:

  • starting with tata
  • capturing whether toto is present or not

For example:

tata titi => \1=tata, \2=" titi", \3=null, \4=null
tata titi toto => \1=tata, \2=" titi ", \3=toto, \4=null
tata toto tutu => \1=tata, \2="  ", \3=toto, \4=" tutu"

I've tried this regex: ^(tata)(.*)(toto)?(.*)

But the first .* captures more than expected, so toto is never captured.
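
A minimal Python sketch of the failure (Python's `re` module is an assumption here; the question does not name a regex engine):

```python
import re

# The pattern from the question: the first (.*) is greedy and swallows
# the rest of the line, so the optional (toto)? has nothing left to match.
pattern = re.compile(r"^(tata)(.*)(toto)?(.*)")

m = pattern.match("tata titi toto")
print(m.groups())
# → ('tata', ' titi toto', None, '')
```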

How would you do that?

To give more context, I want to parse an /etc/hosts file: if I find a specific IP (here tata) on a line that does not contain a given hostname alias (here toto), I add the alias, preserving all hostnames and aliases already defined, as well as the comment.

Thanks, Raoul

anubhava
Raoul Debaze
  • Please read the regex tag description. – Casimir et Hippolyte Nov 08 '19 at 21:23
  • Sounds like you're looking for [tempered greed](https://www.rexegg.com/regex-quantifiers.html#tempered_greed), how about [`^(tata)(?:(?!toto).)*(toto)?`](https://regex101.com/r/qhleQI/1) – bobble bubble Nov 08 '19 at 21:30
  • It still does not exactly match my need. For the line `tata titi` I would like to get `\1=tata \2=null \3="titi" \4=null`, but here I get `\1=tata \2=null \3=null \4=null` – Raoul Debaze Nov 08 '19 at 22:00
  • But you wrote `tata titi => \1=tata \2=null \3=null` in the question and are now writing `I would like to have as result \1=tata \2=null \3="titi" \4=null` – anubhava Nov 08 '19 at 22:03
  • Right, sorry, my bad, I'm correcting it. – Raoul Debaze Nov 08 '19 at 22:09
  • You can do this more easily without using regular expressions. Is using regular expressions a requirement? – Booboo Nov 08 '19 at 22:49
  • Yes, this is for use with the inline module of Ansible, to set up /etc/hosts on my remote servers when it is not configured correctly. – Raoul Debaze Nov 08 '19 at 23:12

2 Answers

3

You may use this regex with optional matches and a negative lookahead:

^(tata)( +(?:(?!toto)\S+ *|))(toto|)(.*)$

RegEx Demo

RegEx Details:

  • ^: Start
  • (tata): Match & capture tata in group #1
  • (: Start capture group #2
    • \ +: Match 1+ spaces
    • (?:: Start non-capture group
      • (?!toto): If we don't have toto at next position
      • \S+ *: Match 1+ non-space characters followed by 0 or more spaces
      • |: OR nothing
    • ): End non-capture group
  • ): End capture group #2
  • (toto|): Capture group #3 that matches toto or nothing
  • (.*): Capture group #4 that matches remaining characters till end
  • $: End
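
To check the pattern against the sample lines from the question, here is a small sketch in Python (the engine is an assumption, and `split_line` is a hypothetical helper name). Because group 3 is written as `(toto|)`, Python reports an empty string rather than a null group when toto is absent.

```python
import re

# Pattern copied from the answer; group 3 is "toto" or the empty string.
HOSTS_RE = re.compile(r"^(tata)( +(?:(?!toto)\S+ *|))(toto|)(.*)$")

def split_line(line):
    """Return the four captured groups, or None when the line does not match."""
    m = HOSTS_RE.match(line)
    return m.groups() if m else None

for line in ["tata titi", "tata titi toto", "tata toto tutu", "#tata toto"]:
    print(repr(line), "->", split_line(line))
# 'tata titi' -> ('tata', ' titi', '', '')
# 'tata titi toto' -> ('tata', ' titi ', 'toto', '')
# 'tata toto tutu' -> ('tata', ' ', 'toto', ' tutu')
# '#tata toto' -> None
```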
anubhava
  • I'm impressed. That does what I want, but I did not realize it would be so complicated. One more thing: is it possible to do something like `if \2==null then " " else \2`? – Raoul Debaze Nov 08 '19 at 23:09
  • `if \2==null then " " else \2` is only possible in post-processing, in whatever tool or language you are using to run this regex. The regex engine can only capture text (conditionally) from the original string; replacements are done by your platform. – anubhava Nov 09 '19 at 06:20
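
As a rough illustration of that post-processing step, the conditional can live in a replacement callback. This is only a sketch in Python (an assumption; the thread does not show the actual Ansible task), and `add_alias` is a hypothetical helper:

```python
import re

# Pattern copied from the answer above.
PATTERN = re.compile(r"^(tata)( +(?:(?!toto)\S+ *|))(toto|)(.*)$")

def add_alias(match):
    """Replacement callback: append toto when group 3 is empty."""
    ip, names, alias, rest = match.groups()
    if alias:
        return match.group(0)          # toto already present: keep the line
    names = names or " "               # the 'if \2 is empty then " " else \2' fallback
    kept = names.rstrip(" ")           # hostname part without trailing spaces
    padding = names[len(kept):] if rest else ""  # keep spacing before a comment
    return f"{ip}{kept} toto{padding}{rest}"

print(PATTERN.sub(add_alias, "tata titi   #YYY"))
# → tata titi toto   #YYY
```

Lines that do not match (for example commented-out ones) pass through `sub` unchanged. Note that group 2 as written spans at most one hostname, so a line with several names before toto would be treated as lacking the alias; handling that case would need a different pattern, which the answer does not cover.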
0

By default, the asterisk is greedy, which means it'll consume as much as possible. Try using .*? to make it "lazy".
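
A quick check in Python (the engine is an assumption) suggests that laziness alone is not enough here: since everything after the lazy group is optional, `.*?` is satisfied by matching nothing at all. The `.+?` variant tried in the comment below is forced to consume one character, which helps only when toto follows immediately.

```python
import re

lazy_star = re.compile(r"^(tata)(.*?)(toto)?(.*)")
lazy_plus = re.compile(r"^(tata)(.+?)(toto)?(.*)")

# .*? matches the empty string, (toto)? is skipped, the last (.*) takes the rest.
print(lazy_star.match("tata toto #ZZZ").groups())
# → ('tata', '', None, ' toto #ZZZ')

# .+? must take the single space, after which toto matches.
print(lazy_plus.match("tata toto #ZZZ").groups())
# → ('tata', ' ', 'toto', ' #ZZZ')

# With another hostname before toto, the lazy group still stops after one character.
print(lazy_plus.match("tata titi toto").groups())
# → ('tata', ' ', None, 'titi toto')
```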

Andrew
  • Indeed, `^(tata)(.+?)(toto)?(.*)` works for all the lines except the last one. For the last one, I still get `\1=tata \2=" " \3=null \4="titi toto"`, but I would like `\1=tata \2=" titi " \3=toto \4=null` – Raoul Debaze Nov 08 '19 at 21:12
  • I see. I will keep trying and let you know when I update this – Andrew Nov 08 '19 at 21:18