The input string contains two scenarios mixed together. Examples below.

Scenario 1:

/start/ sky is blue today; /transition/ it is raining; /end/

Scenario 2:

/start/ sky is blue today; /end/

In the input string, there are both scenarios 1 and 2. What I want to grab is:

  1. if /transition/ exists, then grab /start/ sky is blue today;
  2. if /transition/ does not exist, then grab /start/ sky is blue today; /end/.

Can you please help me with the regex?

Michael M.

1 Answer

This works too:

(((\|start\|[^\;]*\; (?=\|transition\|[^\;]*\; \|end\|.*)))|((\|start\|[^\;]*\; \|end\|.*)))
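As a quick check, the expression above can be run through grep. This is only a sketch: it assumes GNU grep built with PCRE support (-P), and the helper name grab and the sample strings are made up for illustration.

```shell
#!/usr/bin/env bash
# The pattern from above, stored verbatim in single quotes so bash leaves it alone.
pattern='(((\|start\|[^\;]*\; (?=\|transition\|[^\;]*\; \|end\|.*)))|((\|start\|[^\;]*\; \|end\|.*)))'

# grab (hypothetical helper): print only the part of the input that matches.
grab() {
  printf '%s\n' "$1" | grep -oP -- "$pattern"
}

grab '|start| sky is blue today; |transition| it is raining; |end|'
grab '|start| sky is blue today; |end|'
```

The first call should report only the start segment (including its trailing space), the second the whole string.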

Discussion

I think the generic form of your question is this:

  1. If there exists a string "${start}${transition}${end}"
  2. Where "start", "transition", and "end" are variable strings with the format "tag content semicolon space"
  3. How does one conditionally grab parts of the string?
  4. The conditions being: a) if the transition tag exists, return "${start}"; b) otherwise, return "${start}${end}"
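Before packing the conditions into one regex, it can help to write them as ordinary branching. A minimal sketch in plain bash, assuming the |tag| markers used in this answer; the function name grab_plain is hypothetical, and it uses bash's built-in [[ =~ ]] matching (POSIX ERE, so no lookahead is involved).

```shell
#!/usr/bin/env bash
# grab_plain (hypothetical helper): the same conditions, written as if/elif.
grab_plain() {
  local s=$1
  local seg='[^;]*; '                       # "content semicolon space"
  local with_transition="(\|start\|${seg})\|transition\|${seg}\|end\|"
  local without_transition="\|start\|${seg}\|end\|.*"

  if [[ $s =~ $with_transition ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"      # a) transition exists: start only
  elif [[ $s =~ $without_transition ]]; then
    printf '%s\n' "${BASH_REMATCH[0]}"      # b) no transition: start and end
  fi
}

grab_plain '|start| sky is blue today; |transition| it is raining; |end|'
grab_plain '|start| sky is blue today; |end|'
```

Nothing is printed when neither condition holds, for example when a second segment appears without a |transition| tag.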

Logic in regex can be expressed by explicitly listing every acceptable scenario as an alternative. Here's some bash to play around with our regex:

tst1="|start| example1; |transition| example2; |end| example3"
tst2="|start| example1; |end| example3"
tst3="|start| sky is blue today; |transition| it is raining; |end|"
tst4="|start| sky is blue today; it is raining; |end|"
tst5="|start| sky is blue today; |end|"

start='|start|[^\;]*\; '           # start marker, 0+ of any character but a semicolon, then a semicolon, then a space
start="${start//\|/\\|}"           # escape |'s
transition='|transition|[^\;]*\; ' # transition marker, 0+ of any character but a semicolon, then a semicolon, then a space
transition="${transition//\|/\\|}" # escape |'s
end='|end|.*'                      # end marker, 0+ of any character
end="${end//\|/\\|}"               # escape |'s

start_when_transition="(${start}(?=${transition}${end}))" # match start if transition and end
end_when_transition="(${start}${transition}\K${end})"     # match end if beginning and transition (not used below)
start_and_end="(${start}${end})"                          # match start and end when no transition in the middle
ifTransition="(${start_when_transition})"                            
else="(${start_and_end})"

echo tst1: $tst1
echo $tst1 | grep -oP "(${ifTransition}|${else})" | xargs echo -e "\t"
echo -----------------------------------------------------------------
echo tst2: $tst2
echo $tst2 | grep -oP "(${ifTransition}|${else})" | xargs echo -e "\t"
echo -----------------------------------------------------------------
echo tst3: $tst3
echo $tst3 | grep -oP "(${ifTransition}|${else})" | xargs echo -e "\t"
echo -----------------------------------------------------------------
echo tst4: $tst4 
echo $tst4 | grep -oP "(${ifTransition}|${else})" | xargs echo -e "\t"
echo -----------------------------------------------------------------
echo tst5: $tst5 
echo $tst5 | grep -oP "(${ifTransition}|${else})" | xargs echo -e "\t"

output:

tst1: |start| example1; |transition| example2; |end| example3
     |start| example1;
-----------------------------------------------------------------
tst2: |start| example1; |end| example3
     |start| example1; |end| example3
-----------------------------------------------------------------
tst3: |start| sky is blue today; |transition| it is raining; |end|
     |start| sky is blue today;
-----------------------------------------------------------------
tst4: |start| sky is blue today; it is raining; |end|
    
-----------------------------------------------------------------
tst5: |start| sky is blue today; |end|
     |start| sky is blue today; |end|

Bash reviewed

  • echo is a string printing program
  • echo -e enables backslash escapes such as "\t" for tab
  • grep is a string matching program
  • grep -oP -> -o is for --only-matching and -P is for --perl-regexp, which enables Perl-compatible regular expressions (PCRE), an extended regex language
  • | aka "pipe", takes the output from the previous command and feeds it into the next
  • xargs is a program that takes its input and passes it as arguments to the command that follows
  • $variablename accesses a variable we set
  • "${variablename}" accesses a variable we set from within a string
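One piece of bash the script relies on is not in the list: the ${var//pattern/replacement} expansion, which replaces every occurrence of pattern. A small sketch of the pipe-escaping step on its own; the function name escape_pipes is hypothetical.

```shell
#!/usr/bin/env bash
# escape_pipes (hypothetical helper): put a backslash in front of every |,
# using the same ${var//\|/\\|} expansion as the script.
escape_pipes() {
  local s=$1
  printf '%s\n' "${s//\|/\\|}"
}

escape_pipes '|start|[^;]*; '
```

Without this step the | characters in the markers would be read by the regex engine as "or" rather than as literal pipes.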

Regex reviewed

  • \K if you made it this far, great, but drop everything matched so far from the reported match
  • (?=...) look ahead to see if something is there, but don't include it in the match
  • () group (scope) conditions
  • | or
  • [] match any one of the characters listed (a character class)
  • [^] match any one character except the ones listed
  • \ escape a special character
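\K and the lookahead are roughly mirror images: one throws away what came before it, the other checks what comes after without consuming it. A small sketch, assuming GNU grep with -P; the helper name match_with and the sample string are illustrative only.

```shell
#!/usr/bin/env bash
sample='|start| example1; |transition| example2; |end| example3'

# match_with (hypothetical helper): print what grep -oP reports for a pattern.
match_with() {
  printf '%s\n' "$sample" | grep -oP -- "$1"
}

# Lookahead: report the start segment only if a transition tag follows it.
match_with '\|start\|[^;]*; (?=\|transition\|)'

# \K: require start and transition, but report only the end segment.
match_with '\|start\|[^;]*; \|transition\|[^;]*; \K\|end\|.*'
```

The second pattern has the same shape as the end_when_transition variable in the script, which is one way to grab the end part when a transition is present.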

Regex combinations reviewed

  • [abc]* - match a, b, or c 0+ times
  • foo(?=bar) - match foo if bar comes right after
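Both combinations can be tried directly. A short sketch, again assuming GNU grep with -P; the helper names and inputs are made up, and the x and y around [abc]* are only there to make the zero-repetition case visible.

```shell
#!/usr/bin/env bash
# between_xy (hypothetical helper): x, then 0+ of a/b/c, then y.
between_xy() { printf '%s\n' "$1" | grep -oP 'x[abc]*y'; }

# foo_before_bar (hypothetical helper): foo only when bar comes right after.
foo_before_bar() { printf '%s\n' "$1" | grep -oP 'foo(?=bar)'; }

between_xy 'xabcay'      # several repetitions of the class
between_xy 'xy'          # zero repetitions still match
foo_before_bar 'foobar'  # bar follows, so foo is reported without bar
foo_before_bar 'foobaz' || echo '(no match)'
```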

Barak Binyamin