
I have a file input.txt with the following content:

foo
[assembly: AssemblyVersion("1.2.3")]
bar")]
quux

To extract 1.2.3 from the input, I use the following script:

#!/bin/bash
regex='\[assembly: AssemblyVersion\("(.*)"\)\]'
fileContent=$(cat input.txt)
[[ "$fileContent" =~ $regex ]]
echo "${BASH_REMATCH[1]}"

I would expect the output to be 1.2.3, but it is:

1.2.3")]
bar

Why is that? How can I fix it?

The regular expression tester at https://regex101.com gives the expected result.

alvarez

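The .* is a greedy dot-matching subpattern: the dot matches any character, including ", ) and a newline, and the * takes as many characters as it can while the rest of the pattern can still match. The capture therefore runs on to the last ")] in the file, the one at the end of the bar")] line. Online testers such as regex101 default to PCRE, where the dot does not match a newline, which is why the same pattern appears to work there.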
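A minimal bash sketch that reproduces the problem, assuming bash's =~ is backed by the system regex(3) library (as on glibc and BSD systems). The file is replaced by a string, and the variable name captured is purely illustrative:

```bash
#!/bin/bash
# The question's input, held in a string instead of read from input.txt.
fileContent='foo
[assembly: AssemblyVersion("1.2.3")]
bar")]
quux'

# 1) In bash's [[ =~ ]], '.' also matches a newline character.
if [[ $'a\nb' =~ a.b ]]; then
  echo "dot matched a newline"
fi

# 2) Hence the greedy (.*) extends to the LAST '")]', on the bar line,
#    reproducing the two-line output from the question.
regex='\[assembly: AssemblyVersion\("(.*)"\)\]'
if [[ $fileContent =~ $regex ]]; then
  captured=${BASH_REMATCH[1]}
  printf '%s\n' "$captured"
fi
```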

The simplest way to limit the greediness is a negated character class, [^"], which matches any character except " (this assumes there are no quotes inside the quoted string):

'\[assembly: AssemblyVersion\("([^"]*)"\)\]'
                                ^^^^^ 

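A sketch of the question's script with the [^"]* fix applied. To keep it self-contained, the file content is held in a string; in the real script, fileContent=$(< input.txt) (or the original $(cat input.txt)) reads the file instead. The variable name version is illustrative:

```bash
#!/bin/bash
# Stand-in for $(< input.txt) so the sketch runs without the file.
fileContent='foo
[assembly: AssemblyVersion("1.2.3")]
bar")]
quux'

# [^"]* cannot cross a double quote, so the capture stops at the first '"'
# and the match can never reach the stray '")]' on the bar line.
regex='\[assembly: AssemblyVersion\("([^"]*)"\)\]'
if [[ $fileContent =~ $regex ]]; then
  version=${BASH_REMATCH[1]}
  echo "$version"    # prints 1.2.3
else
  echo "no AssemblyVersion found" >&2
fi
```

Testing the =~ result in an if also avoids reading a stale BASH_REMATCH when nothing matches.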

Or, if there can be no ( or ) inside the quoted string:

'\[assembly: AssemblyVersion\("([^()]*)"\)\]'
                                ^^^^^^

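A sketch of the [^()]* variant on the same input. The second string is a hypothetical value with an embedded quote, included only to show where this variant and [^"]* behave differently; version and inner are illustrative names:

```bash
#!/bin/bash
# Same input as the question, held in a string instead of input.txt.
fileContent='foo
[assembly: AssemblyVersion("1.2.3")]
bar")]
quux'

# [^()]* may contain '"' but cannot cross ')', so it also stops before
# the stray '")]' on the bar line.
regex='\[assembly: AssemblyVersion\("([^()]*)"\)\]'
if [[ $fileContent =~ $regex ]]; then
  version=${BASH_REMATCH[1]}
  echo "$version"    # prints 1.2.3
fi

# Hypothetical value with an inner quote: [^()]* accepts it,
# whereas [^"]* would find no match at all.
quoted='[assembly: AssemblyVersion("a"b")]'
if [[ $quoted =~ $regex ]]; then
  inner=${BASH_REMATCH[1]}
  echo "$inner"      # prints a"b
fi
```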

Wiktor Stribiżew
  • Actually, both the solutions here assume there are just spaces, basic punctuation and word characters inside the quoted string. If there can be escaped entities, the regex will be a bit more complicated: [`'\[assembly: AssemblyVersion\("([^\\"]*(\\.[^\\"]*)*)"\)\]'`](http://ideone.com/luGBKQ). – Wiktor Stribiżew Dec 10 '15 at 11:43
  • Actually, the `.` *does* match a newline; that's how the greediness of the `*` causes a problem. – chepner Dec 10 '15 at 12:50
  • @chepner: True, I typed that automatically without thinking. [*In those cases where there is a newline in a multiple line expression, the dot will match the newline.*](http://tldp.org/LDP/abs/html/x17129.html#FTN.AEN17189) – Wiktor Stribiżew Dec 10 '15 at 13:03
  • Those links are irrelevant to the regular expressions used by `bash`. `bash` itself doesn't define its regular expression format; it uses the local implementation of `regex(3)`. (See `man 7 re_format` for details.) – chepner Dec 10 '15 at 13:09
  • That help does not specifically state that a dot can match a newline, it just says *`.` (matching any single character)*. – Wiktor Stribiżew Dec 10 '15 at 13:11
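The escape-aware pattern from the first comment can be checked with a short bash sketch. The input line with escaped quotes is hypothetical, and value is an illustrative name. Inside a POSIX bracket expression the backslash is literal, so [^\\"] excludes both \ and ":

```bash
#!/bin/bash
# Pattern from the comment: runs of characters that are neither '\' nor '"',
# optionally interleaved with backslash escapes such as \" or \\.
regex='\[assembly: AssemblyVersion\("([^\\"]*(\\.[^\\"]*)*)"\)\]'

# Hypothetical input whose quoted value contains escaped quotes.
line='[assembly: AssemblyVersion("1.2.3 \"beta\"")]'

if [[ $line =~ $regex ]]; then
  value=${BASH_REMATCH[1]}
  printf '%s\n' "$value"    # prints 1.2.3 \"beta\"
fi
```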