I have to find words.

In my assignment, a word is defined as the characters between two spaces (" bla "). I have to find every decimalIntegerConstant, but only when it is a whole word by that definition.

I use

 grep -E -o " (0|[1-9]+[0-9]*)([Ll]?) "

but it doesn't work on, for example:

bla 0l labl 2 3 abla0La 0L sfdgpočítačsd

Output is

0l
2 
0L 

but 3 is missing.

123
  • You are consuming the space when you match `2`. Try `grep -oP " (0|[1-9]+[0-9]*)[Ll]?(?= )"` – 123 Dec 07 '16 at 10:17
  • @123 In order to take the beginning/end of lines into account and to remove spaces from the output, you could improve your regex like so: `grep -oP "(^| )\K((0|[1-9]+[0-9]*)[Ll]?)(?= |$)"` – Aserre Dec 07 '16 at 10:27
  • @Aserre I purposefully didn't over complicate it as OP defined words as between two spaces but yes you could. – 123 Dec 07 '16 at 10:30
  • I use `grep -E -o "\b(0|[1-9]+[0-9]*)([Ll]?)\b"`. Someone posted it here but then deleted the post. Your solutions also work, but I don't understand them, so I would rather use `\b` if that is OK. – Vladimír Fencák Dec 07 '16 at 10:35
  • `\b` is a word boundary, which includes spaces but also punctuation and line ends. So it will match all numbers in lines like `1,2,3`. – n. m. could be an AI Dec 07 '16 at 10:38
  • `grep -oP " (0|[1-9]+[0-9]*)[Ll]?(?= )"` combined with `\K` is OK. I don't need `grep -oP "(^| )\K((0|[1-9]+[0-9]*)[Ll]?)(?= |$)"` because a word has to be letters between two spaces (ASCII 32). Thank you for the help. – Vladimír Fencák Dec 07 '16 at 10:58
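To make the `\b` caveat from the comments concrete, here is a small sketch. The function name `with_boundary` and the sample line are made up for illustration, and `\b` inside `grep -E` is an extension (GNU and BSD grep accept it), so this assumes such a grep.

```shell
# \b in an ERE is an extension, not part of POSIX; assumes GNU or BSD grep.
with_boundary() {
    grep -E -o '\b(0|[1-9][0-9]*)[Ll]?\b'
}

# The line contains no spaces at all, so by the assignment's definition
# it contains no words, yet \b still reports every number in it.
printf '%s\n' '1,2,3' | with_boundary
```

This prints `1`, `2` and `3` on separate lines, which is why `\b` is only a safe substitute when the input is known to contain nothing but space-separated words.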

1 Answer


Matches don't overlap. Your regex has matched 2 together with the blank after it. That blank is gone: it won't be considered for further matches, so 3 has no leading space left to match.
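A quick way to see the consumed blank is the sketch below. The helper name `show_matches` and the two sample lines are made up for illustration; it uses a simplified number pattern and assumes a grep that supports -o.

```shell
# Print every " <digits> " match on its own line, wrapped in brackets
# so that the spaces belonging to each match are visible.
show_matches() {
    printf '%s\n' "$1" | grep -E -o ' [0-9]+ ' | sed 's/.*/[&]/'
}

show_matches 'a 2 3 b'     # one space between 2 and 3
show_matches 'a 2  3 b'    # two spaces between 2 and 3
```

The first call prints only `[ 2 ]`, because the single space after 2 already belongs to that match. The second call prints `[ 2 ]` and `[ 3 ]`, since each number now has its own pair of spaces.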

POSIX grep cannot do what you want in one step, but you can do it in two stages, something like this (simplified from your regex, it doesn't support [lL]):

grep -o ' [0-9 ]* ' | grep -E -o '[0-9]+'

That is, match a sequence of space-separated numbers with leading and trailing spaces, and from that, match individual numbers regardless of spaces. De-simplify the definition of number to suit your needs.
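One possible way to de-simplify it is sketched below. The function name `find_constants` is made up for illustration, the sample line is the one from the question with the non-ASCII word shortened, and `[1-9][0-9]*` is used as an equivalent of the question's `[1-9]+[0-9]*`. It still relies on -o being available.

```shell
# Stage 1: keep only runs of the form " word word ... " in which every
#          word is a decimal integer constant with an optional l/L suffix.
# Stage 2: pull the individual constants out of those runs.
# Note: -o is a common extension (GNU, BSD), not something POSIX requires.
find_constants() {
    grep -E -o ' ((0|[1-9][0-9]*)[Ll]? )+' |
        grep -E -o '(0|[1-9][0-9]*)[Ll]?'
}

printf '%s\n' 'bla 0l labl 2 3 abla0La 0L sfdg' | find_constants
```

On that line it prints `0l`, `2`, `3` and `0L`, one per line. Stage 2 can be this loose only because stage 1 has already thrown away everything that is not a valid word.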

Perl-compatible regular expressions have a way to match stuff without consuming it, for example, as mentioned in the comments:

grep -oP " (0|[1-9]+[0-9]*)[Ll]?(?= )"

(?= ) is a lookahead assertion, which means grep will look ahead in the input stream and make sure the match is followed by a space. The space will not be considered a part of the match and will not be consumed. When no space is found, the match fails.
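Combining the lookahead with `\K`, as suggested in the comments, also drops the leading space from the output. A minimal sketch follows; the function name `find_constants_pcre` is made up for illustration, and it assumes a grep built with PCRE support (-P).

```shell
# Assumes a grep with -P (PCRE) support, e.g. GNU grep.
#   " \K"  : match the leading space, then leave it out of the reported match
#   "(?= )": require a following space without consuming it
find_constants_pcre() {
    grep -oP ' \K(0|[1-9][0-9]*)[Ll]?(?= )'
}

printf '%s\n' 'bla 0l labl 2 3 abla0La 0L sfdg' | find_constants_pcre
```

This prints `0l`, `2`, `3` and `0L`, one per line. The space after 2 is only looked at, never consumed, so it is still available as the leading space of 3.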

PCRE (the -P option) is not guaranteed to work in all implementations of grep.

Edit: -o is not specified by POSIX either.

n. m. could be an AI