
I have a text file with the following content (for example):

In first line the "One", second"Two " & " Three " and also"Four    ". 
In second line also nested "foo "bar" baz""zoo" patterns.

I tried to get all strings between pairs of quotes and ended up with this command:

grep -Po '"\K[^"]+"' file

This command gives me the following:

One"
Two "
 Three "
Four    "
foo "
 baz"
zoo"

What I want instead is:

One
Two 
 Three 
Four    
foo 
 baz
zoo

Can someone help me remove the trailing " from the grep output above? I don't want to remove any spaces. None of the quoted strings span multiple lines, i.e. there is nothing like:

... "foo "bar" ba
z""zoo" ...

Please don't suggest using multiple commands; I know I can. I'm asking whether this can be done with grep and its options alone.

αғsнιη
But regular expression matches are greedy, so just searching for [^"]+ without the trailing " should be enough, shouldn't it? – Jerry Jeremiah Nov 13 '14 at 09:39
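A quick way to test the greediness suggestion: the minimal sketch below drops the trailing quote from the question's pattern and feeds the two sample lines to GNU grep (-P is needed for \K). The sample function is a hypothetical helper standing in for file. Greediness does not help, because -o resumes right after the previous match, so each closing quote gets reused as the opening quote of the next match.

```shell
# Hypothetical helper: prints the two sample lines from the question.
# Line 1 deliberately keeps its trailing space.
sample() {
  printf '%s\n' \
    'In first line the "One", second"Two " & " Three " and also"Four    ". ' \
    'In second line also nested "foo "bar" baz""zoo" patterns.'
}

# The question's command without the trailing quote (requires GNU grep -P).
sample | grep -oP '"\K[^"]+'
```

Besides the wanted strings, this also prints the runs between one quoted string and the next (", second", " & ", " and also", the final ". ", "bar" and " patterns."), which is why the original pattern consumed the closing quote in the first place.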

2 Answers


This is possible with the following grep one-liner:

$ grep -oP '"\K[^"]+(?="(?:[^"]*"[^"]*")*[^"]*$)' file
One
Two 
 Three 
Four    
foo 
 baz
zoo
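Why this works: the lookahead (?="(?:[^"]*"[^"]*")*[^"]*$) only accepts a run of non-quote characters if it is followed by its closing quote and then an even number of further quotes, which, assuming the quotes on each line are balanced, means the run lies inside a pair. Nothing is trimmed, so spaces survive. A minimal check, where sample is a hypothetical helper replacing file and cat -e marks each line end with $ so trailing spaces become visible:

```shell
# Hypothetical helper: prints the two sample lines from the question.
sample() {
  printf '%s\n' \
    'In first line the "One", second"Two " & " Three " and also"Four    ". ' \
    'In second line also nested "foo "bar" baz""zoo" patterns.'
}

# The one-liner above; cat -e appends "$" to every output line.
sample | grep -oP '"\K[^"]+(?="(?:[^"]*"[^"]*")*[^"]*$)' | cat -e
```

Each output line should end in $ right after its last space, e.g. "Two $" and "Four    $".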

Another, somewhat hacky, option uses the PCRE verbs (*SKIP)(*F):

$ grep -oP '[^"]+(?=(?:"[^"]*"[^"]*)*[^"]*$)(*SKIP)(*F)|[^"]+' file
One
Two 
 Three 
Four    
foo 
 baz
zoo
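To see what (*SKIP)(*F) throws away, run the left alternative on its own. Its lookahead accepts a run only if an even number of quotes follows it, i.e. the run is outside every pair (again assuming balanced quotes per line). In the full pattern, (*SKIP) makes the engine resume after such a run and (*F) forces the attempt to fail, so only runs inside quotes reach the plain [^"]+ on the right. A minimal sketch, with the hypothetical sample helper in place of file:

```shell
# Hypothetical helper: prints the two sample lines from the question.
sample() {
  printf '%s\n' \
    'In first line the "One", second"Two " & " Three " and also"Four    ". ' \
    'In second line also nested "foo "bar" baz""zoo" patterns.'
}

# Left alternative only: prints the runs that the full pattern skips.
sample | grep -oP '[^"]+(?=(?:"[^"]*"[^"]*)*[^"]*$)'
```

It prints exactly the text outside the quotes, including "bar" from the nested example, which is why bar never shows up in the answer's output.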
Avinash Raj

Here is an awk solution, in case you cannot do it with grep:

awk -F\" '{for (i=2;i<=NF;i+=2) {gsub(/ /,"");print $i}}' file
One
Two
Three
Four
foo
baz
zoo
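
The gsub above deletes every space on the line. To keep the spaces, as the question asks, drop it:

awk -F\" '{for (i=2;i<=NF;i+=2) print $i}' file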
One
Two 
 Three 
Four    
foo 
 baz
zoo
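Both awk commands rely on the same observation: with " as the field separator, odd-numbered fields are outside quotes and even-numbered fields are inside them, as long as the quotes on a line are balanced. Dumping every field with its line and field number makes this visible (sample is again a hypothetical helper replacing file):

```shell
# Hypothetical helper: prints the two sample lines from the question.
sample() {
  printf '%s\n' \
    'In first line the "One", second"Two " & " Three " and also"Four    ". ' \
    'In second line also nested "foo "bar" baz""zoo" patterns.'
}

# Print "line.field:[text]" for every field when splitting on double quotes.
sample | awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d.%d:[%s]\n", NR, i, $i }'
```

The even fields hold One, Two, Three, Four, foo, baz and zoo with their spaces intact, while bar lands in field 2.3 and the empty string between the adjacent quotes in baz""zoo is field 2.5. Both are odd, so the loop that starts at i=2 and steps by 2 never prints them.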
Jotne