I have a giant .txt file formatted as follows (each non-blank line starts with three spaces):

   unwanted text
   unwanted text

   *wanted text
   abc
   def

   *wanted text 2
   content
   content

   *wanted text 3
   content
   content

   (...)

I'm looking for code that returns only the lines from the first " *" occurrence up to (but excluding) the second " *" occurrence.

After going through multiple StackOverflow posts, I've managed to get the following working code on Ubuntu (GNU/Linux):

sed -n -e '/^   \*/{p;q}' bigfile.txt && sed -e '1,/   \*/d' -e '/   \*/,$d' bigfile.txt

It gives me the following output, which is what I want:

*wanted text
abc
def
\n (representing a wanted blank line)

Though it's exactly the output I want, you'll have to agree it's rather clumsy code, since I have to call sed twice. At first I had only the second part (after "&&"), which returned the right thing except for the first line (*wanted text). I then put the first part (before "&&") in front of it so that I also get the first line of the wanted part. Nothing else I tried gave me a better result.

I can't stress enough that it's a very big file, and I'll be doing this repeatedly in a script, so, if possible, a q command (quitting after finding the first result) is preferable.

After this is done, I need something that takes the result of the last command as input, so I can get the whole text EXCEPT the prior result, like this:

   unwanted text
   unwanted text

   *wanted text 2
   content
   content

   *wanted text 3
   content
   content

   (...)

So, in summary, my two questions are:

  • Is there a way to get the first desired output described above with a sed one-liner, without calling sed twice (and preferably quitting after finding the excerpt, so it doesn't search through the whole big file)? I'm pretty sure there's a more elegant solution.
  • How can I get as output the whole text except for the result of the first question (the "reverse" output)? I have no software requirements; I just need it so I can run the first action again and again on an ever-updating input and handle each output of the first command according to specific conditions.

I hope I'm clear enough. Please ask if any detail is missing. Thank you very much for your attention!

  • Possible duplicate of [How to select lines between two marker patterns which may occur multiple times with awk/sed](http://stackoverflow.com/questions/17988756/how-to-select-lines-between-two-marker-patterns-which-may-occur-multiple-times-w) – Andreas Louv Mar 17 '16 at 21:23
  • Possible duplicate of [find lines between two patterns using sed](http://stackoverflow.com/questions/14334032/find-lines-between-two-patterns-using-sed) – miken32 Mar 17 '16 at 22:36
  • This is certainly a dupe. `sed -ne '/^ \*wanted text$/,/^$/ {p;}' foo.txt` – miken32 Mar 17 '16 at 22:37

1 Answer

awk to the rescue!

$ awk '$1~/^\*/{if(f) exit; f=1} f' file

   *wanted text
   abc
   def
   <-- a blank line is printed here (the formatter eats it)
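
Since the question asked for sed specifically, a single-invocation sed version of the same idea might look like the sketch below. It assumes the three-space indent before `*` shown in the question; the function name `first_block` and the sample data are made up for illustration. Once the first header is found, the loop prints each line and reads the next, and `q` stops reading as soon as the second header shows up, so the rest of the big file is never scanned.

```shell
# Hypothetical helper: print from the first "   *" line up to, but not
# including, the second one, then quit without reading further.
first_block() {
  sed -n '/^   \*/{
    :a
    p
    n
    /^   \*/q
    ba
  }' "$1"
}

# Small sample in the question's format (illustrative data only).
sample=$(mktemp)
printf '%s\n' '   unwanted text' '' '   *wanted text' '   abc' '   def' '' \
  '   *wanted text 2' '   content' > "$sample"

first_block "$sample"
```

On GNU sed the script can be squeezed onto one line as `sed -n '/^   \*/{:a;p;n;/^   \*/q;ba}' file`; the multi-line form is used here because other sed implementations may not accept `;` after a label.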

For the second part:

$ awk '$1~/^\*/{f++} !f||f>1' file

   unwanted text
   unwanted text

   *wanted text 2
   content
   content

   *wanted text 3
   content
   content

   (...)
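
Because the question mentions running this again and again on a changing input, the two one-liners can also be folded into a single pass that writes the first block to one file and everything else to another. This is only a sketch: the function name `split_first_block`, the output file arguments, and the sample data are assumptions, not part of the original question or answer.

```shell
# Hypothetical helper: split_first_block INPUT BLOCK_OUT REST_OUT
# Writes the first "*" block to BLOCK_OUT and every other line to REST_OUT.
split_first_block() {
  : > "$2"   # start from empty output files
  : > "$3"
  awk -v block="$2" -v rest="$3" '
    $1 ~ /^\*/ { f++ }                  # count block headers seen so far
    f == 1     { print > block; next }  # lines inside the first block
               { print > rest }         # lines before it, or from the second header on
  ' "$1"
}

# Small sample in the question's format (illustrative data only).
input=$(mktemp); block_out=$(mktemp); rest_out=$(mktemp)
printf '%s\n' '   unwanted text' '' '   *wanted text' '   abc' '' \
  '   *wanted text 2' '   content' > "$input"

split_first_block "$input" "$block_out" "$rest_out"
cat "$block_out"
```

Unlike the `exit` variant, this has to read the whole file, since the remainder must be copied out. The remainder file can then replace the input (for example with `mv`) before the next round; REST_OUT should not be the same path as INPUT.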
Andreas Louv
karakfa
  • Usually you can fix formatting with `...` but I couldn't get it to work. – Andreas Louv Mar 17 '16 at 21:29
  • Hey, this worked perfectly! Thank you very much! It's a very elegant solution. I noticed that the backslash character doesn't appear on my computer, so the answer doesn't show that you have to write `\*` instead of simply `*`, because sed will recognize the asterisk as a special character. – Miguel Prytoluk Mar 18 '16 at 02:07