Replying to emails: how to condense multiple "blank" (not really blank; lines consisting only of ">") lines into one?

Question

I'm trying to do something like this but for quoted emails, so this

On 2014-07-11 at 03:36 PM, <ilovespaces@email.com> wrote:                                                                                                                                                                                                                                                       
>Hi Everyone,                                                                                                                                                                                                                                                                                                                 
>                                                                                                                                                                                                                                                                                                                             
>                                                                                                                                                                                                                                                                                                                              
>                                                    
>I love spaces.
>                                                                                                                                                                                                                                                                                                                             
>                                                                                                                                                                                                                                                                                                                          
>                                                                                                                                                                                                                                                                                                                          
>That's all.

Would become this

On 2014-07-11 at 03:36 PM, <ilovespaces@email.com> wrote:                                                                                                                                                                                                                                                       
>Hi Everyone,                                                                                                                                                                                                                                                                                                                 
>                                                                                                                                                                                                                                                                                                                             
>I love spaces.
>                                                                                                                                                                                                                                                                     
>That's all.

Thanks

score 14 · Answer 1 · answered Jul 16 '14 at 08:14

Assuming that each visual line is a proper logical line (a string of characters terminated by a \n), you can dispense with the rest of the tools and simply run uniq(1) on the input, which collapses each run of adjacent identical lines into a single line.

Example follows.

% cat tst
>Hi Everyone,
>
>
>
>I love spaces.
>
>
>
>That's all.

% uniq tst
>Hi Everyone,
>
>I love spaces.
>
>That's all.
%

Noufal Ibrahim


Thanks. One of the reasons why I emphasise UNIX 101 in all the mentoring courses that I conduct. – Noufal Ibrahim Jul 16 '14 at 08:15
Although this is a good answer (probably the best one), if you copy the sample provided by the OP, each line has a different number of trailing spaces, so uniq treats them as distinct and all the lines are printed. – Jul 16 '14 at 08:31
Yes. Which is why I posted the caveat about visual and logical lines. Maybe a preprocessing filter to clear out all trailing whitespace would fix the problem, but that would compromise the sheer simplicity of the answer. :) – Noufal Ibrahim Jul 16 '14 at 08:33

This depends on an assumption: a mail will never contain two consecutive identical lines. – WKPlus Jul 16 '14 at 09:07
In that case you can first squeeze the spaces: `tr -s ' ' < file | uniq`. – fedorqui Jul 16 '14 at 09:36
@fedorqui OK for trailing spaces, but it could also remove duplicates of lines containing similar text content (which should not be deleted). – Qeole Jul 17 '14 at 00:05
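The preprocessing filter mentioned in the comments can be kept narrow. A minimal sketch, assuming POSIX sed and uniq (the function name squeeze_quote_blanks is hypothetical): only quote-only lines are normalized to a bare >, so text lines keep their spacing. It still inherits WKPlus's caveat, because uniq also collapses two consecutive identical text lines.

```shell
# squeeze_quote_blanks: hypothetical helper name, not a standard tool.
# Step 1 (sed): rewrite every quote-only line (">" followed only by whitespace)
#               to a bare ">", so runs of them become byte-identical.
# Step 2 (uniq): collapse each run of adjacent identical lines into one line.
squeeze_quote_blanks() {
  sed 's/^>[[:space:]]*$/>/' "$@" | uniq
}
```

Usage: squeeze_quote_blanks mail.txt, or pipe the message in on standard input.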

WKPlus · Answer 2 · 2014-07-16T08:39:11.343

Try this:

sed -r '/^>\s*$/{N;/^>\s*\n>\s*$/D}'

Here is the explanation:

Commands used:

N Append the next line of input into the pattern space.
D Delete up to the first embedded newline in the pattern space. Start next cycle, but skip reading from the input if there is still data in the pattern space.

Patterns used:

/^>\s*$/ matches a line consisting of '>' followed by zero or more whitespace characters
/^>\s*\n>\s*$/ matches two consecutive such lines, once N has joined them into the pattern space

So the workflow of the above sed command is:

1. Read a line into the pattern space (at end of file, exit).
2. If the pattern space contains only '>', go to step 4; otherwise go to step 3.
3. Print the contents of the pattern space and go to step 1.
4. Append '\n' and the next line to the pattern space. If the pattern space now contains only '>\n>' (meaning we have met two consecutive '>' lines), go to step 5; otherwise go to step 3.
5. Delete the contents up to and including the first '\n', then go to step 2.

@Jidder Yes, use `/^>\s*$/` instead of `/^>$/` and `/^>\s*\n>\s*$/` instead of `/^>\n>$/`. — WKPlus, Jul 16 '14 at 08:38
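-r and \s are GNU sed extensions. A sketch of the same N/D loop assuming only POSIX sed: [[:space:]] replaces \s, so a plain BRE works without -r, and $!N keeps N from running on the last line (POSIX, and some seds, quit without printing when N finds no next line, which would drop a trailing '>' line; GNU sed prints it). The wrapper name is hypothetical. As in the original, the line that survives is the last one of each run, whitespace included.

```shell
# squeeze_quote_blanks_sed: hypothetical wrapper name.
# POSIX-leaning variant of the N/D loop above:
#   [[:space:]] replaces the GNU-only \s, so no -r is needed (plain BRE);
#   $!N skips N on the last line so a final quote-only line is still printed.
squeeze_quote_blanks_sed() {
  sed '/^>[[:space:]]*$/{
$!N
/^>[[:space:]]*\n>[[:space:]]*$/D
}' "$@"
}
```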

score 2 · Answer 3 · answered Jul 16 '14 at 06:12

sed '/^>\s\s*$/d;$b;/^[^>]/b;a>'  input

Means:

/^>\s\s*$/d: Delete every line consisting of > followed by one or more whitespace characters (a bare > with nothing after it is not matched).

$b;/^[^>]/b: Print and skip the last line, and lines not starting with >.

a>: Append a > line after every remaining line.

Gives:

On 2014-07-11 at 03:36 PM, <ilovespaces@email.com> wrote:
>Hi Everyone,
>
>I love spaces.
>
>That's all.

perreal

Maybe use `\s\+` instead of `\s\s*`? Or is it only a GNU extension? – Qeole Jul 16 '14 at 23:54
@Qeole, yes `\+` is a GNU extension. – perreal Jul 17 '14 at 05:07
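The limitation Answer 5 mentions is easy to see by wrapping the command in a function (the name is hypothetical) and feeding it adjacent text lines. This assumes GNU sed, since \s and the one-line a> form are GNU extensions. Two adjacent quoted text lines get a > inserted between them, and a bare > with no trailing whitespace is not deleted, so that input ends up with three > lines between the two text lines.

```shell
# perreal_squeeze: hypothetical wrapper around perreal's command (GNU sed only).
# Deletes quote-only lines that have trailing whitespace, passes through the
# last line and unquoted lines, and appends a ">" line after everything else.
perreal_squeeze() {
  sed '/^>\s\s*$/d;$b;/^[^>]/b;a>' "$@"
}
```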

score 1 · Answer 4 · answered Dec 05 '19 at 06:59

Another awk-based solution:

awk '{ /^>\s*$/?b++:b=0; if (b<=1) print }' file

Breakdown:

/^>\s*$/?b++:b=0
    - ? :       the ternary operator
    - /^>\s*$/  matches a "blank" line: > followed only by whitespace
    - b         variable that counts consecutive blank lines (b++).
                however, if the current line is non-blank, b is reset to 0.


if (b<=1) print
    print if the current line is non-blank (b==0)
          or if there is only one blank line (b==1).
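\s is a gawk feature and POSIX awk does not define it, so other awks may not recognize it. A sketch of the same counter under that assumption, using [ \t] (space or tab) instead and splitting the ternary into two explicit rules; the function name is hypothetical.

```shell
# squeeze_blank_quotes_awk: hypothetical name; same counter idea as above.
awk_blank_re_note=unused  # placeholder-free: the regex is repeated inline below
squeeze_blank_quotes_awk() {
  awk '
    /^>[ \t]*$/  { b++ }     # quote-only line: extend the current run
    !/^>[ \t]*$/ { b = 0 }   # any other line ends the run
    b <= 1                   # print text lines and the first line of each run
  ' "$@"
}
```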

score 0 · Answer 5 · 2014-07-16T08:13:31.147

awk way

This actually takes the spaces into account, unlike the other answers (except perreal's :)). It also doesn't just insert a > after every line that has more than > on it, so consecutive lines of text do not get blank lines inserted between them.

awk 'a=/^>[ ]*$/{x=$1}!a&&x{print x;x=0}!a' file

Explanation

a=/^>[ ]*$/                    Sets a to the result of matching the pattern: a line
                               that begins with > followed only by spaces up to the end

{x=$1}                         On a match, sets x to $1 (the >).

!a&&x                          If the line does not match the pattern and x is set

{print x;x=0}                  Print x (>) and reset x to 0

!a                             If the line does not match the pattern, print it

The way this works: it sets x to > when it finds a line containing only > and spaces.
Then it carries on until it finds a line that doesn't match, prints > and then prints that line. This resets every time it finds the pattern again.

Hope this helps :)
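The same pending-marker logic can be written out with a named flag and next, which may be easier to read. A sketch assuming any POSIX awk; the function name is hypothetical. Like the one-liner, it emits a bare > for each run and drops a run that ends the input, since no later line flushes it.

```shell
# squeeze_pending_awk: hypothetical name; same logic as the one-liner above.
squeeze_pending_awk() {
  awk '
    /^>[ ]*$/ { pending = 1; next }        # quote-only line: remember it, print nothing
    pending   { print ">"; pending = 0 }   # first later line: emit one ">" for the run
    { print }                              # then print the line itself
  ' "$@"
}
```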
