Sed to remove more than 2 words in a sentence

Question

I am trying to write a sed command whose output shows just the 2 words for each entry and nothing more.

echo  "test1:pass,test2:fail,test3:pass,test4:pass,test5:pass,test6:pass asfas"  | sed 's/,/<br>/g; s/:/  #  /g; s/\b\(.\)/\u\1/g'

Expected output :

Test1  #  Pass
Test2  #  Fail
Test3  #  Pass
Test4  #  Pass
Test5  #  Pass
Test6  #  Pass

I don't want the asfas to be present in the last Test6 line.

Also, the result should only ever be Pass or Fail. Whatever casing appears in the echo command (PaSS, PAss, FaIl, FAil and so on) should be replaced with Pass or Fail. Any word that follows Pass or Fail should be removed and not shown.

Can someone tell me a cleaner way to achieve this than what I wrote?

Thanks :)

potong · Answer 1 · 2021-08-28T07:45:42.427

1

This might work for you (GNU sed):

sed 's/.*/\L&/;s/\w\+/\u&/g;s/:/ # /g;y/,/\n/' file | 
sed 's/\w\+/&\n/2;P;d'

Two invocations of sed.

First invocation:

Lowercase everything.
Uppercase the first character of each word.
Replace each : with # (padded by a space on each side).
Split line into lines on commas.

Second invocation:

Split line by a newline after the second word of the line.
Print first line of two lines only and delete the other.
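The two stages can be traced separately. The following is only a sketch, assuming GNU sed (`\L`, `\u` and `\w` are GNU extensions); the function names `stage1` and `stage2` and the shortened input are made up for illustration.

```shell
# Shortened sample input (illustrative, not the full string from the question).
input='test1:PAss,test2:fail,test6:pass asfas'

# Stage 1: lowercase everything, capitalise each word, format the separator,
# then turn commas into newlines.
stage1() {
    sed 's/.*/\L&/;s/\w\+/\u&/g;s/:/ # /g;y/,/\n/'
}

# Stage 2: insert a newline after the second word of each line,
# print only the part before it (P), and discard the rest (d).
stage2() {
    sed 's/\w\+/&\n/2;P;d'
}

printf '%s\n' "$input" | stage1
# Test1 # Pass
# Test2 # Fail
# Test6 # Pass Asfas

printf '%s\n' "$input" | stage1 | stage2
# Test1 # Pass
# Test2 # Fail
# Test6 # Pass
```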

N.B. The second invocation may be improved if blank and single word lines are not wanted:

sed -E 's/\w+/&\n/2;Ta;P;:a;d'
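A small sketch of what the `T` variant changes, assuming GNU sed (`T` is a GNU extension). The function name `keep_two_words` and the sample lines are invented for the example: `T` jumps to label `a` when the substitution did not happen, so lines with fewer than two words skip the `P` and are simply deleted.

```shell
keep_two_words() {
    sed -E 's/\w+/&\n/2;Ta;P;:a;d'
}

# An empty line and a single-word line are mixed into the input.
printf '%s\n' 'Test1 # Pass' '' 'Orphan' 'Test2 # Fail extra words' | keep_two_words
# Test1 # Pass
# Test2 # Fail
```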

edited Aug 28 '21 at 07:45

answered Aug 28 '21 at 07:25

potong


Hi, I am now thinking that I do not want to change everything to lowercase. I want the "test" input to be given as it is...in the echo command...but the pass fail should only visible in "Pass" & "Fail" but test can be anything..as we are going to have different tests they can have any name... How can I achieve that? can you help me? – Sameer Atharkar Aug 28 '21 at 18:13
I am doing it with : `echo "fOO:pass,tesT2:fail,TEST:pass,fdfdhfd:pass,test5: anyresult,test6:pass asfas " | sed 's/^:.*/\L&/;s/\w\+/\u&/g;s/:/ # /g;y/,/\n/' | sed 's/\w\+/&\n/2;P;d'` is it right method? – Sameer Atharkar Aug 28 '21 at 18:15

danadam · Answer 2 · 2021-08-27T17:31:00.983

0

With more complex input (notice that the unwanted text in test3 contains a comma):

test1:PAss,test2:FAil,test3:pass foobar, barfoo,test4:pass,test42:pass,test6:pass asfas

I would do it with 3 invocations of sed and 1 cut. The first invocation splits it into lines, the second one makes the necessary changes and the last one joins the lines back with <br>:

echo  "test1:PAss,test2:FAil,test3:pass foobar, barfoo,test4:pass,test42:pass,test6:pass asfas" |
    sed -e 's/,/\n/g' |
    sed -e '/^test[0-9]/ ! d' \
        -e 's/pass/Pass/i' \
        -e 's/fail/Fail/i' \
        -e 's/:/ # /' |
    cut -d' ' -f 1-3 |
    sed ':a; N; $!ba; s/\n/<br>/g'

Or if it is required to use only sed:

echo  "test1:PAss,test2:FAil,test3:pass foobar, barfoo,test4:pass,test42:pass,test6:pass asfas" |
    sed -e 's/,/\n/g' |
    sed -e '/^test[0-9]/ ! d' \
        -e 's/pass/Pass/i' \
        -e 's/fail/Fail/i' \
        -e 's/:/ # /' \
        -e 's/\([[:alnum:]]* # [[:alnum:]]*\).*/\1/' |
    sed ':a; N; $!ba; s/\n/<br>/g'

Output in both cases:

test1 # Pass<br>test2 # Fail<br>test3 # Pass<br>test4 # Pass<br>test42 # Pass<br>test6 # Pass

and without code formatting:

test1 # Pass
test2 # Fail
test3 # Pass
test4 # Pass
test42 # Pass
test6 # Pass

/^test[0-9]/ ! d removes lines that don't start with test[0-9].
s/pass/Pass/i is case insensitive so it matches any "pass" and replaces it with "Pass". Accordingly for "fail".
s/\([[:alnum:]]* # [[:alnum:]]*\).*/\1/ captures 2 words separated by # and replaces the whole line with this captured content.
:a; N; $!ba; s/\n/<br>/g is taken from https://www.baeldung.com/linux/join-multiple-lines#sed. It defines label a, appends lines to the pattern space and finally replaces \n with <br>.
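The join step can be tried on its own. This is a minimal sketch; the function name `join_br` is hypothetical and the three input lines are just sample data in the shape produced by the earlier stages.

```shell
join_br() {
    # :a    define label a
    # N     append the next input line to the pattern space
    # $!ba  unless this is the last line, branch back to a
    # s     once every line is collected, replace each newline with <br>
    sed ':a; N; $!ba; s/\n/<br>/g'
}

printf '%s\n' 'test1 # Pass' 'test2 # Fail' 'test3 # Pass' | join_br
# test1 # Pass<br>test2 # Fail<br>test3 # Pass
```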

edited Aug 27 '21 at 17:31

answered Aug 27 '21 at 17:04

danadam


And can you tell me why exactly you are preferring cut in first case? – Sameer Atharkar Aug 27 '21 at 19:39
Hi. Thanks a lot. This is really very helpful.. Actually that `test` can be anything.. we just need any single word name their. It can be anything which is not followed by numbers.. I just want to make sure that its first letter is capital. How can I achieve that? So basically the string should give output of Test # pass Test # Fail and nothing else. . – Sameer Atharkar Aug 27 '21 at 19:46
I like `cut` because it is shorter and more readable to me. – danadam Aug 27 '21 at 19:50
To capitalize the test name you can add another expression to that middle `sed` invocation: `-e 's/^\(.\)/\u\1/'`. It requires GNU sed though, because of that `\u` (see [another question](https://stackoverflow.com/questions/1538676/uppercasing-first-letter-of-words-using-sed)) – danadam Aug 27 '21 at 19:55
actually I tried to add the same command in a script like this : `echo -e $message | sed -e 's/,/\n/g' | sed -e 's/pass/Pass/i; s/fail/Fail/i; s/:/  #  /; s/\([[:alnum:]]* # [[:alnum:]]*\).*/\1/' | sed ':a; N; $!ba; s/\n/<br>/g'` and it is not working as expected. I tried to add it in the format which you gave then also it was not working as expected it was not displaying the pass or fail string..just the tests . Then I tried to convert it to a single line but looks like it is not working in that way inside a script. Am I doing something wrong? – Sameer Atharkar Aug 27 '21 at 20:05
You replace `:` with `#` surrounded by 2 spaces but in the later pattern that matches 2 words separated by `#` you use only 1 space. – danadam Aug 27 '21 at 20:50
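The effect of that mismatch can be reproduced in isolation. The sketch below uses a made-up one-line input and hypothetical function names `mismatched` and `consistent`. With the single-space pattern, the only place ` # ` matches is the middle of the two-space separator, with empty words on both sides, so the result word is thrown away.

```shell
# Separator inserted with two spaces, capture pattern written with one.
mismatched() {
    sed -e 's/:/  #  /' -e 's/\([[:alnum:]]* # [[:alnum:]]*\).*/\1/'
}

# Separator and capture pattern agree on two spaces.
consistent() {
    sed -e 's/:/  #  /' -e 's/\([[:alnum:]]*  #  [[:alnum:]]*\).*/\1/'
}

printf '%s\n' 'test1:pass junk' | mismatched   # name and separator only, no "pass"
printf '%s\n' 'test1:pass junk' | consistent
# test1  #  pass
```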

KamilCuk · Answer 3 · 2021-08-27T17:21

0

The following shell command:

$ echo "test1:pass,test2:fail,test3:pass,test4:pass,test5:pass,test6:pass asfas" | sed '
   # replace test[0-9]:(pass or fail) by test[0-9] # (pass or fail).
   # match anything up until an optional comma after, to remove any text after
   # matched globally, so it repeats for each pattern
   s/\(test[0-9]\):\(pass\|fail\)[^,]*,\?/\1 # \2\n/g;
   # apply uppercase to first letters
   s/pass/Pass/gi; s/fail/Fail/gi;
   # The first pattern will add a trailing newline to pattern space
   # remove it
   s/\n*$//
'

would output:

test1 # Pass
test2 # Fail
test3 # Pass
test4 # Pass
test5 # Pass
test6 # Pass

You can have fun learning regex with regex crosswords.

answered Aug 27 '21 at 17:21

KamilCuk


Hi, @KamilCuk I don't actually want to hardcode the word `test` I want it to be any name of the test..it is not necessary that it should contain a name or a number. Just the requirement is it should be a single word like Test1 or either Test. The only thing is, I want the first letter to be capital. Rest can be as it is given input.. and it should not be like : Test 2 or Test Test2 . Can you guide me on that where I just need a single word ? – Sameer Atharkar Aug 27 '21 at 20:17
https://regexcrossword.com/ , https://www.grymoire.com/Unix/Sed.html#uh-1 , [sed manual](https://www.gnu.org/software/sed/manual/sed.html#The-_0022s_0022-Command) `I want the first letter to be capital` Does it __have to__ be sed? It sounds easier with awk. Anyway, it could be easy with `\U` GNU sed _extension_ (or with `y` command and some hold/pattern space shuffling). – KamilCuk Aug 27 '21 at 21:07
I am unable to do it on my own @KamilCuk. Can you tell me how can I achieve it in this one liner sed command which I can use in script? This is what I am using currently which is taking a compulsory input of test(0-9) I want the input to be anything on place of test just the first letter should be capital. and everything should remain as it is. `echo "foo:pass,test2:fail,test3:pass,word3:pass,test5:pass,test6:pass asfas" | sed 's/\(test[0-9]\):\(pass\|fail\)[^,]*,\?/\1 # \2\n/g;s/pass/Pass/gi; s/fail/Fail/gi;s/\n*$//'` – Sameer Atharkar Aug 28 '21 at 12:13
So instead of `test` match a `[^:]*` or a `\w*`. `just the first letter should` For that remember stuff in backreference and apply `\u`, like `sed 's/\(\w\+\)/\u\1/'`, see the manual. – KamilCuk Aug 28 '21 at 12:15
Oops. I don't know why but it is getting very confusing for me. I tried to do it like this : `echo "test1:pass,test2:fail,test3:pass,test4:pass,test5:pass,test6:pass asfas" | sed 's/\(\w\+\)/\u\1/:\(pass\|fail\)[^,]*,\?/\1 # \2\n/g;s/pass/Pass/gi; s/fail/Fail/gi;s/\n*$//'` but it is failing and saying `sed: -e expression #1, char 17: unknown option to `s'` – Sameer Atharkar Aug 28 '21 at 12:20
`getting very confusing for me` See https://www.grymoire.com/Unix/Sed.html#toc_Sed_-_An_Introduction_and_Tutorial_by_Bruce_Barnett . Sed may be not simple. Consider using other tools and languages. Like Python. – KamilCuk Aug 28 '21 at 12:21
can you correct me where am I doing wrong here : `echo "test1:pass,test2:fail,test3:pass,test4:pass,test5:pass,test6:pass asfas" | sed 's/\(\w\+\)/\u\1/:\(pass\|fail\)[^,]*,\?/\1 # \2\n/g;s/pass/Pass/gi; s/fail/Fail/gi;s/\n*$//'` ? – Sameer Atharkar Aug 28 '21 at 12:30
`s` has 3 separators - it's `s///`. In your `s/\(\w\+\)/\u\1/:\(pass\|fail\)[^,]*,\?/\1 # \2\n/g;` there are 5 `/` - it's unknown option to `s`. – KamilCuk Aug 28 '21 at 12:38
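Putting the hints from this comment thread together, one possible single-`s` version might look like the sketch below. It is not KamilCuk's posted answer. It assumes GNU sed (`\w`, `\|`, `\?`, `\u` and the `I` flag are GNU extensions), that every result is some casing of pass or fail, and that the trailing junk contains no comma. The function name `format_results` and the sample input are invented.

```shell
format_results() {
    sed '
        # name:result, then anything up to and including the next comma;
        # \u capitalises the first letter of the name only
        s/\(\w\+\):\(pass\|fail\)[^,]*,\?/\u\1 # \2\n/Ig
        # normalise the result word, anchored on the separator so that
        # names containing pass or fail are left alone
        s/ # pass/ # Pass/Ig
        s/ # fail/ # Fail/Ig
        # drop the newline left after the last entry
        s/\n$//
    '
}

echo "foo:pass,tesT2:FAil,word3:pass extra,test6:PaSS asfas" | format_results
# Foo # Pass
# TesT2 # Fail
# Word3 # Pass
# Test6 # Pass
```

The key difference from the failing attempt is that `\u` goes inside the replacement of the one `s` command, rather than being spliced in as a second `s/.../.../` fragment.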

Ed Morton · Answer 4 · 2021-08-29T14:19:10.827

0

Just use awk. Using any awk in any shell on every Unix box:

$ echo  "test1:pass,test2:fail,test3:pass,test4:pass,test5:pass,test6:pass asfas" |
awk -v RS=',' -F':' -v OFS=' # ' '
    {
        sub(/ .*/,"")
        for (i=1; i<=NF; i++) {
            $i = toupper(substr($i,1,1)) tolower(substr($i,2))
        }
        print
    }
'
Test1 # Pass
Test2 # Fail
Test3 # Pass
Test4 # Pass
Test5 # Pass
Test6 # Pass

edited Aug 29 '21 at 14:19

answered Aug 27 '21 at 22:30

Ed Morton



`fOO:paSS` should be converted into `fOO # Pass` – Walter A Aug 29 '21 at 14:18
@WalterA fixed the `paSS` case, thanks. `fOO` shouldn't remain as that though since the OP shows `test` becoming `Test` in their example so I camel-cased it too even though the OP doesn't say/show what should happen to the 2nd and subsequent characters of the first string as I'm just assuming they want it treated the same as the 2nd string in that regard. – Ed Morton Aug 29 '21 at 14:19
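If the name should instead be left exactly as typed, which is how Walter A's comment reads the requirement, a variant could normalise only the second field. This is a sketch under that assumption, not Ed Morton's answer; the function name `keep_name_case` and the input are illustrative.

```shell
keep_name_case() {
    awk -v RS=',' -F':' -v OFS=' # ' '
        {
            # drop everything from the first blank (or the final newline) onwards
            sub(/[ \t\n].*/, "")
            # only the result field is re-cased; assigning it rebuilds $0 with OFS
            $2 = toupper(substr($2, 1, 1)) tolower(substr($2, 2))
            print
        }
    '
}

echo "fOO:paSS,tesT2:fail,test6:PASS asfas" | keep_name_case
# fOO # Pass
# tesT2 # Fail
# test6 # Pass
```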

Walter A · Answer 5 · 2021-08-29T14:13:31.080

0

In your solution you should use \n, not <br>, and invoke sed twice.
And a small change to remove the remainder of the line.

echo "fOO:paSS,tesT2:fail,TESt:pasS,fdfdhfd:pass,test5:anyresult test,test6:pass asfas"|
  sed -r 's/,/\n/g' | sed -r 's/(.*):(.)(\w*).*/\1 # \u\2\L\3<br>/g'

EDIT:

I first thought only a four-letter word would be parsed. I changed the solution, so it will keep the first word.
OP wants to use this for HTML. I would prefer <pre>...</pre> above parsing text, but I added a <br> at the end of each line.
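An annotated sketch of how the substitution picks the line apart, assuming GNU sed (`-r`, `\w`, `\u` and `\L` are GNU extensions). The function name `format_html` and the shortened input are made up for the example.

```shell
format_html() {
    sed -r 's/,/\n/g' |
    sed -r '
        # (.*)   \1  everything before the last colon: the test name, unchanged
        # (.)    \2  first letter of the result, upper-cased by \u
        # (\w*)  \3  rest of the result word, lower-cased by \L
        # .*         anything after the result word, dropped
        s/(.*):(.)(\w*).*/\1 # \u\2\L\3<br>/
    '
}

echo "fOO:paSS,test5:anyresult test,test6:pass asfas" | format_html
# fOO # Pass<br>
# test5 # Anyresult<br>
# test6 # Pass<br>
```

Because the third group is `\w*` rather than a fixed `...`, a longer result such as anyresult is kept whole instead of being cut after four letters.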

edited Aug 29 '21 at 14:13

answered Aug 28 '21 at 19:54

Walter A


Actually I am sending the result somewhere where I need it in html format. and can you explain how is your sed command working with output? I tried it with this echo command. `echo "fOO:paSS,tesT2:fail,TESt:pasS,fdfdhfd:pass,test5:anyresult test,test6:pass asfas" | sed -r 's/,/\n/g' | sed -r 's/(.*):(.)(...).*/\1 # \u\2\L\3/g'` It is giving output as `fOO # Pass tesT2 # Fail TESt # Pass fdfdhfd # Pass test5 # Anyr test6 # Pass ` What I want is that the word "Anyresult" should not be half left and it should show the complete word. can you help me achieve that? – Sameer Atharkar Aug 29 '21 at 09:33
In Bash newlines are `\n`, not `<br>`. When you want HTML breaks, add them at the end of each line. – Walter A Aug 29 '21 at 14:15
but what about the anyresult getting converted to just `Anyr` ? – Sameer Atharkar Aug 29 '21 at 14:41
I had changed that too with `\w*`, not `...` – Walter A Aug 29 '21 at 15:34
Actually, this is also working as expected: `echo "Foo:PaSS,fOO:paSS,tesT2:fail,TESt:pasS,fdfdhfd:pass,test5:anyresulttest,test6:pass asfas, foo7:fail,fooo9:fail " | sed -E 's/:/#/g;s/,/ /g;s/(.\S*)(#)(.)(\S*)\s/\1 \2 \U\3\L\4\n<br>/g;' | awk 'NF < 2 || NF > 3 { $1=""}1' | sed 's/\s//'` but is this the right method? – Sameer Atharkar Aug 29 '21 at 16:07
No, do not mix `awk` and `sed`. The `awk` solution of @EdMorton is a compact and clear solution. I only continued with `sed` since you started with it, but my solution is (too) hard to read. – Walter A Aug 29 '21 at 17:38
Okay. I tested few cases with the edited answer. It is working as expected. Just one question.. can we convert it to the columns ? so that it will look much cleaner. – Sameer Atharkar Aug 29 '21 at 17:57
Last answer (getting off-topic): You could use something like `sed -r 's/(.*):(.)(\w*).*/\1\u\2\L\3/g'` and wrap it in `..
` – Walter A Aug 29 '21 at 18:03
