
File1:

<a>hello</b> <c>foo</d>
<a>world</b> <c>bar</d>

This is an example of the file this would work on. How can one remove every string of the form <c>*</d> using sed?

Brian Tompsett - 汤莱恩
user191960
  • What do you mean by "remove all strings"? Do you mean remove that whole line or just that block of text? – Adam Batkin Oct 20 '09 at 07:02
  • All strings beginning with <c> and ending with </d>. The command below worked perfectly. Anyone using the command also needs to add the file name at the end of the command. – user191960 Oct 20 '09 at 07:07
  • Note that parsing XML-like strings with regex may cause issues: https://stackoverflow.com/a/1732454/384617 – David Pärsson Jul 08 '20 at 10:16

3 Answers


The following line will remove all text from <c> to </d> inclusive:

sed -e 's/<c>.*<\/d>//'

The part inside s/...// is a regular expression, not a shell-style wildcard (glob), so anything that is valid in a regular expression can be used there.
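As a quick check, here is a minimal sketch that applies the idea to the sample data from the question. The file name File1.txt is an assumption made for illustration. Two points it tries to show: adding " *" in front of <c> also removes the space left behind, and because .* is greedy, a line with more than one <c>…</d> pair loses everything from the first <c> to the last </d>. The [^<]* variant stops at the first </d>, but only under the assumption that no "<" appears between the tags.

```shell
# Recreate the sample file from the question (the file name is an assumption).
printf '%s\n' '<a>hello</b> <c>foo</d>' '<a>world</b> <c>bar</d>' > File1.txt

# Remove everything from <c> to </d>; the leading " *" also drops the space before <c>.
sed -e 's/ *<c>.*<\/d>//' File1.txt

# Variant that stops at the first </d>, assuming no "<" between the tags.
printf '%s\n' '<c>x</d> keep <c>y</d>' | sed -e 's/<c>[^<]*<\/d>//g'
```

To keep the result, redirect the output to a new file rather than to the input file itself.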

Adam Batkin
  • Works perfectly! Users of this command should remember to add the input and output files at the end: sed -e 's/<c>.*<\/d>//' In > Out. – user191960 Oct 20 '09 at 07:12

Great Swiss-Army knife!

I modified it to pull header info out of emails for an archiving script. The script renames the IMAP emails using both the date and the sender (otherwise IMAP just numbers them 1, 2, 3, etc.). Here are the two mods:

for i in $mailarray; do date -d "$(less -f "$i" | grep -im 1 "Date: " | sed -e 's_^.*\(ate: \)__')" +%F_%T%Z; done

for i in $mailarray; do less -f "$i" | grep -iEm 1 "From: " | sed -e 's_^.*\(rom\).*<\|^.*\(rom:\).__' | sed -e 's_@.*$__'; done

They saved a great deal of extraneous coding. Thank you.
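For readers who want to try the header extraction without a mailbox, the following is a rough sketch using sed alone. The file sample.eml and its contents are hypothetical, and unlike the grep -i commands above it matches the header names case-sensitively and assumes the "Name <user@host>" form for the sender.

```shell
# Hypothetical sample message; real IMAP files will have many more headers.
printf '%s\n' 'From: Jane Doe <jane@example.com>' \
  'Date: Tue, 20 Oct 2009 07:02:00 +0000' '' 'body' > sample.eml

# Print the value of the first Date: header, then stop reading.
sed -n '/^Date: /{s/^Date: //p;q;}' sample.eml

# Print the local part of the first From: address (text between "<" and "@").
sed -n '/^From: /{s/^From: .*<\([^@]*\)@.*/\1/p;q;}' sample.eml
```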

Community
TCMcG

If all your data is like that of the example:

# gawk 'BEGIN{FS=" <c>"}{print $1}' file
<a>hello</b>
<a>world</b>
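The same field-splitting idea can be sketched with the -F option, which should behave the same in any POSIX awk and not only gawk. The input lines here are made up for illustration. The second command shows the caveat implied by "if all your data is like that of the example": only the first field is printed, so anything after the <c> block is dropped too.

```shell
# Split each line on " <c>" and print what comes before it.
printf '%s\n' '<a>hello</b> <c>foo</d>' | awk -F' <c>' '{print $1}'

# Caveat: text after the <c> block is discarded as well, since only field 1 is printed.
printf '%s\n' '<a>hello</b> <c>foo</d> <e>tail</f>' | awk -F' <c>' '{print $1}'
```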
ghostdog74