I have a string, let's say:

<lic><ic>This is a string</ic>, welcome to my blog.</lic>

I want to use sed to get rid of the <ic> and </ic> tags, as well as the literal tags <lic> and </lic>.

What is the fastest way to do this? I'm very new to sed. How would this be done in awk? I know awk is much better for column-like text, so I feel more inclined to learn how to use sed.

Any help is always appreciated, thanks in advance!

Prince John Wesley
tf.rz

4 Answers

3

Remove only tags:

sed -i.old -r 's;</?l?ic>;;g' infile
Prince John Wesley
  • +1 for most concise answer. `sed 's|</\?l\?ic>||g' infile` would work too or, if you prefer, `sed 's|</*l*ic>||g'` at a pinch. – potong May 22 '12 at 06:45
3
sed -e 's%</\{0,1\}l\{0,1\}ic>%%g'

The \{0,1\} is the standard sed way of writing the equivalent of ? in PCRE. The command uses % as the delimiter between its parts; the regex looks for a < possibly followed by a slash, possibly followed by an l, followed by ic>, and each match is replaced with nothing, globally across each line of input.

Some versions of sed allow you to specify alternative systems of regexes, but this works everywhere.
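
As a quick check, here is a minimal sketch that runs the command above against the sample line from the question. The variable names `input` and `result` are only for illustration.

```shell
# Sample input taken from the question.
input='<lic><ic>This is a string</ic>, welcome to my blog.</lic>'

# \{0,1\} makes the preceding character optional in a POSIX basic regex,
# so the pattern covers <ic>, </ic>, <lic> and </lic>.
result=$(printf '%s\n' "$input" | sed -e 's%</\{0,1\}l\{0,1\}ic>%%g')

printf '%s\n' "$result"
# prints: This is a string, welcome to my blog.
```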

Jonathan Leffler
  • Thanks so much! This worked on the first try! Do you by chance know how this would be done using awk? – tf.rz May 22 '12 at 13:57
  • I'm sure it could be done with `awk`; I would not use `awk` for the job, though. I'd use Perl, where it would be 'trivial': `perl -pe 's%</?l?ic>%%g'`. In situ overwriting of the files with backup is available too. With `gawk`, the function would be `gsub`: `awk '{gsub(/<\/?l?ic>/, "", $0); print;}'`. Untested code. – Jonathan Leffler May 22 '12 at 14:16
  • As you said, some versions of `sed` support additional regex features. At least in GNU `sed`, `\?` works (or with `-r`: `?`). – Dennis Williamson May 22 '12 at 14:20
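
For the awk question raised in the comments, a minimal sketch along the same lines, assuming any POSIX awk (`gsub` is not specific to gawk). The replacement string has to be written as `""` because the whole awk program sits inside shell single quotes. The variable names are illustrative.

```shell
# Sample input taken from the question.
input='<lic><ic>This is a string</ic>, welcome to my blog.</lic>'

# awk regexes are extended regexes, so ? means "optional" without a backslash.
# gsub with two arguments operates on the whole record ($0).
result=$(printf '%s\n' "$input" | awk '{ gsub(/<\/?l?ic>/, ""); print }')

printf '%s\n' "$result"
# prints: This is a string, welcome to my blog.
```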
2

sed doesn't need to be complicated. Here are two simple ways to do what you want.

This matches those exact patterns and removes them globally:

sed -e "s%\(<lic>\|</lic>\|<ic>\|</ic>\)%%g" file.txt

Remember that you can pass multiple expressions to sed if necessary; each one needs its own g flag to replace every occurrence on a line:

sed -e "s%<lic>%%g" -e "s%</lic>%%g" -e "s%<ic>%%g" -e "s%</ic>%%g" file.txt
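
One thing to keep in mind when chaining expressions: an `s` command without the `g` flag only replaces the first match on each line. The sketch below illustrates the difference using a made-up input line (not from the question) that contains two <ic> pairs.

```shell
# Hypothetical input with two <ic> pairs on one line.
input='<ic>one</ic> and <ic>two</ic>'

# Without g: only the first <ic> and the first </ic> are removed.
first_only=$(printf '%s\n' "$input" | sed -e 's%<ic>%%' -e 's%</ic>%%')

# With g: every occurrence on the line is removed.
all=$(printf '%s\n' "$input" | sed -e 's%<ic>%%g' -e 's%</ic>%%g')

printf '%s\n' "$first_only" "$all"
# prints:
#   one and <ic>two</ic>
#   one and two
```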

Steve
1

Your tags have the structure of a left angle bracket, followed by any number of characters that are not a right angle bracket, and finally a right angle bracket. So let's write it that way:

sed 's/<[^>]*>//g'
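
Note that this pattern removes every tag, not just the four from the question. A minimal sketch of the difference, using a made-up <b> tag added to the sample line for illustration, and comparing against a pattern that targets only the ic/lic tags:

```shell
# Sample line from the question with a hypothetical <b>...</b> pair added.
input='<lic><ic>This is a <b>string</b></ic>, welcome to my blog.</lic>'

# Generic pattern: strips anything that looks like <...>.
generic=$(printf '%s\n' "$input" | sed 's/<[^>]*>//g')

# Targeted pattern: strips only <ic>, </ic>, <lic> and </lic>.
targeted=$(printf '%s\n' "$input" | sed 's%</\{0,1\}l\{0,1\}ic>%%g')

printf '%s\n' "$generic" "$targeted"
# prints:
#   This is a string, welcome to my blog.
#   This is a <b>string</b>, welcome to my blog.
```

Which one is appropriate depends on whether other markup in the file should survive.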
Michael J. Barber