4

I have data that looks like this:

AB208804_1 446 576 AB208804_1orf 0
AB208804_20 446 576 AB208804_20orf 0

I want to convert them into this:

AB208804 446 576 AB208804orf 0
AB208804 446 576 AB208804orf 0

just by removing the _<digits> part in columns 1 and 4.

Why doesn't this line work?

sed 's/_\d+//g'

What's the correct way to do it (one-liner)?

Willi Mentzel
neversaint
  • 1
    I have no idea why this doesn't work, but if you replace `\d` with `[0-9]` it works fine. – jtbandes Aug 06 '10 at 05:08
  • 5
    In GNU `sed`, `\d` introduces a decimal character code of one to three digits in the range 0-255. For example, to remove a tab you could do: `sed 's/\d9//'` (or `09` or `009`) or replace some unprintable characters with spaces: `sed 's/[\d1-\d31]/ /g'` – Dennis Williamson Aug 06 '10 at 06:07

3 Answers

7

You need the -r switch (extended regular expressions, so + works unescaped) and a character class such as [0-9], since sed does not treat \d as a digit.

$ echo "AB208804_1 446 576 AB208804_1orf 0" | sed -r 's/_[0-9]+//g'
AB208804 446 576 AB208804orf 0

Or, since you asked, in Perl:

$ echo "AB208804_1 446 576 AB208804_1orf 0" | perl -ne 's/_\d+//g; print $_'
AB208804 446 576 AB208804orf 0
zen
2

Try:

sed 's/_[0-9]\+//g' 
codaddict
1
sed 's/_[0-9][0-9]*//g' file
ghostdog74
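
For comparison, here is a minimal sketch that runs the most portable of the forms above against both sample lines from the question. The function name strip_suffix is hypothetical and used only for illustration. The pattern [0-9][0-9]* means "one or more digits" in plain basic regular expression syntax, so it should not depend on -r or on the \+ extension.

```shell
# strip_suffix is a hypothetical helper name, not part of any answer above.
# It reads lines on stdin and removes every "_<digits>" run.
strip_suffix() {
  sed 's/_[0-9][0-9]*//g'
}

# Feed it the two sample lines from the question.
printf '%s\n' \
  'AB208804_1 446 576 AB208804_1orf 0' \
  'AB208804_20 446 576 AB208804_20orf 0' | strip_suffix
```

Note that the g flag removes every _<digits> run on the line, not only those in columns 1 and 4. That is fine for this data because columns 2, 3 and 5 contain no underscore.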