
I would like to know how to use a Perl regex to extract all the numbers after each ID (KC000001, KC000002, KC152363), including the numbers that come after a tab.

The additional numbers (0.50 for the first ID, 0.60 for the second, and 0.70 0.80 for the third) always come after a tab, and each one is separated from the next by another tab.

Input file.

KC000001    0.30    0.40    0.50
KC000002    0.30    0.40    0.50    0.60
KC152363    0.30    0.40    0.50    0.60    0.70    0.80

I would like to get this output file.

0.30    0.40    0.50
0.30    0.40    0.50    0.60
0.30    0.40    0.50    0.60    0.70    0.80

I have prepared this regex.

if ($linea =~ /^(.[a-z0-9]\d+.\d)\s(.?)$/){
    print $line 
}

However, the numbers after the last tab are missing from the output (0.50 for the first ID, 0.60 for the second, and 0.70 0.80 for the third). This is what I get:

0.30    0.40
0.30    0.40    0.50
0.30    0.40    0.50    0.60

What is wrong with this regex? Is it possible to do this with a regex alone?
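A minimal sketch of a regex-only approach, assuming the ID and the numbers are separated by real tab characters. As written, the question's pattern has a few problems: [a-z0-9] is case-sensitive and cannot match the uppercase C of KC, (.?)$ allows at most one character before the end of the line, and the match is tested on $linea while $line is printed. The version below instead skips everything before the first tab and captures the rest of the line. The helper name numbers_after_id and the sample lines are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: return everything after the first tab,
# i.e. the numbers that follow the ID. Assumes tab-separated fields.
sub numbers_after_id {
    my ($line) = @_;
    # ^[^\t]+ skips the ID, \t consumes the first tab,
    # (.*) captures the rest of the line (all remaining numbers).
    # $ also matches just before a trailing newline, so $1 excludes it.
    return $line =~ /^[^\t]+\t(.*)$/ ? $1 : undef;
}

# Sample input from the question, with real tab characters.
my @lines = (
    "KC000001\t0.30\t0.40\t0.50",
    "KC000002\t0.30\t0.40\t0.50\t0.60",
    "KC152363\t0.30\t0.40\t0.50\t0.60\t0.70\t0.80",
);

for my $line (@lines) {
    my $numbers = numbers_after_id($line);
    print "$numbers\n" if defined $numbers;
}
```

Lines that contain no tab do not match, so the helper returns undef and the loop skips them.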

Gilles Quénot

3 Answers


With a Perl one-liner:

$ perl -F"\t" -lane 'print join "\t", @F[1..$#F]' file | tee output_file
0.30    0.40    0.50
0.30    0.40    0.50    0.60
0.30    0.40    0.50    0.60    0.70    0.80

@F[1..$#F] is an array slice: it keeps only the fields from the second column to the end of the line.
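For readers who prefer a script to a one-liner, here is a sketch of the same slice, assuming tab-separated input. chomp removes the trailing newline first, which is what -l does in a one-liner; without it the last field would keep its newline. The helper name drop_first_field is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line on tabs and keep fields 2..end.
sub drop_first_field {
    my ($line) = @_;
    chomp $line;                          # drop the trailing newline, as -l does
    my @fields = split /\t/, $line;       # like @F under -a with -F"\t"
    return join "\t", @fields[1 .. $#fields];   # slice: second field to the last
}

print drop_first_field("KC000001\t0.30\t0.40\t0.50\n"), "\n";
```

If a line holds only an ID, the range 1 .. 0 is empty and the helper returns an empty string.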

Gilles Quénot

Use this Perl one-liner:

perl -pe 's{^KC\w+\t}{}' infile > outfile

or change the file in-place:

perl -i.bak -pe 's{^KC\w+\t}{}' infile

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak. If you want to skip writing a backup file, just use -i and skip the extension.
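Roughly, -p wraps the -e code in a loop that reads each line into $_ and prints it afterwards. The sketch below spells that loop out for s{^KC\w+\t}{}, reading from an in-memory string instead of infile so it runs without a file. The helper name strip_ids and the sample text are hypothetical.

```perl
use strict;
use warnings;

# Sample input (hypothetical), held in a string so the sketch needs no file.
my $text = "KC000001\t0.30\t0.40\t0.50\n"
         . "KC000002\t0.30\t0.40\t0.50\t0.60\n";

# Approximates what `perl -pe 's{^KC\w+\t}{}'` does for each input line:
# read a line, run the -e code on it, then output the line.
sub strip_ids {
    my ($input) = @_;
    open my $in, '<', \$input or die "cannot open in-memory input: $!";
    my $out = '';
    while (my $line = <$in>) {
        $line =~ s{^KC\w+\t}{};   # the -e code: remove the leading ID and its tab
        $out .= $line;            # -p would print the line here
    }
    close $in;
    return $out;
}

print strip_ids($text);
```

Collecting the result in $out rather than printing inside the loop only makes the sketch easy to check; the one-liner prints each line as it goes.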

See also:

Timur Shtatland

This removes everything up to and including the first tab of each line:

$line =~ s/^[^\t]*\t//;
print $line;

As a one-liner:

perl -pe's/^[^\t]*\t//'

See Specifying file to process to Perl one-liner.
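The substitutions in these answers differ when an ID does not start with KC: s{^KC\w+\t}{} leaves such a line untouched, while s/^[^\t]*\t// removes whatever precedes the first tab. A small comparison sketch, using a made-up ID AB000001 and hypothetical helper names:

```perl
use strict;
use warnings;

# Hypothetical helper: s/^[^\t]*\t// removes whatever precedes the first tab.
sub strip_any_id {
    my ($line) = @_;
    $line =~ s/^[^\t]*\t//;
    return $line;
}

# Hypothetical helper: s{^KC\w+\t}{} only removes IDs that start with "KC".
sub strip_kc_id {
    my ($line) = @_;
    $line =~ s{^KC\w+\t}{};
    return $line;
}

# "AB000001" is a made-up ID that does not start with KC.
for my $line ("KC000001\t0.30\t0.40\n", "AB000001\t0.30\t0.40\n") {
    print "any: ", strip_any_id($line);
    print "kc:  ", strip_kc_id($line);
}
```

Lines without any tab are left unchanged by both substitutions, since neither pattern matches.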

ikegami