
I have a BLAST output file in tab-delimited format, like this:

p=BAC58264.1    CP014046.1  100.00  435 0   0   1   435 804117  8045    862
p=BAC58264.1    CP014046.1  100.00  160 0   0   3   372 444601  4443 32
p=BAC58264.1    BA000031.2  100.00  435 0   0   1   435 805024  371  862 

I want to sort it like this, based on the 3rd column:

p=BAC58264.1    CP014046.1  100.00  435 0   0   1   435 804117  8045    862
p=BAC58264.1    BA000031.2  100.00  435 0   0   1   435 805024  371  862

I usually do this with this awk code: "$4>=435">BLASTSORT

How can I incorporate this awk code into a Perl program?

John1024
  • If you need a Perl program for other reasons, this can be done with Perl itself. Alternatively, [system](http://stackoverflow.com/questions/3854651/how-can-i-store-the-result-of-a-system-command-in-a-perl-variable) is one way to call an external program from Perl. – Sundeep Sep 17 '16 at 06:48

2 Answers


You would be far better off doing this in Perl, rather than starting a whole new process just for some simple text processing.

I would need to see the rest of your Perl code to be sure exactly what the code should look like, but if you're reading a file line by line into a variable called, say, $line, then you could do this:

my @fields = split ' ', $line;

print $line if $fields[3] >= 435;

If you show your existing Perl code, I will refine this.
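
As a minimal sketch of how that snippet might sit inside a complete read loop: the sub name `filter_hits`, its threshold argument, and the in-memory filehandle are assumptions made for illustration, not part of the asker's program. The sample lines are the ones from the question, written here with single spaces between fields.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines whose 4th field is >= $min.
sub filter_hits {
    my ($fh, $min) = @_;
    my @kept;
    while (my $line = <$fh>) {
        my @fields = split ' ', $line;    # split on any run of whitespace
        next unless @fields >= 4;         # skip blank or short lines
        push @kept, $line if $fields[3] >= $min;
    }
    return @kept;
}

# Sample data from the question, read through an in-memory filehandle.
my $sample = <<'END';
p=BAC58264.1 CP014046.1 100.00 435 0 0 1 435 804117 8045 862
p=BAC58264.1 CP014046.1 100.00 160 0 0 3 372 444601 4443 32
p=BAC58264.1 BA000031.2 100.00 435 0 0 1 435 805024 371 862
END

open my $fh, '<', \$sample or die "Cannot open in-memory data: $!";
print filter_hits($fh, 435);
close $fh;
```

With a real BLAST file, the in-memory `open` would presumably become an ordinary `open` on the file's path; the loop itself would not need to change.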

Borodin
  • According to the question, the data is tab-delimited, so I guess you want `split "\t", $line;`. Also, even though the question talks about the third column, the example script examines the fourth, and the sample data seems to confirm this observation. – tripleee Sep 17 '16 at 10:55
  • @tripleee: While "tab limited format" should be easy to code for, it is often used by people who mean "there are gaps between the fields". It is also too easy to put a tab-separated file through an editor and lose the tab characters altogether. In my experience it is far better to assume that data is "whitespace-separated" and so requires a bare `split`. I don't think I have ever encountered *real* tab-separated data that also had spaces within the data, but I am sure it happens. – Borodin Sep 17 '16 at 15:28
  • @tripleee: The best of all worlds would be to use Unicode `U+001F` or "INFORMATION SEPARATOR ONE", which is ASCII "Unit Separator" (US). But there is no precedent, and so no software or keyboard support. There are only a few of those first 32 control characters that are still of any real use; it's a real shame to waste a quarter of the 128 code points but, quite rightly, no one likes non-printable characters any more. – Borodin Sep 17 '16 at 15:29
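use strict;
use warnings;

while (<DATA>) {
    chomp;                              # strip the newline so the last field is clean
    my @data = split /\t/, $_;          # fields are tab-separated
    print "$_\n" if $data[3] >= 435;    # keep rows whose 4th column is at least 435
}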

__DATA__
p=BAC58264.1    CP014046.1  100.00  435 0   0   1   435 804117  8045    862
p=BAC58264.1    CP014046.1  100.00  160 0   0   3   372 444601  4443    32
p=BAC58264.1    BA000031.2  100.00  435 0   0   1   435 805024  371 862
Zheng Jin
  • It's working perfectly. Now I want to save this output to a file (for example, blastsorted). – user3295716 Sep 17 '16 at 10:58
  • @user3295716: It's sounding like you're reading and filtering your file, and then opening the result for further processing. As I wrote in my own answer, we would need to see the rest of your Perl code to be sure, but it's likely that there's no need to write the intermediate file. Because of that I feel that we've helped you to do the wrong thing. Please show your original awk code in context so that we can advise you better. – Borodin Sep 17 '16 at 15:34
  • You're right, I want to open the result for further analysis. I want to extract only columns 2, 7, and 8 from the output, for example: CP014046.1 1 435 BA000031.2 1 435. @Borodin – user3295716 Sep 19 '16 at 04:57
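
The two follow-up requests in these comments (saving the filtered rows, then keeping only columns 2, 7, and 8) could probably be handled in one pass, without an intermediate file. What follows is a rough sketch under assumptions: the input is whitespace-separated, `extract_columns` is a hypothetical name, and the sample data is read from an in-memory filehandle rather than a real BLAST file.

```perl
use strict;
use warnings;

# Hypothetical helper: keep rows whose 4th field is >= $min and
# return only columns 2, 7, and 8 of each, joined by tabs.
sub extract_columns {
    my ($fh, $min) = @_;
    my @rows;
    while (my $line = <$fh>) {
        my @f = split ' ', $line;                 # split on any run of whitespace
        next unless @f >= 8 && $f[3] >= $min;     # skip short or non-matching rows
        push @rows, join("\t", @f[1, 6, 7]);      # columns 2, 7, 8 (zero-based 1, 6, 7)
    }
    return @rows;
}

# Sample data from the question, read through an in-memory filehandle.
my $sample = <<'END';
p=BAC58264.1 CP014046.1 100.00 435 0 0 1 435 804117 8045 862
p=BAC58264.1 CP014046.1 100.00 160 0 0 3 372 444601 4443 32
p=BAC58264.1 BA000031.2 100.00 435 0 0 1 435 805024 371 862
END

open my $in, '<', \$sample or die "Cannot open in-memory data: $!";
print "$_\n" for extract_columns($in, 435);
close $in;
```

To save the result instead of printing it, the rows could be printed to a filehandle opened for writing; if they are only needed for further analysis in the same program, the returned list can be used directly.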