I am pretty new to Perl, and I am merging some datasets with the script below. The data is laid out as follows: the first row holds the sample names, the first column holds the gene names, and the second, third and later columns hold the counts. I have two big datasets to merge, and I run the Perl script from Terminal by giving the path to the script, followed by the two input files and the output file:
$ cd /path/to/file
$ perl /path/to/file dataset1.txt dataset2.txt merged.txt
The Perl script is as follows:
use strict;
my $file1=$ARGV[0];
my $file2=$ARGV[1];
my $out=$ARGV[2];
my %hash=();
open(RF,"$file1") or die $!;
while(my $line=<RF>){
    chomp($line);
    my @arr=split(/\t/,$line);
    my $gene=shift(@arr);
    $hash{$gene}=join("\t",@arr);
}
close(RF);
open(RF,"$file2") or die $!;
open(WF,">$out") or die $!;
while(my $line=<RF>){
    chomp($line);
    my @arr=split(/\t/,$line);
    my $gene=shift(@arr);
    if(exists $hash{$gene}){
        print WF $gene . "\t" . $hash{$gene} . "\t" . join("\t",@arr) . "\n";
    }
}
close(WF);
close(RF);
With the above code I am supposed to get a merged table with the duplicate rows deleted, and with the second text file's columns (Sample A to Sample Z) appended to the first text file's columns (Sample 1 to Sample 100). It should look like this, separated by tabs:
Gene Name Sample 1 Sample 2 ..... Sample A Sample B...
TP53 2.345 2.234 4.32 4.53
The problem is that the merged file does contain both datasets, but the counts from the second dataset end up on the next row instead of on the same row as the counts from the first dataset. The script recognises, sorts, and merges the counts, but onto the next row. Is there something wrong with my code or with my input?
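One possible cause worth ruling out (an assumption, not confirmed from the actual files) is Windows-style CRLF line endings in the first input file. On macOS or Linux, chomp removes only the "\n", so a trailing "\r" would stay attached to the last column stored in %hash and could push the second file's columns onto what looks like a new row. The sketch below uses made-up sample lines and a hypothetical helper named parse_line to show the symptom and one way to strip both characters:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): split one
# tab-separated line into the gene name and the remaining columns,
# removing any trailing CR and/or LF characters first.
sub parse_line {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;    # handles both "\n" and "\r\n" endings
    my ($gene, @counts) = split /\t/, $line;
    return ($gene, join("\t", @counts));
}

# Made-up lines with CRLF endings, only to illustrate the assumed problem.
my $line1 = "TP53\t2.345\t2.234\r\n";
my $line2 = "TP53\t4.32\t4.53\r\n";

# With chomp alone, the carriage return is still there.
my $chomped = $line1;
chomp $chomped;
print "chomp left a trailing CR\n" if $chomped =~ /\r\z/;

# With the helper, both files' columns join onto a single row.
my ($gene, $first)  = parse_line($line1);
my (undef, $second) = parse_line($line2);
print join("\t", $gene, $first, $second), "\n";
```

If this assumption holds, replacing each chomp($line) in the script with the same substitution should keep the merged columns on one row. If the files turn out to have plain "\n" endings, the cause lies elsewhere.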
Thank you for all of your help!!