I am having trouble matching the strings in one table against the rows of another.
Say I have the following table:
broken
vector
unidentified
synthetic
artificial
And I have a second dataset that looks like this:
org1 Fish
org2 Amphibian
org3 vector
org4 synthetic species
org5 Mammal
I want to remove every row from the second table that contains any of the strings from the first table, so that the output looks like this:
org1 Fish
org2 Amphibian
org5 Mammal
I was thinking of using grep -v in bash, but I am not quite sure how to make it loop through all the strings in table 1.
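From the grep man page it looks like -f reads one pattern per line from a file, which might remove the need for a loop altogether. Below is a rough sketch under that assumption. The file names table1.txt and table2.txt and the temporary directory are placeholders made up for illustration, and the second table is assumed to be tab-separated. Note that this matches anywhere on the line, not only in the second column, and that an empty line in the pattern file would match (and so remove) every row.

```shell
# Hypothetical working directory and file names, used only for this sketch.
workdir=$(mktemp -d)

# Table 1: one string per line.
printf '%s\n' broken vector unidentified synthetic artificial > "$workdir/table1.txt"

# Table 2: assumed to be tab-separated (id, group).
printf '%s\t%s\n' org1 Fish org2 Amphibian org3 vector \
    org4 'synthetic species' org5 Mammal > "$workdir/table2.txt"

# -f FILE : read the patterns from FILE, one per line
# -F      : treat each pattern as a fixed string, not a regular expression
# -v      : print only the lines that match none of the patterns
grep -v -F -f "$workdir/table1.txt" "$workdir/table2.txt" > "$workdir/filtered.txt"

cat "$workdir/filtered.txt"
```

If only whole-word matches should count, adding -w to the grep call may be worth trying.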
I am also trying to work it out in Perl, but for some reason it prints every row instead of only the rows that do not match any string in table 1. Any idea why?
My script looks like this:
#!/bin/perl -w
($br_str, $dataset) = @ARGV;
open($fh, "<", $br_str) || die "Could not open file $br_str: $!\n";
while (<$fh>) {
    $str = $_;
    push @strings, $str;
    next;
}
open($fh2, "<", $dataset) || die "Could not open file $dataset: $!\n";
while (<$fh2>) {
    chomp;
    @tmp = split /\t/, $_;
    $groups = $tmp[1];
    foreach $str (@strings) {
        if ($str ne $groups) {
            @working_lines = @tmp;
            next;
        }
    }
    print "@working_lines\n";
}
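For comparison, the behaviour I am after (checking only the second column, and dropping a row when that column contains any string from table 1) could perhaps be sketched in shell with awk as below. This is only an illustration, not a fix for the Perl script. The file names and temporary directory are hypothetical, the second table is assumed to be tab-separated, and the match is a plain substring test so that "synthetic species" is caught by "synthetic".

```shell
# Hypothetical working directory and file names, used only for this sketch.
workdir=$(mktemp -d)
printf '%s\n' broken vector unidentified synthetic artificial > "$workdir/table1.txt"
printf '%s\t%s\n' org1 Fish org2 Amphibian org3 vector \
    org4 'synthetic species' org5 Mammal > "$workdir/table2.txt"

awk -F'\t' '
    # First file: remember every non-empty string as an array key.
    NR == FNR { if ($0 != "") pat[$0]; next }
    # Second file: skip the row if column 2 contains any remembered string.
    {
        for (p in pat)
            if (index($2, p)) next
        print
    }
' "$workdir/table1.txt" "$workdir/table2.txt" > "$workdir/filtered_col2.txt"

cat "$workdir/filtered_col2.txt"
```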