I have an annotation table from eggnog mapper and need to make this KO table:
Gene evalue KO
Gene1 0.00003 KO0000
Gene2 0.00005 KO0001
Gene2 0.00005 KO0003
Gene3 0.000005 KO0002
This is the table I have (test.txt):
Gene evalue KO
Gene1 0.00003 KO0000
Gene2 0.00005 KO0001,KO0003
Gene3 0.000005 KO0002
I have ~17,000 rows and the output is in xlsx format. The first issue I am having is that when I save the output file as a txt and view in linux (head test.txt
) some of the columns look like this:
Gene,evalue,KO
Gene1 0.00003 KO0000
Gene2 0.0005 "KO0001,KO0003"
Gene3 0.00005 KO0002
How can I remove the quotes around these values? And how can I make the annotation table above?
I have tried this script from this thread (How can I split comma separated values into multiple rows?)
awk '
BEGIN { OFS="\t" }
{ $1=$1;t=$0; }
{ while(index($0,",")) {
gsub(/,[[:alnum:],]*/,""); print;
$0=t; gsub(OFS "[[:alnum:]]*,",OFS); t=$0;
}
print t
}' file
But it seems to get stuck in an infinite loop because of the quotes around the values in the third column.
Thanks