I would like to collapse rows that share the same value in the first column, putting all second-column values for that ID on a single line, separated by a comma and a space. If a second-column value is repeated within a group, it should appear only once: for example, if 'non-virulent' occurs twice for the same ID, the output should show it just once.
I'm quite new to this, so please explain how to run any solution. I hope someone can help!
Input (tab-delimited):
HS372_01446 non-virulent
HS372_01446 non-virulent
HS372_01446 lung
HS372_00498 non-virulent
HS372_00498 non-virulent
HS372_00498 non-virulent
HS372_00498 lung
HS372_00498 lung
HS372_00954 jointlungCNS
HS372_00954 non-virulent
HS372_00954 non-virulent
HS372_00954 moderadamentevirulenta(nose)
HS372_00954 lung
Desired output (tab-delimited):
HS372_01446 non-virulent, lung
HS372_00498 non-virulent, lung
HS372_00954 jointlungCNS, non-virulent, moderadamentevirulenta(nose), lung
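A minimal Python 3 sketch of one way to do this, assuming each input line has exactly one tab between the ID and the value (as the question states) and Python 3.7 or newer, where plain dicts keep insertion order. IDs and values are printed in the order they first appear, and rows with the same ID do not need to be adjacent. The script name collapse.py and the function name collapse are hypothetical, chosen only for illustration.

```python
import sys


def collapse(lines):
    """Collapse tab-delimited rows that share the same first column.

    Second-column values are de-duplicated, kept in the order they first
    appear, and joined with ", ".
    """
    groups = {}  # plain dicts preserve insertion order in Python 3.7+
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split("\t", 1)  # assumes one tab separates the two columns
        # A dict with None values acts as an ordered set, so repeats are dropped.
        groups.setdefault(key, {})[value] = None
    return [f"{key}\t{', '.join(values)}" for key, values in groups.items()]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 collapse.py input.tsv > output.tsv
    with open(sys.argv[1]) as fh:
        for row in collapse(fh):
            print(row)
```

To run it, save the code as collapse.py, then from a terminal in the same folder run: python3 collapse.py input.tsv > output.tsv (replace input.tsv with the name of your file). The collapsed table is written to output.tsv, with a tab between the ID and the comma-separated values.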