Hopefully this can be done with python! I used two clustering programs on the same data and now have a cluster file from both. I reformatted the files so that they look like this:
Cluster 0:
Brucellaceae(10)
Brucella(10)
abortus(1)
canis(1)
ceti(1)
inopinata(1)
melitensis(1)
microti(1)
neotomae(1)
ovis(1)
pinnipedialis(1)
suis(1)
Cluster 1:
Streptomycetaceae(28)
Streptomyces(28)
achromogenes(1)
albaduncus(1)
anthocyanicus(1)
etc.
These files contain bacterial species info. So I have the cluster number (Cluster 0), then right below it 'family' (Brucellaceae) and the number of bacteria in that family (10). Under that is the genera found in that family (name followed by number, Brucella(10)) and finally the species in each genera (abortus(1), etc.).
My question: I have 2 files formatted in this way and want to write a program that will look for differences between the two. The only problem is that the two programs cluster in different ways, so two cluster may be the same, even if the actual "Cluster Number" is different (so the contents of Cluster 1 in one file might match Cluster 43 in the other file, the only different being the actual cluster number). So I need something to ignore the cluster number and focus on the cluster contents.
Is there any way I could compare these 2 files to examine the differences? Is it even possible? Any ideas would be greatly appreciated!