I have a dataset of genetic data for a collection of bacterial strains, and I want to plot a heat map showing the prevalence of a number of alleles across my strains, grouped by a grouping variable.
My raw data is a large data frame with one row per strain, one column holding the grouping variable, and one column per genetic determinant. I am struggling to create a heat map with ggplot because, as far as I understand, it needs the values to be plotted as a matrix, and I don't know how to transform my raw data into that matrix. My original data frame looks like this (an excerpt, for simplicity):
   Sample group A B C D E F
1       1    10 0 1 0 0 0 0
2       2    10 0 1 0 0 0 0
3       3    10 0 1 0 0 0 0
4       4    10 0 1 0 0 0 0
5       5    38 0 1 0 0 0 0
6       6    38 0 1 0 0 0 0
7       7    38 1 1 0 0 0 0
8       8    69 0 1 0 0 0 0
9       9    69 0 1 0 0 0 0
10     10    69 0 1 0 0 0 0
11     11    69 0 1 0 0 0 0
12     12    69 0 1 0 0 0 0
13     13    69 0 1 0 0 0 0
14     14    73 0 0 0 0 0 0
15     15    73 0 0 0 0 0 0
16     16    73 0 0 0 0 0 0
17     17    73 0 0 0 0 0 0
18     18    73 0 0 0 0 0 0
19     19    73 0 0 0 0 0 0
20     20    73 0 0 0 0 0 0
21     21    73 0 0 0 0 0 0
22     22    73 0 0 0 0 0 0
23     23    73 0 0 0 0 0 0
24     24    73 0 0 0 0 0 0
25     25    73 1 0 0 0 0 0
26     26    73 0 0 0 0 0 0
27     27    95 0 0 0 0 0 0
28     28    95 0 0 0 0 0 0
29     29    95 0 0 0 0 0 0
30     30    95 0 0 0 0 0 0
31     31    95 0 0 0 0 0 0
32     32   127 0 0 0 0 0 0
33     33   127 0 0 0 0 0 0
34     34   127 0 0 0 0 0 0
35     35   127 0 0 0 0 0 0
A to F are the allele variables; '0' means the allele is absent and '1' means it is present. What I now want to do is count the occurrences of '1' within each group and express them as a fraction of all observations in that group (i.e. '1'/('1'+'0') for groups 10, 38, 69, 73, 95 and 127). The result then needs to be in a matrix like this:
  group     A B C D E F
1    10 0.000 1 0 0 0 0
2    38 0.333 1 0 0 0 0
3    69 0.000 1 0 0 0 0
4    73 0.077 0 0 0 0 0
5    95 0.000 0 0 0 0 0
6   127 0.000 0 0 0 0 0
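To make the calculation concrete, here is a minimal sketch in base R of the per-group proportion described above. The names `df`, `allele_cols`, `prev` and `prev_mat` are only illustrative assumptions, and the data frame is rebuilt from the excerpt so that the snippet runs on its own. It relies on the fact that, for 0/1 data, the mean of a column equals the proportion of '1's.

```r
# Rebuild the excerpt; `df` is a hypothetical name for the raw data frame
group_sizes <- c("10" = 4, "38" = 3, "69" = 6, "73" = 13, "95" = 5, "127" = 4)
df <- data.frame(
  Sample = 1:35,
  group  = rep(as.integer(names(group_sizes)), times = group_sizes),
  A = 0, B = 0, C = 0, D = 0, E = 0, F = 0
)
df$A[c(7, 25)] <- 1   # the two strains carrying allele A
df$B[1:13]     <- 1   # every strain in groups 10, 38 and 69 carries allele B

# Treat every column except the identifiers as an allele column (assumption)
allele_cols <- setdiff(names(df), c("Sample", "group"))

# Mean of a 0/1 column within a group = '1' / ('1' + '0') for that group
prev <- aggregate(df[allele_cols], by = list(group = df$group), FUN = mean)

# Numeric matrix with the groups as row names
prev_mat <- as.matrix(prev[, allele_cols])
rownames(prev_mat) <- prev$group
round(prev_mat, 3)
```

Because `allele_cols` is derived from the column names, the same lines should work unchanged however many allele columns the full data set has, provided they are all coded 0/1.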
My dataset is very large, so calculating and typing the matrix by hand, as in this example, is not feasible. Is there a smart way to do this in R and then plot the result as a heat map?
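For the plotting step, this is roughly the shape of result I have in mind, sketched under a few assumptions: the proportion matrix is typed in from the small example purely for illustration, the names `prev_mat`, `long` and `p` are hypothetical, and the colour scale is an arbitrary choice. As far as I can tell, ggplot2 works from a long data frame with one row per cell rather than from the matrix itself, so the sketch reshapes the matrix first and only draws the plot if ggplot2 is installed.

```r
# Proportion matrix from the small example (rounded values as shown there)
prev_mat <- matrix(
  c(0.000, 1, 0, 0, 0, 0,
    0.333, 1, 0, 0, 0, 0,
    0.000, 1, 0, 0, 0, 0,
    0.077, 0, 0, 0, 0, 0,
    0.000, 0, 0, 0, 0, 0,
    0.000, 0, 0, 0, 0, 0),
  nrow = 6, byrow = TRUE,
  dimnames = list(group  = c("10", "38", "69", "73", "95", "127"),
                  allele = c("A", "B", "C", "D", "E", "F"))
)

# Reshape to long format: one row per (group, allele) cell
long <- as.data.frame(as.table(prev_mat), responseName = "prevalence")

# Draw the heat map only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = allele, y = group, fill = prevalence)) +
    geom_tile(colour = "white") +
    scale_fill_gradient(low = "white", high = "darkred", limits = c(0, 1))
  print(p)
}
```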
Any help is much appreciated. Thank you
kruemelprinz