I'm asking this question because even though there are many similar questions on this website (like this, this, and this), none of them are exactly my situation. Actually, this link is asking the same question as mine, but the answer there is unclear to me and raises the question that I am about to ask.

I have a dataset from which I am constructing a stacked barplot, and I want to know how I can arrange the bars so that "similar" individuals cluster together. I work in bioinformatics, and the dataset is a d-by-n matrix. In this toy dataset, there are d = 10 ancestral populations and n = 5 individuals:

 > a
            V1          V2          V3           V4           V5
1  0.534410243 0.009358740 0.011295181 0.2141751740 0.0030129254
2  0.026653603 0.372426720 0.447847534 0.0179177507 0.4072904477
3  0.193317915 0.003605024 0.003186611 0.4832114736 0.0007095471
4  0.111881585 0.000000000 0.000000000 0.2296213741 0.0119233461
5  0.089696570 0.591163629 0.509774416 0.0032542030 0.5535847030
6  0.007543558 0.000000000 0.000000000 0.0364907757 0.0013148362
7  0.004862942 0.000000000 0.002123909 0.0146682272 0.0004053690
8  0.009276195 0.011710457 0.014367894 0.0000000000 0.0000000000
9  0.006903171 0.004314528 0.011404455 0.0000000000 0.0126889937
10 0.015454219 0.007420903 0.000000000 0.0006610215 0.0090698319

All columns add up to 1. I create a stacked barplot like so:

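library(dplyr)
library(tidyr)
library(ggplot2)

# keep the population labels (row names) as a column before reshaping to long format
pop <- rownames(a)
a <- a %>% mutate(pop = rownames(a))
a_long <- gather(a, key, value, -pop)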

# trying to create a similarity index
a_long <- a_long %>% group_by(key) %>% 
  mutate(mean = mean(value)) %>%
  arrange(desc(mean))

# looking at some of the expanded dataset
> a_long[1:20,]
# A tibble: 20 x 4
# Groups:   key [2]
   pop   key      value  mean
   <chr> <chr>    <dbl> <dbl>
 1 1     V2    0.00936    0.1
 2 2     V2    0.372      0.1
 3 3     V2    0.00361    0.1
 4 4     V2    0          0.1
 5 5     V2    0.591      0.1
 6 6     V2    0          0.1
 7 7     V2    0          0.1
 8 8     V2    0.0117     0.1
 9 9     V2    0.00431    0.1
10 10    V2    0.00742    0.1
11 1     V4    0.214      0.1
12 2     V4    0.0179     0.1
13 3     V4    0.483      0.1
14 4     V4    0.230      0.1
15 5     V4    0.00325    0.1
16 6     V4    0.0365     0.1
17 7     V4    0.0147     0.1
18 8     V4    0          0.1
19 9     V4    0          0.1
20 10    V4    0.000661   0.1

# colors
v_colors <- c("#440154FF", "#443B84FF", "#34618DFF", "#404588FF", "#1FA088FF", "#40BC72FF",
              "#67CC5CFF", "#A9DB33FF", "#DDE318FF", "#FDE725FF")

plot <- ggplot(a_long, aes(x = key, y = value, fill = pop)) 
plot + geom_bar(position="stack", stat="identity") +  scale_fill_manual(values = v_colors)

The output looks like this: [image: stacked barplot]

How can I make the output look neater, e.g. with the individuals that have a higher proportion of population 5 ancestry next to each other on the x-axis? So far, I have tried computing the mean of value for each individual, but it didn't work: every column sums to 1 over 10 populations, so the mean is 0.1 for every individual and cannot tell them apart. How can I create a similarity index that tells me how similar individual 1 is to individual 2, and then how do I order the individuals on the x-axis so that they look well-clustered (e.g. like the barplots in this figure)?
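One direction that might work, shown only as a minimal sketch and not a verified answer: treat each individual's column as a vector, compute pairwise distances between individuals, run hierarchical clustering, and use the dendrogram leaf order as the factor levels of the x-axis variable. The Euclidean distance and average linkage below are assumptions chosen for illustration, and `ind_order`, `pop5_order`, and `p` are hypothetical names. The sketch starts from the original numeric `a`, before the `pop` column is added. A simpler alternative, sorting by the population 5 proportion alone, is included for comparison.

```r
# Toy data from the question: rows = ancestral populations, columns = individuals
a <- data.frame(
  V1 = c(0.534410243, 0.026653603, 0.193317915, 0.111881585, 0.089696570,
         0.007543558, 0.004862942, 0.009276195, 0.006903171, 0.015454219),
  V2 = c(0.009358740, 0.372426720, 0.003605024, 0.000000000, 0.591163629,
         0.000000000, 0.000000000, 0.011710457, 0.004314528, 0.007420903),
  V3 = c(0.011295181, 0.447847534, 0.003186611, 0.000000000, 0.509774416,
         0.000000000, 0.002123909, 0.014367894, 0.011404455, 0.000000000),
  V4 = c(0.2141751740, 0.0179177507, 0.4832114736, 0.2296213741, 0.0032542030,
         0.0364907757, 0.0146682272, 0.0000000000, 0.0000000000, 0.0006610215),
  V5 = c(0.0030129254, 0.4072904477, 0.0007095471, 0.0119233461, 0.5535847030,
         0.0013148362, 0.0004053690, 0.0000000000, 0.0126889937, 0.0090698319)
)

# dist() compares rows, so transpose to get one row per individual.
# Euclidean distance is an assumption; other metrics may suit ancestry data better.
d <- dist(t(as.matrix(a)), method = "euclidean")

# Hierarchical clustering; average linkage is also just an illustrative choice.
hc <- hclust(d, method = "average")

# Individuals in dendrogram leaf order: members of a cluster end up adjacent.
ind_order <- hc$labels[hc$order]

# Simpler alternative: sort individuals by their population 5 proportion.
pop5_order <- names(sort(unlist(a[5, ]), decreasing = TRUE))

# Long format in base R; the factor levels of `key` control the x-axis order.
a_long <- data.frame(
  pop   = factor(rep(rownames(a), times = ncol(a)), levels = rownames(a)),
  key   = factor(rep(colnames(a), each = nrow(a)), levels = ind_order),
  value = unlist(a, use.names = FALSE)
)

# Plot only if ggplot2 is installed; same geom as in the question.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(a_long, aes(x = key, y = value, fill = pop)) +
    geom_bar(position = "stack", stat = "identity")
  print(p)
}
```

To use the population 5 ordering instead, `levels = pop5_order` can replace `levels = ind_order` when building `key`. The clustering order considers all ten populations at once, whereas the sort looks at a single row, so the two can disagree on larger datasets.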

Thanks!

One last thing: if you want to re-create the dataset a, here is the code:

v1 = c(0.534410243, 0.026653603, 0.193317915, 0.111881585, 0.089696570, 0.007543558, 0.004862942, 0.009276195, 0.006903171, 0.015454219)
v2 = c(0.009358740, 0.372426720, 0.003605024, 0.000000000, 0.591163629, 0.000000000, 0.000000000, 0.011710457, 0.004314528, 0.007420903)
v3 = c(0.011295181, 0.447847534, 0.003186611, 0.000000000, 0.509774416, 0.000000000, 0.002123909, 0.014367894, 0.011404455, 0.000000000) 
v4 = c(0.2141751740, 0.0179177507, 0.4832114736, 0.2296213741, 0.0032542030, 0.0364907757, 0.0146682272, 0.0000000000, 0.0000000000, 0.0006610215)
v5 = c(0.0030129254, 0.4072904477, 0.0007095471, 0.0119233461, 0.5535847030, 0.0013148362, 0.0004053690, 0.0000000000, 0.0126889937, 0.0090698319)
a = data.frame(V1 = v1, V2 = v2, V3 = v3, V4 = v4, V5 = v5)
mdy