R: Transform dataframe to show the corresponding values of a column for each unique row of another column

Question

I have a large dataframe with 2 columns: The one has a discrete number of values that appear repetitively, while the other only has unique values. Essentially multiple values in column 2 will correspond to one value in column 1.

As the data has currently been acquired, it lists each unique variable in column 2 as a row, which means there are repeated values in column 1.

I want to transform (essentially flip) the data so that I can see which column 2 values fall under each unique value in column 1.

For example, the df is:

Contig	Gene
C20	G1
C10	G2
C40	G3
C20	G4
C40	G5
C30	G6

And I want:

Contig	Gene
C10	G2
C20	G1, G4
C30	G6
C40	G3, G5

If I only get the number of unique values that will also be okay:

Contig	Gene(s)
C10	1
C20	2
C30	1
C40	2

I hope it makes sense. I've been struggling to find the right keywords to explain this issue and really don't know where to begin. Although I get the feeling I should maybe turn the data into a list.

iago · Accepted Answer · 2021-07-28T09:17:32.120

0

To get a dataframe like your first desired table, you may use

library(tidyr)
chop(df, Gene)

while to get the last one you may use:

library(dplyr)
df %>% group_by(Config) %>% summarise(n = n())

edited Jul 28 '21 at 09:17

answered Jul 28 '21 at 09:11

iago

2,990
4
21
27

score 0 · Answer 2 · answered Jul 28 '21 at 09:15

You can also use dplyr. First group by Contig and then summarise.

library(dplyr)

# Small test data
test_data <- data.frame(Contig = c("C10", "C10", "C20", "C20","C20"), 
                        Gene = c("G1", "G3", "G4", "G5", "G6"))


clean_data <- test_data %>% 
  group_by(Contig) %>% 
  summarise(number_of_genes = length(unique(Gene)), 
            specific_genes = paste0(unique(Gene), collapse = ", "))

score 0 · Answer 3 · answered Jul 28 '21 at 09:17

0

library(plyr)
M = data.frame(
  Contig = c("C20", "C10", "C40", "C20", "C40", "C30"),
  Gene = paste0("G",1:6)
)
f = function(m) {
  data.frame(
    Contig = unique(m$Contig),
    Gene = paste(m$Gene, collapse=",")
  )
}
ddply(M, .(Contig), f)

answered Jul 28 '21 at 09:17

Niels Holst

586
4
9

R: Transform dataframe to show the corresponding values of a column for each unique row of another column

3 Answers3