
Aggregate rows according to a unique identifier?

I have a data frame with an identifier column (syllable) and a duration column. To continue my analysis, I need to aggregate the data frame by syllable.

This,

syllable   duration
ba         0.20414850
a          0.06804950
na         0.11525535
a          0.09877130
na         0.36774874
ba         0.18228837
ba         0.22232325

should look like this:

syllable   duration_1    duration_2  duration_3
ba         0.20414850    0.18228837  0.22232325
a          0.06804950    0.09877130
na         0.11525535    0.36774874
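To make the example reproducible, here is a minimal sketch that rebuilds the input above as a plain data frame named df (the name the code in this question and the answer assumes), with values copied from the table. It also counts how many durations each syllable has, since the largest count (3, for ba) is how many duration columns the wide result needs.

```r
# Rebuild the example input from the question as a base R data frame
df <- data.frame(
  syllable = c("ba", "a", "na", "a", "na", "ba", "ba"),
  duration = c(0.20414850, 0.06804950, 0.11525535, 0.09877130,
               0.36774874, 0.18228837, 0.22232325)
)

# Number of durations per syllable; the maximum sets how many
# duration columns the wide result needs
counts <- table(df$syllable)
n_cols <- max(counts)
```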

I tried the group_by function from dplyr:

library(dplyr)
df %>%
  group_by(syllable) %>%
  summarise(duration = paste(duration, collapse = ","))

However, this yields:

syllable   duration    
ba         c(0.20414850,0.18228837,0.22232325)
a          c(0.06804950,0.09877130)
na         c(0.11525535,0.36774874)

Thank you

Fino
  • In this case, you need to give R a "time variable" so you can reshape your data to "wide" format. To achieve this, you have to tell R the number of each observation within each group, for example: "this is the first observation of group _ba_, this is the second one", and so on. Try this (where SO is your data frame): `library(data.table); SO <- as.data.table(SO); SO[, Time_Var := seq_len(.N), by = "syllable"]; SO <- reshape(data = SO, direction = "wide", idvar = "syllable", timevar = "Time_Var")` – Arturo Sbr Feb 15 '19 at 15:53
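The comment's idea (add a per-group "time variable", then call reshape()) can also be sketched in base R without data.table. This is a swapped-in variant, not the commenter's exact code: it replaces data.table's seq_len(.N) by syllable with base R's ave() to number the observations within each syllable, then calls stats::reshape() the same way. The helper column name time_var is illustrative.

```r
# Example input from the question
df <- data.frame(
  syllable = c("ba", "a", "na", "a", "na", "ba", "ba"),
  duration = c(0.20414850, 0.06804950, 0.11525535, 0.09877130,
               0.36774874, 0.18228837, 0.22232325)
)

# Per-syllable counter: 1 for the first "ba", 2 for the second, and so on
# (base R stand-in for data.table's seq_len(.N) by syllable)
df$time_var <- ave(seq_along(df$syllable), df$syllable, FUN = seq_along)

# Spread to wide format: one row per syllable, one duration.<k> column per k;
# syllables with fewer observations get NA in the extra columns
wide <- reshape(df, direction = "wide",
                idvar = "syllable", timevar = "time_var")
print(wide)
```

Note that reshape() names the new columns duration.1, duration.2, duration.3 (value name, ".", time) and keeps the syllables in the order they first appear (ba, a, na).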

1 Answer


What you're looking for is to number the durations within each syllable and then spread those numbered labels into columns:

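library(dplyr)
library(tidyr)  # spread() comes from tidyr, not dplyr

df %>%
  group_by(syllable) %>%
  # label each duration by its position within the syllable: duration_1, duration_2, ...
  mutate(dur = paste0("duration_", row_number())) %>%
  ungroup() %>%
  # turn the labels into columns; missing positions become NA
  spread(dur, duration) %>%
  as.data.frame()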

Output:

  syllable duration_1 duration_2 duration_3
1        a  0.0680495  0.0987713         NA
2       ba  0.2041485  0.1822884  0.2223233
3       na  0.1152554  0.3677487         NA

I've only added %>% as.data.frame() so that the result prints with more decimal places (a tibble shows only about three significant digits by default); otherwise it is not needed.
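In current tidyr, spread() is superseded and the documentation recommends pivot_wider() for new code. A minimal sketch of the same answer using pivot_wider(), assuming tidyr 1.0.0 or later is installed; the group_by() and row_number() step is unchanged, and ungroup() is added so the pivot works on plain, ungrouped data.

```r
library(dplyr)
library(tidyr)  # pivot_wider() needs tidyr >= 1.0.0

# Example input from the question
df <- data.frame(
  syllable = c("ba", "a", "na", "a", "na", "ba", "ba"),
  duration = c(0.20414850, 0.06804950, 0.11525535, 0.09877130,
               0.36774874, 0.18228837, 0.22232325)
)

wide <- df %>%
  group_by(syllable) %>%
  # label each duration by its position within the syllable
  mutate(dur = paste0("duration_", row_number())) %>%
  ungroup() %>%
  # one column per label; syllables with fewer durations get NA
  pivot_wider(names_from = dur, values_from = duration) %>%
  as.data.frame()

print(wide)
```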

arg0naut91