Reorder a data frame with duplicated id

Question

I have a tibble called df in the form:

sample  nuclide  intensity
SRM1    Pb206    200
SRM1    Pb207    250
SRM1    Pb208    301
SRM1    Pb206    202
SRM1    Pb207    254
SRM1    Pb208    305
SAM1    Pb206    154
SAM1    Pb207    262
SAM1    Pb208    311
SAM1    Pb206    157
SAM1    Pb207    261
SAM1    Pb208    325

It can be generated by:

df <- tbl_df(
data.frame(sample = rep(c("SRM1", "SAM1"), each = 6), 
nuclide = rep(c("Pb206", "Pb207", "Pb208"), 4), 
intensity = c(200, 250, 301, 202, 254, 305, 154, 262, 311, 157, 261, 325)))

I would like to rearrange it to have

sample  Pb208  Pb207  Pb206
SRM1    301    250    200
SRM1    305    254    202
SAM1    311    262    157
SAM1    325    261    204

I tried with the tidyr package using:

df %>%
  select(sample, nuclide, intensity) %>% 
  group_by(sample) %>%
  mutate(row = 1:n()) %>%
  spread(nuclide, intensity) %>% select(-row)

but it produced a different result with lots of undesired NAs.

It is very important that during the transformation the names of the samples in the dataset should retain their original order and that no aggregation function is used. Especially the first condition it is what makes different my problem from other similar previously posted.

The solution would then be applied to a much larger dataset with more than 20000 rows.

`library(data.table); dcast(setDT(df)[, id := 1:.N, by = .(sample, nuclide)], id + sample ~ nuclide, value.var = 'intensity')` — Jaap, Feb 22 '17 at 12:05
Unfortunately, this solution doesn't retain the original order of the samples as defined in _df_. — Ndr, Feb 22 '17 at 12:21
@Jaap found this in review queue. I would consider this not to be an exact duplicate because the difference here is that the combination of key & value columns may be duplicate and should be forced to appear in subsequent rows. — C8H10N4O2, Feb 22 '17 at 13:22
@Jaap also OP tagged dplyr -- I love data.table but for a beginner a dplyr solution is probably more helpful — C8H10N4O2, Feb 22 '17 at 13:23
I would also appreciate a bit of help and some lines of code to reorder the dataset as requested. — Ndr, Feb 22 '17 at 13:27
anyway, a `tidyverse` solution: `library(tidyverse); df %>% group_by(sample, nuclide) %>% mutate(id = row_number()) %>% spread(nuclide, intensity)` — Jaap, Feb 22 '17 at 13:28
@Jaap OP wants to maintain the order of column 1 as encountered so you would need to add a proxy variable for the order of column 1. If you re-open it, I'll answer it. — C8H10N4O2, Feb 22 '17 at 13:32

Jaap · Accepted Answer · 2017-02-22T20:40:23.310

Using:

lvls <- as.character(unique(df$sample))

library(tidyverse) # this will load 'dplyr' and 'tidyr' among others
df %>% 
  group_by(sample, nuclide) %>% 
  mutate(id = row_number()) %>% 
  spread(nuclide, intensity) %>% 
  ungroup() %>%                             # needed to be able to modify the 'sample' variable
  mutate(sample = factor(sample, levels = lvls)) %>% 
  arrange(sample) %>% 
  select(sample, Pb208:Pb206)

gives:

  sample Pb208 Pb207 Pb206
  (fctr) (dbl) (dbl) (dbl)
1   SRM1   301   250   200
2   SRM1   305   254   202
3   SAM1   311   262   154
4   SAM1   325   261   157

or alternatively (if you only just want a descending order):

df %>% 
  group_by(sample, nuclide) %>% 
  mutate(id = row_number()) %>% 
  spread(nuclide, intensity) %>% 
  arrange(desc(sample)) %>% 
  select(sample, Pb208:Pb206)

An alternative using data.table:

library(data.table)
dcast(setDT(df), sample + rowid(sample, nuclide) ~ nuclide, 
      value.var = 'intensity')[, sample := factor(sample, levels = lvls)
                               ][order(sample)]

This one works but, as I have to apply the same code on a relatively long dataset with many more sample ids, I would like to not specify the order of the levels by hand. — Ndr, Feb 22 '17 at 13:47

Reorder a data frame with duplicated id

1 Answers1