3

I have a tibble called df in the form:

sample  nuclide  intensity
SRM1    Pb206    200
SRM1    Pb207    250
SRM1    Pb208    301
SRM1    Pb206    202
SRM1    Pb207    254
SRM1    Pb208    305
SAM1    Pb206    154
SAM1    Pb207    262
SAM1    Pb208    311
SAM1    Pb206    157
SAM1    Pb207    261
SAM1    Pb208    325

It can be generated by:

df <- tbl_df(
data.frame(sample = rep(c("SRM1", "SAM1"), each = 6), 
nuclide = rep(c("Pb206", "Pb207", "Pb208"), 4), 
intensity = c(200, 250, 301, 202, 254, 305, 154, 262, 311, 157, 261, 325)))

I would like to rearrange it to have

sample  Pb208  Pb207  Pb206
SRM1    301    250    200
SRM1    305    254    202
SAM1    311    262    157
SAM1    325    261    204

I tried with the tidyr package using:

df %>%
  select(sample, nuclide, intensity) %>% 
  group_by(sample) %>%
  mutate(row = 1:n()) %>%
  spread(nuclide, intensity) %>% select(-row)

but it produced a different result with lots of undesired NAs.

It is very important that during the transformation the names of the samples in the dataset should retain their original order and that no aggregation function is used. Especially the first condition it is what makes different my problem from other similar previously posted.

The solution would then be applied to a much larger dataset with more than 20000 rows.

Jaap
  • 81,064
  • 34
  • 182
  • 193
Ndr
  • 550
  • 3
  • 15
  • `library(data.table); dcast(setDT(df)[, id := 1:.N, by = .(sample, nuclide)], id + sample ~ nuclide, value.var = 'intensity')` – Jaap Feb 22 '17 at 12:05
  • Unfortunately, this solution doesn't retain the original order of the samples as defined in _df_. – Ndr Feb 22 '17 at 12:21
  • Just reorder it afterwards – Jaap Feb 22 '17 at 13:00
  • @Jaap found this in review queue. I would consider this not to be an exact duplicate because the difference here is that the combination of key & value columns may be duplicate and should be forced to appear in subsequent rows. – C8H10N4O2 Feb 22 '17 at 13:22
  • @Jaap also OP tagged dplyr -- I love data.table but for a beginner a dplyr solution is probably more helpful – C8H10N4O2 Feb 22 '17 at 13:23
  • I would also appreciate a bit of help and some lines of code to reorder the dataset as requested. – Ndr Feb 22 '17 at 13:27
  • anyway, a `tidyverse` solution: `library(tidyverse); df %>% group_by(sample, nuclide) %>% mutate(id = row_number()) %>% spread(nuclide, intensity)` – Jaap Feb 22 '17 at 13:28
  • @Jaap OP wants to maintain the order of column 1 as encountered so you would need to add a proxy variable for the order of column 1. If you re-open it, I'll answer it. – C8H10N4O2 Feb 22 '17 at 13:32
  • @C8H10N4O2 reopened & answered – Jaap Feb 22 '17 at 13:36

1 Answers1

2

Using:

lvls <- as.character(unique(df$sample))

library(tidyverse) # this will load 'dplyr' and 'tidyr' among others
df %>% 
  group_by(sample, nuclide) %>% 
  mutate(id = row_number()) %>% 
  spread(nuclide, intensity) %>% 
  ungroup() %>%                             # needed to be able to modify the 'sample' variable
  mutate(sample = factor(sample, levels = lvls)) %>% 
  arrange(sample) %>% 
  select(sample, Pb208:Pb206)

gives:

  sample Pb208 Pb207 Pb206
  (fctr) (dbl) (dbl) (dbl)
1   SRM1   301   250   200
2   SRM1   305   254   202
3   SAM1   311   262   154
4   SAM1   325   261   157

or alternatively (if you only just want a descending order):

df %>% 
  group_by(sample, nuclide) %>% 
  mutate(id = row_number()) %>% 
  spread(nuclide, intensity) %>% 
  arrange(desc(sample)) %>% 
  select(sample, Pb208:Pb206)

An alternative using data.table:

library(data.table)
dcast(setDT(df), sample + rowid(sample, nuclide) ~ nuclide, 
      value.var = 'intensity')[, sample := factor(sample, levels = lvls)
                               ][order(sample)]
Jaap
  • 81,064
  • 34
  • 182
  • 193
  • This one works but, as I have to apply the same code on a relatively long dataset with many more sample ids, I would like to not specify the order of the levels by hand. – Ndr Feb 22 '17 at 13:47