Transposing column with non-unique identifiers in R dataframe

Question

I have a dataset

df <- data.frame(sp = rep(c("A","B","C","D"), each = 10), val = rnorm(40,6,0.1))

sp  val
A   6
A   6.1
A   5.9
...
D   6.2

I want to "transpose" the rows such that the final dataframe looks like

   sp  1    2     3    4 .....10
   A   6   6.1     ...
   B
   C
   D

I can't use spread because the identifiers in sp are not unique. transpose and pivot_wider didn't work either.

score 5 · Accepted Answer · answered Mar 28 '23 at 19:55

We could do it this way:

library(tidyr)
library(dplyr)

df %>% 
  group_by(sp) %>% 
  mutate(row_id = row_number()) %>% 
  pivot_wider(names_from = row_id, values_from = val)

  sp      `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`  `10`
  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A      6.10  6.05  6.02  5.94  6.14  5.94  6.22  6.15  5.98  5.90
2 B      5.93  6.03  5.98  5.97  5.90  6.00  5.92  5.83  5.96  6.09
3 C      5.94  6.06  5.84  5.99  6.05  6.03  6.01  5.94  5.92  5.90
4 D      6.01  5.91  5.95  5.97  6.18  5.93  6.02  6.01  5.90  5.99
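
A reproducible variant of the same idea, offered as a sketch: the seed value, the `ungroup()` call and the `names_prefix = "v"` argument are additions for illustration, not part of the answer above. The prefix yields syntactic column names (v1 to v10) instead of backticked numbers, and `ungroup()` drops the grouping so the result is a plain tibble.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # arbitrary seed, only so the example is reproducible
df <- data.frame(sp = rep(c("A", "B", "C", "D"), each = 10),
                 val = rnorm(40, 6, 0.1))

wide <- df %>%
  group_by(sp) %>%
  mutate(row_id = row_number()) %>%   # 1..10 within each sp
  ungroup() %>%                       # drop grouping before reshaping
  pivot_wider(names_from = row_id,
              values_from = val,
              names_prefix = "v")     # columns v1, v2, ..., v10

dim(wide)  # 4 rows, 11 columns (sp plus ten value columns)
```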

score 3 · Answer 2 · answered Mar 28 '23 at 21:02

Using base R

xtabs(val ~ sp + rn, transform(df, rn = ave(seq_along(sp), sp, FUN = seq_along)))

-output

 rn
sp         1        2        3        4        5        6        7        8        9       10
  A 6.037480 5.984203 5.940976 5.966072 5.790751 5.890902 5.911012 5.962350 5.917467 5.902163
  B 6.158129 6.040821 6.089519 5.974533 5.980293 5.923336 5.925839 6.039938 5.918039 6.079704
  C 6.053440 6.033050 6.087150 6.020446 5.886043 6.140898 5.916057 5.918075 5.993348 6.087708
  D 5.973308 5.820704 5.996212 6.006559 5.975797 5.943821 6.005226 5.849988 6.066520 5.948021
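
Note that `xtabs()` returns a table object rather than a data.frame, and it sums `val` per cell (here each cell holds exactly one value, so the sum is that value). A minimal sketch of one way to get a regular data.frame with an `sp` column from it; the seed and the intermediate names `tab` and `wide` are only illustrative:

```r
set.seed(1)  # arbitrary seed for reproducibility
df <- data.frame(sp = rep(c("A", "B", "C", "D"), each = 10),
                 val = rnorm(40, 6, 0.1))

# within-group sequence number, as in the one-liner above
df$rn <- ave(seq_along(df$sp), df$sp, FUN = seq_along)

tab <- xtabs(val ~ sp + rn, data = df)   # 4 x 10 contingency table
wide <- as.data.frame.matrix(tab)        # keep the wide layout
wide <- cbind(sp = rownames(wide), wide) # move row names into a column
rownames(wide) <- NULL
```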

Or with data.table

library(data.table)
dcast(setDT(df), sp ~ rowid(sp), value.var = "val")

-output

 sp        1        2        3        4        5        6        7        8        9       10
1:  A 6.037480 5.984203 5.940976 5.966072 5.790751 5.890902 5.911012 5.962350 5.917467 5.902163
2:  B 6.158129 6.040821 6.089519 5.974533 5.980293 5.923336 5.925839 6.039938 5.918039 6.079704
3:  C 6.053440 6.033050 6.087150 6.020446 5.886043 6.140898 5.916057 5.918075 5.993348 6.087708
4:  D 5.973308 5.820704 5.996212 6.006559 5.975797 5.943821 6.005226 5.849988 6.066520 5.948021

score 0 · Answer 3 · answered Mar 28 '23 at 20:07

You first want to number each row with a unique identifier by group and then use that identifier to pivot wider.

Based on this answer, you can achieve the first step and then simply pivot wider.

library(dplyr)
library(tidyr)
df <- data.frame(sp = rep(c("A","B","C","D"), each = 10), val = rnorm(40,6,0.1))

Label each row with a unique sequence number by group.

df_cumsum <- df %>% group_by(sp) %>% mutate(Snum = cumsum(duplicated(sp)))

Pivot the resulting dataframe.

df_wider <- df_cumsum %>% pivot_wider(id_cols = "sp", names_from = "Snum", values_from = "val")
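
One detail worth checking, shown here as a small sketch on a toy data frame (the names `toy` and `ids` are made up for illustration): within each group, `cumsum(duplicated(sp))` starts counting at 0, so the pivoted columns would be named 0 to 9 rather than 1 to 10, whereas `row_number()` starts at 1.

```r
library(dplyr)

toy <- data.frame(sp = rep(c("A", "B"), each = 3))

ids <- toy %>%
  group_by(sp) %>%
  mutate(Snum = cumsum(duplicated(sp)),  # 0-based counter within group
         row_id = row_number()) %>%      # 1-based counter within group
  ungroup()

ids$Snum    # 0 1 2 0 1 2
ids$row_id  # 1 2 3 1 2 3
```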

No need for this weird `cumsum(duplicated)`. `row_number()` (see TarJae's answer) and `data.table::rowid` are much more straightforward. Apart from that, this answer is pretty much identical to TarJae's. - GuedesBF, Mar 28 '23 at 20:15

DPH · Answer 4 · 2023-03-28T21:24:30.843

If all groups have the same length, as in your example, the following can be an alternative approach using base R (with one exception for convenience):

# split df into a list by the group column, transpose each piece and select
# the second row (val) while converting the result to numeric
lapply(split(df, df$sp), function(x) as.numeric(t(x)[2, ])) 
# returns a named list of four items, each holding one row of the future data.frame
$A
 [1] 5.853568 6.030801 5.796363 5.921868 6.131841 5.948130 5.903753 6.059718 5.934419 6.098146

$B
 [1] 6.073412 5.982033 6.112947 6.077986 5.862820 5.980009 5.873417 5.939692 6.106947 5.894948

$C
 [1] 6.023042 5.982562 6.038372 5.866949 6.001430 6.089119 5.883650 5.923488 5.871431 5.992740

$D
 [1] 6.115511 5.911106 5.969717 5.919504 5.978090 6.003362 6.026468 6.066644 6.028981 6.082114

From here, there are many approaches to get a data.frame result. One is to use sapply() instead of lapply(), combined with a second t() call and an as.data.frame() call, but you will still have to move the row names into a new column:

as.data.frame(t(sapply(split(df, df$sp), function(x) as.numeric(t(x)[2, ]))))
    V1       V2       V3       V4       V5       V6       V7       V8       V9      V10
A 5.853568 6.030801 5.796363 5.921868 6.131841 5.948130 5.903753 6.059718 5.934419 6.098146
B 6.073412 5.982033 6.112947 6.077986 5.862820 5.980009 5.873417 5.939692 6.106947 5.894948
C 6.023042 5.982562 6.038372 5.866949 6.001430 6.089119 5.883650 5.923488 5.871431 5.992740
D 6.115511 5.911106 5.969717 5.919504 5.978090 6.003362 6.026468 6.066644 6.028981 6.082114
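
A possible shortcut, sketched here as an aside and not part of the original answer: because `t()` on a data.frame with a character column produces a character matrix (numeric columns are passed through `format()`), the values make a round trip through text. Splitting `val` directly avoids that and also handles the row names. The seed and the `V1`..`V10` column names are assumptions chosen to mirror the output above; equal group lengths are still required, since `rbind()` would otherwise recycle values.

```r
set.seed(1)  # arbitrary seed for reproducibility
df <- data.frame(sp = rep(c("A", "B", "C", "D"), each = 10),
                 val = rnorm(40, 6, 0.1))

# one numeric vector per group, stacked as the rows of a matrix
m <- do.call(rbind, split(df$val, df$sp))
colnames(m) <- paste0("V", seq_len(ncol(m)))

# move the group labels from row names into a regular column
wide <- data.frame(sp = rownames(m), m)
rownames(wide) <- NULL
```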

With the help of the plyr library (here we leave base R territory) you need just one additional function call to get a data.frame with an id column (drawn from the list item names). The .id argument is not required (try omitting it); it just sets the name of the new column that originates from the list item names:

plyr::ldply(lapply(split(df, df$sp), function(x) as.numeric(t(x)[2, ])), .id = "sp")
  sp       V1       V2       V3       V4       V5       V6       V7       V8       V9      V10
1  A 5.853568 6.030801 5.796363 5.921868 6.131841 5.948130 5.903753 6.059718 5.934419 6.098146
2  B 6.073412 5.982033 6.112947 6.077986 5.862820 5.980009 5.873417 5.939692 6.106947 5.894948
3  C 6.023042 5.982562 6.038372 5.866949 6.001430 6.089119 5.883650 5.923488 5.871431 5.992740
4  D 6.115511 5.911106 5.969717 5.919504 5.978090 6.003362 6.026468 6.066644 6.028981 6.082114

Considering the additional steps needed and the precondition of equal group lengths (or workarounds for it) to get an equivalent result, I prefer and would recommend the tidyr::pivot_ option.
