Pivot wider based on group_by

Question

I have a dataframe that I want to pivot_wide based on matching values in record_id and LOC.So record_id 1 has Loc = LUC for both rows. So I want to merge information from both rows such that only 1 row exists for each record_id and Loc combination

df <- read.table(text = 
"  record_id   found   order   delivered   Loc coord   A   B   C   D   E   F   G
1   1   1/1/18  1/2/18  LUC A:123   A   A1  TR1 1   1   A   BC1
1   1   1/1/18  1/3/18  LUC B:124   B   A2  TR2 1   1   B   BC2
2   1   1/1/18  1/4/18  AMZ C:125   C   A3  TR3 2   2   C   BC3
3   NA  1/1/18  1/5/18  FR  A:126   A   A4  TR4 3   3   A   BC4
4   NA  1/1/18  1/6/18  FR  B:127   B   A5  TR5 4   4   B   BC5
5   1   1/1/18  1/7/18  LUC A:128   A   A6  TR6 5   5   A   BC6
5   1   1/1/18  1/8/18  FR  A:129   A   A7  TR7 5   5   A   BC7", header = TRUE)

Desired output

record_id   found   order   delivered   Loc coord   A   B   C   D   E   F   G   coord_2 A_2 B_2 C_2 D_2 E_2 F_2 G_2
1   1   1/1/18  1/2/18  LUC A:123   A   A1  TR1 1   1   A   BC1 B:124   B   A2  TR2 1   1   B   BC2
2   1   1/1/18  1/4/18  AMZ C:125   C   A3  TR3 2   2   C   BC3 NA  NA  NA  NA  NA  NA  NA  NA
3   NA  1/1/18  1/5/18  FR  A:126   A   A4  TR4 3   3   A   BC4 NA  NA  NA  NA  NA  NA  NA  NA
4   NA  1/1/18  1/6/18  FR  B:127   B   A5  TR5 4   4   B   BC5 NA  NA  NA  NA  NA  NA  NA  NA
5   1   1/1/18  1/7/18  LUC A:128   A   A6  TR6 5   5   A   BC6 NA  NA  NA  NA  NA  NA  NA  NA
5   1   1/1/18  1/8/18  FR  A:129   A   A7  TR7 5   5   A   BC7 NA  NA  NA  NA  NA  NA  NA  NA

This is what I tried but this doesn't work

df %>%
  group_by(record_id, Loc) %>%
  pivot_wider(names_from = c(record_id, Loc),
              values_from = c(coord:G))

Variable *delivered* should be discarded? – Andre Wildberg Jul 25 '23 at 17:52 — Andre Wildberg, Jul 25 '23 at 17:52

Jon Spring · Answer 1 · 2023-07-25T20:10:44.147

df %>%
  mutate(group_index = row_number(), .by = c(record_id, Loc)) %>%
  pivot_wider(names_from = group_index, values_from = coord:G, names_vary = "slowest") %>%
  rename_with(~sub("_1", "", .x), ends_with("_1")) # H/T @andre-wildberg

result

# A tibble: 7 × 21
  record_id found order  delivered Loc   coord A     B     C         D     E F     G     coord_2 A_2   B_2   C_2     D_2   E_2 F_2   G_2  
      <int> <int> <chr>  <chr>     <chr> <chr> <chr> <chr> <chr> <int> <int> <chr> <chr> <chr>   <chr> <chr> <chr> <int> <int> <chr> <chr>
1         1     1 1/1/18 1/2/18    LUC   A:123 A     A1    TR1       1     1 A     BC1   NA      NA    NA    NA       NA    NA NA    NA   
2         1     1 1/1/18 1/3/18    LUC   NA    NA    NA    NA       NA    NA NA    NA    B:124   B     A2    TR2       1     1 B     BC2  
3         2     1 1/1/18 1/4/18    AMZ   C:125 C     A3    TR3       2     2 C     BC3   NA      NA    NA    NA       NA    NA NA    NA   
4         3    NA 1/1/18 1/5/18    FR    A:126 A     A4    TR4       3     3 A     BC4   NA      NA    NA    NA       NA    NA NA    NA   
5         4    NA 1/1/18 1/6/18    FR    B:127 B     A5    TR5       4     4 B     BC5   NA      NA    NA    NA       NA    NA NA    NA   
6         5     1 1/1/18 1/7/18    LUC   A:128 A     A6    TR6       5     5 A     BC6   NA      NA    NA    NA       NA    NA NA    NA   
7         5     1 1/1/18 1/8/18    FR    A:129 A     A7    TR7       5     5 A     BC7   NA      NA    NA    NA       NA    NA NA    NA

Thanks. But I run into this error Error in `mutate_cols()`: ! Problem with `mutate()` column `.by`. ℹ `.by = c(record_id, Loc)`. ℹ `.by` must be size 7142 or 1, not 14284. — akang, Jul 25 '23 at 18:28
Hmm, I'm not sure. Perhaps your data is grouped? - you could try `ungroup() %>%` before the mutate. Possibly related: https://stackoverflow.com/questions/70185381/error-problem-with-mutate-column-must-be-size-15-or-1-not-17192 — Jon Spring, Jul 25 '23 at 20:10

score 0 · Answer 2 · answered Jul 25 '23 at 17:33

Your problem involves transforming a long-form data frame into a wide-form one, merging rows with the same 'record_id' and 'Loc'. Here's a solution using the dplyr and tidyverse packages in R.

First, you need to create an additional grouping variable that defines unique combinations within each group. This can be achieved using the group_indices() function from dplyr. After that, you can use the pivot_wider() function from tidyverse to transform the data frame.

Here is the R code that does this:

library(dplyr)
library(tidyverse)

df <- df %>%
  group_by(record_id, Loc) %>%
  mutate(group_index = group_indices()) %>%
  ungroup()

df_wide <- df %>%
  pivot_wider(names_from = group_index, values_from = coord:G, 
              names_glue = "{.value}_{.name}")

The group_indices() function creates a unique index for each group, and the pivot_wider() function uses these indices to create new column names in the wide format data frame.

You may want to handle missing values or decide what to do when there are multiple values for the same combination. This can be done by specifying the values_fn and values_fill arguments in the pivot_wider() function. For example, to use the first non-NA value within each group and fill missing values with NA, you can use values_fn = list(coord:G = ~first(na.omit(.))) and values_fill = NA.

This solution assumes that you want to keep the first value in case of multiple entries for the same 'record_id' and 'Loc'. If you want to use a different strategy, you need to specify it in the values_fn argument.

I edit the question to include a reproducible data. You could test your code with it. — Darren Tsai, Jul 25 '23 at 17:42

Andre Wildberg · Answer 3 · 2023-07-25T19:06:45.637

Using unnest_wider. relocate is just to match the order given but is not really necessary. Correcting the column names with rename_with

library(dplyr)
library(tidyr)

df %>% 
  summarize(across(found:delivered, ~ .x[1]), 
            across(coord:G, list), .by = c(record_id, Loc)) %>% 
  unnest_wider(coord:G, names_sep="_") %>% 
  relocate(matches(".*_[2-9]|.*_[1-9][0-9]"), 
    .after = ends_with("_1")) %>% 
  rename_with(~ sub("_1", "", .x), ends_with("_1")) %>% 
  print(Inf)
# A tibble: 6 × 21
  record_id Loc   found order  delivered coord A     B     C         D 
      <int> <chr> <int> <chr>  <chr>     <chr> <chr> <chr> <chr> <int> 
1         1 LUC       1 1/1/18 1/2/18    A:123 A     A1    TR1       1 
2         2 AMZ       1 1/1/18 1/4/18    C:125 C     A3    TR3       2 
3         3 FR       NA 1/1/18 1/5/18    A:126 A     A4    TR4       3 
4         4 FR       NA 1/1/18 1/6/18    B:127 B     A5    TR5       4 
5         5 LUC       1 1/1/18 1/7/18    A:128 A     A6    TR6       5 
6         5 FR        1 1/1/18 1/8/18    A:129 A     A7    TR7       5 
      E F     G     coord_2 A_2   B_2   C_2     D_2   E_2 F_2   G_2  
  <int> <chr> <chr> <chr>   <chr> <chr> <chr> <int> <int> <chr> <chr>
1     1 A     BC1   B:124   B     A2    TR2       1     1 B     BC2  
2     2 C     BC3   NA      NA    NA    NA       NA    NA NA    NA   
3     3 A     BC4   NA      NA    NA    NA       NA    NA NA    NA   
4     4 B     BC5   NA      NA    NA    NA       NA    NA NA    NA   
5     5 A     BC6   NA      NA    NA    NA       NA    NA NA    NA   
6     5 A     BC7   NA      NA    NA    NA       NA    NA NA    NA

Pivot wider based on group_by

3 Answers3