pivot_wider() produces more than one row for each observation. Problem with duplication?

Question

I cannot get my data frame to pivot_wider into one row per observation set. The data frame has paired observations (Copy; Forward Template and Reverse Template , with values in prim) nested within Accession. I'm trying to pivot_wider, leaving a row per paired observation In the final data frame, I need to be able to trace copy to Accession when summarising.

Moving forward I would also like to include three observations per copy (not included in this example).

The code below produces a separate column for each value of prim.

library(tibble)
library(dplyr)
library(tidyr)

df <- 
structure(list(prim = c("Template        601881  ....................  601900", 
"Template        601973  ......................  601952", "Template        331595  ....................  331614", 
"Template        331687  ......................  331666", "Template        196557  ....................  196576", 
"Template        196649  ......................  196628", "Template        153933  ....................  153952", 
"Template        154025  ......................  154004", "Template        2100939  ....................  2100920", 
"Template        2100847  ......................  2100868", "Template        11970  ....................  11989", 
"Template        12062  ......................  12041", "Template        2030677  ....................  2030658", 
"Template        2030585  ......................  2030606", "Template        1988028  ....................  1988009", 
"Template        1987936  ......................  1987957", "Template        1850917  ....................  1850898", 
"Template        1850825  ......................  1850846", "Template        1580702  ....................  1580683", 
"Template        1580610  ......................  1580631"), 
    Accession = c("CP042983.1", "CP042983.1", "CP042983.1", "CP042983.1", 
    "CP042983.1", "CP042983.1", "CP042983.1", "CP042983.1", "CP042983.1", 
    "CP042983.1", "CP043001.1", "CP043001.1", "CP043001.1", "CP043001.1", 
    "CP043001.1", "CP043001.1", "CP043001.1", "CP043001.1", "CP043001.1", 
    "CP043001.1"), Genus = c("Histophilus", "Histophilus", "Histophilus", 
    "Histophilus", "Histophilus", "Histophilus", "Histophilus", 
    "Histophilus", "Histophilus", "Histophilus", "Histophilus", 
    "Histophilus", "Histophilus", "Histophilus", "Histophilus", 
    "Histophilus", "Histophilus", "Histophilus", "Histophilus", 
    "Histophilus"), Species = c("somni", "somni", "somni", "somni", 
    "somni", "somni", "somni", "somni", "somni", "somni", "somni", 
    "somni", "somni", "somni", "somni", "somni", "somni", "somni", 
    "somni", "somni"), Metric = c("Forward Template", "Reverse Template", 
    "Forward Template", "Reverse Template", "Forward Template", 
    "Reverse Template", "Forward Template", "Reverse Template", 
    "Forward Template", "Reverse Template", "Forward Template", 
    "Reverse Template", "Forward Template", "Reverse Template", 
    "Forward Template", "Reverse Template", "Forward Template", 
    "Reverse Template", "Forward Template", "Reverse Template"
    )), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-20L))

tmp<- 
    df %>% 
    mutate(CopyID = rep(1:1000, each = 2)) %>% 
    group_by(Accession) %>% 
    mutate(AccessID = group_indices()) %>% 
    pivot_wider(names_from = Metric, values_from = prim)

tmp

How do I create a data frame where both the Reverse Template and Forward Template columns are populated, and only one row per observation is created?

Cheers

score 5 · Accepted Answer · answered Jun 23 '20 at 07:33

5

Group the data by Metric instead :

library(dplyr)

df %>%
  group_by(Metric) %>%
  mutate(row = row_number()) %>%
  tidyr::pivot_wider(names_from = Metric, values_from = prim) %>%
  select(-row)

# A tibble: 5 x 5
#  Accession  Genus   Species `Forward Template`    `Reverse Template`   
#  <chr>      <chr>   <chr>   <chr>                 <chr>                
#1 CP042983.1 Histop… somni   601881  ............… 601973  ............…
#2 CP042983.1 Histop… somni   331595  ............… 331687  ............…
#3 CP042983.1 Histop… somni   196557  ............… 196649  ............…
#4 CP042983.1 Histop… somni   153933  ............… 154025  ............…
#5 CP042983.1 Histop… somni   2100939  ...........… 2100847  ...........…

answered Jun 23 '20 at 07:33

Ronak Shah

377,200
20
156
213

Thank you for your answer. On reading, I realised I did not include enough information about the task. I have updated my original entry above. – T. Williams Jun 23 '20 at 08:49
@T.Williams Did you try my answer. It still works. If it is not what you need, can you update your post with your expected output? – Ronak Shah Jun 23 '20 at 08:55
This works. Apologies. I got lost in what I was trying to achieve. Thanks for your help. – T. Williams Jun 23 '20 at 09:02

Peter · Answer 2 · 2020-06-23T11:05:37.550

Assuming that observations are grouped in pairs of rows this may be one way of achieving what you want...



library(tibble)
library(dplyr)
library(tidyr)


tmp <- 
  df %>% 
  mutate(AccessID = rep(1:(nrow(df)/2), each = 2)) %>%
  group_by(Accession) %>%
  pivot_wider(names_from = Metric, values_from = prim)
  
tmp
#> # A tibble: 10 x 6
#> # Groups:   Accession [2]
#>    Accession  Genus  Species AccessID `Forward Template`   `Reverse Template`   
#>    <chr>      <chr>  <chr>      <int> <chr>                <chr>                
#>  1 CP042983.1 Histo~ somni          1 Template        601~ Template        6019~
#>  2 CP042983.1 Histo~ somni          2 Template        331~ Template        3316~
#>  3 CP042983.1 Histo~ somni          3 Template        196~ Template        1966~
#>  4 CP042983.1 Histo~ somni          4 Template        153~ Template        1540~
#>  5 CP042983.1 Histo~ somni          5 Template        210~ Template        2100~
#>  6 CP043001.1 Histo~ somni          6 Template        119~ Template        1206~
#>  7 CP043001.1 Histo~ somni          7 Template        203~ Template        2030~
#>  8 CP043001.1 Histo~ somni          8 Template        198~ Template        1987~
#>  9 CP043001.1 Histo~ somni          9 Template        185~ Template        1850~
#> 10 CP043001.1 Histo~ somni         10 Template        158~ Template        1580~

^{Created on 2020-06-23 by the reprex package (v0.3.0)}

Thank you for your answer. This solve my problem in the sample data set; however, I realised I did not include enough information about the task. I have updated my original entry above. — T. Williams, Jun 23 '20 at 08:49
I've updated the answer retaining the `group_by(Accession)` if that is how you want to process the data. I'm not sure what you were trying to achieve with the `mutate` call to `group_indices()`. Although for my money the answer provided by @RonakShah seems simpler and less prone to potential error if your data are not entirely paired observations. — Peter, Jun 23 '20 at 11:09

pivot_wider() produces more than one row for each observation. Problem with duplication?

2 Answers2