r: combine two separate values on same row based on duplicate values in a column

Question

Let's say I have

# A tibble: 4 × 3
  Gene.names  Case Control
  <chr>      <dbl>   <dbl>
1 A1BG          52      NA
2 A1BG          NA      32
3 A2M           16      NA
4 A2M           NA      15

As you can see, Gene.names are duplicates and have corresponding values for Case and Control. I need to combine the values for Case and Control so they are printed on the same row for each Gene.name.

I am looking for a solution in dplyr.

Expected output

  Gene.names  Case Control
  <chr>      <dbl>   <dbl>
1 A1BG          52      32
2 A2M           16      15

Data

df <- structure(list(Gene.names = c("A1BG", "A1BG", "A2M", "A2M"), 
               Case = c(52, NA, 16, NA), Control = c(NA, 32, NA, 15)), row.names = c(NA, 
                                                                                     -4L), class = c("tbl_df", "tbl", "data.frame"))

Are duplicates guaranteed to only have one non `NA` value per column? If not how do you select which value to choose? — SamR, Jan 02 '23 at 07:01
Yes, `group_by(Gene.names) %>% fill(everything(), .direction = "downup") %>% slice(1)` was a solution. — cmirian, Jan 02 '23 at 07:10

score 1 · Answer 1 · answered Jan 02 '23 at 07:10

A combination of pivot_longer and picot_wider will do this.

library(tidyverse)

df <- structure(list(
  Gene.names = c("A1BG", "A1BG", "A2M", "A2M"),
  Case = c(52, NA, 16, NA), Control = c(NA, 32, NA, 15)
), row.names = c(
  NA,
  -4L
), class = c("tbl_df", "tbl", "data.frame"))



df |> 
  pivot_longer(cols = Case:Control) |> 
  filter(!is.na(value)) |> 
  pivot_wider(names_from = name, values_from = value)
#> # A tibble: 2 × 3
#>   Gene.names  Case Control
#>   <chr>      <dbl>   <dbl>
#> 1 A1BG          52      32
#> 2 A2M           16      15

r: combine two separate values on same row based on duplicate values in a column

1 Answers1