0

Let's say I have

# A tibble: 4 × 3
  Gene.names  Case Control
  <chr>      <dbl>   <dbl>
1 A1BG          52      NA
2 A1BG          NA      32
3 A2M           16      NA
4 A2M           NA      15

As you can see, Gene.names are duplicates and have corresponding values for Case and Control. I need to combine the values for Case and Control so they are printed on the same row for each Gene.name.

I am looking for a solution in dplyr.

Expected output

  Gene.names  Case Control
  <chr>      <dbl>   <dbl>
1 A1BG          52      32
2 A2M           16      15

Data

df <- structure(list(Gene.names = c("A1BG", "A1BG", "A2M", "A2M"), 
               Case = c(52, NA, 16, NA), Control = c(NA, 32, NA, 15)), row.names = c(NA, 
                                                                                     -4L), class = c("tbl_df", "tbl", "data.frame"))
cmirian
  • 2,572
  • 3
  • 19
  • 59
  • Are duplicates guaranteed to only have one non `NA` value per column? If not how do you select which value to choose? – SamR Jan 02 '23 at 07:01
  • Yes, duplicates are guaranteed to only have one non `NA` – cmirian Jan 02 '23 at 07:02
  • 1
    Yes, `group_by(Gene.names) %>% fill(everything(), .direction = "downup") %>% slice(1)` was a solution. – cmirian Jan 02 '23 at 07:10

1 Answers1

1

A combination of pivot_longer and picot_wider will do this.

library(tidyverse)

df <- structure(list(
  Gene.names = c("A1BG", "A1BG", "A2M", "A2M"),
  Case = c(52, NA, 16, NA), Control = c(NA, 32, NA, 15)
), row.names = c(
  NA,
  -4L
), class = c("tbl_df", "tbl", "data.frame"))



df |> 
  pivot_longer(cols = Case:Control) |> 
  filter(!is.na(value)) |> 
  pivot_wider(names_from = name, values_from = value)
#> # A tibble: 2 × 3
#>   Gene.names  Case Control
#>   <chr>      <dbl>   <dbl>
#> 1 A1BG          52      32
#> 2 A2M           16      15
MarBlo
  • 4,195
  • 1
  • 13
  • 27