3

I have a large dataset that has duplicate times (rows) with data in both row columns that I would like to combine. The data looks like this:

date              P1   PT1  P2   PT2   P3   PT3

5/5/2011@11:40    NA   NA   NA   NA   9.4   10.1

5/5/2011@11:40    5.6  10.2  8.5 10.1  NA   NA

I would like to get to this

date                P1     PT1     P2    PT2    P3    PT3

5/5/2011@11:40    5.6  10.2  8.5 10.1  9.4   10.1

My dataset is 10 minutes data for ten years and the repeats are somewhat random. The @ sign was added to display properly.

I've tried rbind and rbind.row.names to no avail.

Thanks!

ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
schultz45
  • 31
  • 1
  • 1
    Welcome to stack overflow! Please provide a reproducible example to help us help you. Here's how to do that: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Ben G Aug 25 '21 at 14:30

3 Answers3

3

You can use the summarize() function in dplyr. The following will work, but it does not check for duplicates, it only takes the maximum value for each date.

library(dplyr)
df <- tribble(~date, ~P1, ~PT1, ~P2, ~PT2, ~P3, ~PT3, 
        "5/5/2011@11:40", NA, NA, NA, NA, 9.4, 10.1, 
        "5/5/2011@11:40", 5.6, 10.2, 8.5, 10.1, NA, NA
)

df %>%
    group_by(date) %>%
    summarize(across(starts_with("P"), max, na.rm = TRUE))

In other words, if you are sure that your data include either a number or NA, then it will work.

mikebader
  • 1,075
  • 3
  • 12
2

You could use tidyr's fill function.

library(dplyr)
library(tidyr)

df %>% 
    group_by(date) %>% 
    fill(-date, .direction = "updown") %>% 
    distinct() %>%
    ungroup()

to get

# A tibble: 1 x 7
  date              P1   PT1    P2   PT2    P3   PT3
  <chr>          <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 5/5/2011 11:40   5.6  10.2   8.5  10.1   9.4  10.1

As mikebader pointed out, there might be an issue with duplicates created while filling the data. To handle this, we could use a custom function:

check_fill <- function(df) {
  
  df <- df %>% 
    group_by(date) %>% 
    fill(-date, .direction = "updown") %>% 
    distinct() 
  
  if (df %>% 
      filter(n() > 1) %>% 
      nrow() > 0) {
    warning("Duplicates detected.")
  }
  
  df %>% 
    ungroup()
}

Used on

df2 <- tribble(~date, ~P1, ~PT1, ~P2, ~PT2, ~P3, ~PT3, 
              "5/5/2011 11:40", NA, NA, NA, NA, 9.4, 10.1, 
              "5/5/2011 11:40", 5.6, 10.2, 8.5, 10.1, 9.5, NA
)

this returns

check_fill(df2)
#> Warning in check_fill(df2): Duplicates detected.
#> # A tibble: 2 x 7
#>   date              P1   PT1    P2   PT2    P3   PT3
#>   <chr>          <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 5/5/2011 11:40   5.6  10.2   8.5  10.1   9.4  10.1
#> 2 5/5/2011 11:40   5.6  10.2   8.5  10.1   9.5  10.1
Martin Gal
  • 16,640
  • 5
  • 21
  • 39
1

A data.table option

> setDT(df)[, Map(na.omit, .SD), date]
             date  P1  PT1  P2  PT2  P3  PT3
1: 5/5/2011@11:40 5.6 10.2 8.5 10.1 9.4 10.1
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81