I have raw data in the format of df
below that I'm needing to condense into a fewer row data.frame
. The data represent eventual GIS point data (columns for coordinates are left off here) so I'm hoping to avoid plotting any duplicate points. Each row represents one point for which either or both of agency1 and agency2 publish data. A simpler case with fewer column formats is answered here.
For cases where there are rows with identical agency3_id
, I'd like to condense those two rows into one row. AS an example, I want rows 3 and 4 in the original data.frame
(which both have agency3_id abcde
) to become a single row in the desired data.frame
below (its row 3). I'm open to any R method. I'm sure there's a better title question - any help appreciated.
library(tidyverse)
library(lubridate)
agency1_id <- as.double(c("1500", NA, "2007", NA, "4501", NA))
agency2_id <- c(NA, "zxc", NA, "xcv", NA, "bnm")
agency3_id <- c("qwert", "ertyu", "abcde", "abcde", NA, NA)
agency1_val <- c(0.21, 1.5, 0.23, NA, 4.3, NA)
agency2_val <- c(0.05, 4.4, NA, 6.3, NA, 2.0)
agency1_desc <- c("st", NA, "ko", NA, "ui", NA)
agency2_desc <- c(NA, "lo", NA, "vf", NA, "kl")
agency1_dtm <- ymd_hm(c("2019-05-30 04:30", NA, "2019-05-30 04:35",
NA, "2019-05-30 04:33", NA))
agency2_dtm <- ymd_hm(c(NA, "2019-05-30 04:20", NA, "2019-05-30
04:29", NA, "2019-05-30 04:31"))
df <- data.frame(agency1_id, agency2_id, agency3_id, agency1_val,
agency2_val, agency1_desc, agency2_desc, agency1_dtm,
agency2_dtm)
as_tibble(df)
# agency1_id agency2_id agency3_id agency1_val agency2_val agency1_desc agency2_desc agency1_dtm agency2_dtm
# <dbl> <fct> <fct> <dbl> <dbl> <fct> <fct> <dttm> <dttm>
# 1 1500 NA qwert 0.21 0.05 st NA 2019-05-30 04:30:00 NA
# 2 NA zxc ertyu 1.5 4.4 NA lo NA 2019-05-30 04:20:00
# 3 2007 NA abcde 0.23 NA ko NA 2019-05-30 04:35:00 NA
# 4 NA xcv abcde NA 6.3 NA vf NA 2019-05-30 04:29:00
# 5 4501 NA NA 4.3 NA ui NA 2019-05-30 04:33:00 NA
# 6 NA bnm NA NA 2 NA kl NA 2019-05-30 04:31:00
desired df
:
# agency1_id agency2_id agency3_id agency1_val agency2_val agency1_desc agency2_desc agency1_dtm agency2_dtm
# <dbl> <fct> <fct> <dbl> <dbl> <fct> <fct> <dttm> <dttm>
# 1 1500 NA qwert 0.21 0.05 st NA 2019-05-30 04:30:00 NA
# 2 NA zxc ertyu 1.5 4.4 NA lo NA 2019-05-30 04:20:00
# 3 2007 xcv abcde 0.23 6.3 ko vf 2019-05-30 04:35:00 2019-05-30 04:29:00
# 4 4501 NA NA 4.3 NA ui NA 2019-05-30 04:33:00 NA
# 5 NA bnm NA NA 2 NA kl NA 2019-05-30 04:31:00