0

I have a data.table with a mix of dates, sfc, string, and numerical data that reflect the span of time the events lasted and I would like to add a column to reflect the weeks in which each event was present. While this worked previously when I broke down it down by having a row for every day for which the events occurred, it's not working now. I suspect it's due to missing values in the start/ end date columns.

Here's an example of my data:

#Required packages
require(data.table)
require(sf)

#Data
ID<-c(1:5)
lon<-c(-86.9655633, -85.8316318, -85.8316318, -82.8316318,-81.8316318)
lat<-c(34.5967589, 33.6598257, 33.6598257, 31.6598257, 32.6598257)
Country<-c("Malaysia", "Malayisia", "Singapore", "Thailand", "Thailand")
City<-c("Penang", "Malacca", "Singapore", "Bangkok", "Chiang Mai")
area_km2<-c(1048, 277, 728, 1569,  40)
start_date<-as.Date(c("2000-01-31", "2000-09-21", "2001-03-17", NA, NA))
end_date<-as.Date(c("2000-02-14", "2000-10-12", "2001-05-27", NA, NA))

samp<-data.table(ID, lon, lat, Country, City, area_km2, start_date, end_date)
samp <- st_as_sf(samp, coords = c("lon", "lat"), na.fail=F)

What I want is essentially to have one row for every week the event occurred in. E.g. if the event was from 3 Feb 2000 (Thur) to 8 Feb 2000 (Tue), I'd like to have two rows for the week starting 31 Jan, and another starting 7 Feb. Here is what I've tried so far which had worked previously.

samp  = samp[, list(
  `Week Starting` = if (!is.na(start_date) & !is.na(end_date) & start_date<=end_date)
  {seq.Date(start_date, end_date, by = 'week')}
  else {as.Date(NA)}), 
            by = list(ID, lon, lat, Country, City, area_km2)] 

It's a little complicated, but essentially I am attempting to build a panel for all cities and weeks, even if no events occurred in those times/ locations. Let me know how I can clairfy further. Thanks!

  • Tangent: you should almost always use `library`, not `require`. The latter never stops following code when the package is not available, which is almost never what is intended. Refs: https://stackoverflow.com/a/51263513. (If you want to stay with `require`, then *check its return status*.) – r2evans Dec 28 '20 at 13:23

1 Answers1

1

One suggestion is to create the weeks in an auxiliary table and merge it with the first.

Notes:

  1. I'm using tidyr::unnest since it is currently the best (and only) tool for that part of the pipe. I think the data.table devs are still considering/discussion if and how to deal with (un)nesting, so until then I'll use this. While it does produce a tbl instead of a data.table, we can still use it without problem.

  2. This is being done before sf::st_as_sf.

weeks <- tidyr::unnest(
  samp[!is.na(start_date), .(date = Map(seq.Date, start_date, end_date, by = "week")), by = .(ID) ],
  date
)
weeks
# # A tibble: 18 x 2
#       ID date      
#    <int> <date>    
#  1     1 2000-01-31
#  2     1 2000-02-07
#  3     1 2000-02-14
#  4     2 2000-09-21
#  5     2 2000-09-28
#  6     2 2000-10-05
#  7     2 2000-10-12
#  8     3 2001-03-17
#  9     3 2001-03-24
# 10     3 2001-03-31
# 11     3 2001-04-07
# 12     3 2001-04-14
# 13     3 2001-04-21
# 14     3 2001-04-28
# 15     3 2001-05-05
# 16     3 2001-05-12
# 17     3 2001-05-19
# 18     3 2001-05-26

And then simply left-join/merge them:

merge(samp, weeks, all.x = TRUE, by = "ID")
#     ID       lon      lat   Country       City area_km2 start_date   end_date       date
#  1:  1 -86.96556 34.59676  Malaysia     Penang     1048 2000-01-31 2000-02-14 2000-01-31
#  2:  1 -86.96556 34.59676  Malaysia     Penang     1048 2000-01-31 2000-02-14 2000-02-07
#  3:  1 -86.96556 34.59676  Malaysia     Penang     1048 2000-01-31 2000-02-14 2000-02-14
#  4:  2 -85.83163 33.65983 Malayisia    Malacca      277 2000-09-21 2000-10-12 2000-09-21
#  5:  2 -85.83163 33.65983 Malayisia    Malacca      277 2000-09-21 2000-10-12 2000-09-28
#  6:  2 -85.83163 33.65983 Malayisia    Malacca      277 2000-09-21 2000-10-12 2000-10-05
#  7:  2 -85.83163 33.65983 Malayisia    Malacca      277 2000-09-21 2000-10-12 2000-10-12
#  8:  3 -85.83163 33.65983 Singapore  Singapore      728 2001-03-17 2001-05-27 2001-03-17
#  9:  3 -85.83163 33.65983 Singapore  Singapore      728 2001-03-17 2001-05-27 2001-03-24
# 10:  3 -85.83163 33.65983 Singapore  Singapore      728 2001-03-17 2001-05-27 2001-03-31
# 11:  3 -85.83163 33.65983 Singapore  Singapore      728 2001-03-17 2001-05-27 2001-04-07
# 12:  3 -85.83163 33.65983 Singapore  Singapore      728 2001-03-17 2001-05-27 2001-04-14
# 13:  3 -85.83163 33.65983 Singapore  Singapore      728 2001-03-17 2001-05-27 2001-04-21
# 14:  3 -85.83163 33.65983 Singapore  Singapore      728 2001-03-17 2001-05-27 2001-04-28
# 15:  3 -85.83163 33.65983 Singapore  Singapore      728 2001-03-17 2001-05-27 2001-05-05
# 16:  3 -85.83163 33.65983 Singapore  Singapore      728 2001-03-17 2001-05-27 2001-05-12
# 17:  3 -85.83163 33.65983 Singapore  Singapore      728 2001-03-17 2001-05-27 2001-05-19
# 18:  3 -85.83163 33.65983 Singapore  Singapore      728 2001-03-17 2001-05-27 2001-05-26
# 19:  4 -82.83163 31.65983  Thailand    Bangkok     1569       <NA>       <NA>       <NA>
# 20:  5 -81.83163 32.65983  Thailand Chiang Mai       40       <NA>       <NA>       <NA>
r2evans
  • 141,215
  • 6
  • 77
  • 149