How to extract all columns when using data.table?

Question

library(data.table)

I am trying to do this.

wd <- structure(list(Year = c(2006L, 2006L, 2006L), day = c(361L, 361L, 
360L), hour = c(14L, 8L, 8L), mint = c(30L, 0L, 30L), valu1 = c(0.5, 
0.3, 0.4), Date = structure(c(1167229800, 1167206400, 1167121800
), class = c("POSIXct", "POSIXt"), tzone = "UTC")), .Names = c("Year", 
"day", "hour", "mint", "valu1", "Date"), row.names = c(NA, -3L
), class = "data.frame")

wg <- c("2006/12/27 14:23:59", "2006/12/27 16:47:59", "2006/12/27 19:12:00")
w <- c("0.4", "0.2", "0.5")
wf=data.frame(wg,w)
wg <- as.POSIXct(wf$wg, format = "%Y/%m/%d %T", tz = "UTC") 
WG <- data.table(start = wg, end = wg)
setkey(WG)
## Do the same for `wd` adding +/- 30 minutes 
setDT(wd)[, `:=`(start = Date - 1800L, end = Date + 1800L)]
## Run foverlaps and extract the match `valu1` column
foverlaps(wd, WG, nomatch = 0L)[, .(wdDate = Date, valu1, WGDate = start)]
                    wdDate valu1              WGDate
   1: 2006-12-27 14:30:00   0.5 2006-12-27 14:23:59

As you can see in the final results only valu1 was extracted from wd but I would like also to extract the corresponding values from w in wf.

So I want something like this:

                     wdDate valu1              WGDate      w
    1: 2006-12-27 14:30:00   0.5 2006-12-27 14:23:59      0.4

Any idea is welcome.

Real data:

  head(wf)
       date1  date2        date3n               wg         w    whyt
1     <NA> 2003-01-01      <NA>                <NA>        NA   NA
2     <NA> 2003-01-02      <NA>                <NA>        NA   NA
3     <NA> 2003-01-03      <NA> 2003/01/03 10:30:00 0.2137352 0.34
4     <NA> 2003-01-04      <NA>                <NA>        NA   NA

Facing a problem here:

score 5 · Accepted Answer · edited May 23 '17 at 11:52

5

In my previous answer I've created WG because you provided wg as a single vector. If you already have a data set called wf, this whole proccess is not needed. You just need to adjust wf correctly and then run foverlaps. In other words, forget about WG and do the following

setDT(wf)[, wg := as.POSIXct(wg, format = "%Y/%m/%d %T", tz = "UTC")]
wf[, `:=`(start = wg, end = wg)]
setkey(wf, start, end)
setDT(wd)[, `:=`(start = Date - 1800L, end = Date + 1800L)]

foverlaps(wd, wf, nomatch = 0L)[, .(wdDate = Date, valu1, WGDate = start, w)]
#                 wdDate valu1              WGDate   w
# 1: 2006-12-27 14:30:00   0.5 2006-12-27 14:23:59 0.4

edited May 23 '17 at 11:52

Community

1
1

answered May 11 '15 at 12:25

David Arenburg

91,361
17
137
196

I've added ````setDT(wd)[, `:=`(start = Date - 1800L, end = Date + 1800L)]```` back, I thought you already did that line. Everything up until (including) `wf=data.frame(wg,w)` is the same as in your post so I didn't repost it again. – David Arenburg May 11 '15 at 12:36
Which column has NAs? `wg`? – David Arenburg May 11 '15 at 12:46
1

`w` doesn't matter because it is not being joined on. Try removing `NA`s from `wf` using `wf <- na.omit(wf, by = "wg")` and then run `foverlaps` – David Arenburg May 11 '15 at 12:53
The code doesn't search for *exactly* +-30, it takes everything within that range exactly like you need. There is a `mult = "all"` argument that you can add to `foverlaps` but this is should be already the default. Can you please provide a reproducible example and show what would you expect to get? I'm starting to wonder though if you trying to match `wf` to `wd` ranges or the other way around. – David Arenburg May 11 '15 at 13:06
Sorry, do `wf <- na.omit(wf, cols = "wg")` (Use `cols` instead of `by`. I wrote the code without testing in my previous comment). – David Arenburg May 11 '15 at 13:13
1

If you have *real* `NA`s, `wf <- na.omit(wf, cols = "wg")` will work. For your second question, you can calculate means after words. It is a very simple aggregation task using `data.table`. – David Arenburg May 11 '15 at 13:43
Per your edit, you want remove one row, or to assign the average to both rows? – David Arenburg May 11 '15 at 13:47
1

Try `unique(res[, valu1 := mean(valu1), by = WGDate], by = "WGDate")` – David Arenburg May 11 '15 at 13:52

How to extract all columns when using data.table?

1 Answers1