I have some data regarding separation in x and y as a function of time. There can be a separation only in x, only in y, or both (diagonal, with x==y):

data
#  Source: local data frame [307 x 4]

#             t0         t1             x             y
# 1   1449241093 1449241345            NA  4.085057e-02
# 2   1449241345 1449241537            NA  4.085057e-02
# ...
# 7   1449242375 1449242627  4.085057e-02            NA
# 8   1449242627 1449242818  4.085057e-02            NA
# ...
# 78  1449245524 1449246079  0.000000e+00  0.000000e+00
# 79  1449246079 1449246101 -2.042528e-01 -2.042528e-01
# ...

I want to bring this into this format:

# Source: local data frame [307 x 4]

#            t0         t1 direction    separation
# 1  1449241093 1449241345         Y  4.085057e-02
# 2  1449241345 1449241537         Y  4.085057e-02
# ...
# 8  1449242627 1449242818         X  4.085057e-02
# 9  1449242818 1449242949         X  4.085057e-02
# ...
# 78  1449245524 1449246079        D  0.000000e+00
# 79  1449246079 1449246101        D  2.888571e-01
# ...

Currently I'm doing this using code like this:

data %>%
  mutate(direction  = ifelse(is.na(x), "Y", ifelse(is.na(y), "X", "D")),
         separation = ifelse(is.na(x), y, ifelse(is.na(y), x, sqrt(x^2 + y^2)))) %>%
  select(-x, -y)

My question: Is there a nicer way to do this using tidyr::gather()?

This would work nicely if not for the diagonal case, where I get multiple rows (obviously because gather is not being told how to handle these cases):

gather(data,direction,separation,x,y, na.rm=T) %>% arrange(t0)
# Source: local data frame [396 x 4]

#             t0         t1 direction    separation
# 1   1449241093 1449241345         y  4.085057e-02
# 2   1449241345 1449241537         y  4.085057e-02
# ...
# 7   1449242375 1449242627         x  4.085057e-02
# 8   1449242627 1449242818         x  4.085057e-02
# ...
# 77  1449245524 1449246079         x  0.000000e+00
# 78  1449245524 1449246079         y  0.000000e+00
# 79  1449246079 1449246101         x -2.042528e-01
# 80  1449246079 1449246101         y -2.042528e-01
# ...

Basically, what I need is a more advanced version of the question "How to collapse many records into one while removing NA values".

Graipher
  • You could use `distinct` after `gather` to keep just the rows with non-duplicated values of `t0` and `t1`, i.e., `%>% distinct(t0, t1)`, assuming `t0` and `t1` are unique identifiers of rows. – aosmith May 13 '16 at 18:11

1 Answer

I'm not sure if this is preferable to explicit ifelse, but here you go:

library(data.table)

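# Drop rows where both x and y are NA, then build the two new columns.
setDT(df)[!is.na(x) | !is.na(y),
          .(t0, t1,
            # non-NA x counts 1, non-NA y counts 2: 1 -> 'X', 2 -> 'Y', 3 -> 'D'
            direction  = c('X', 'Y', 'D')[(!is.na(.SD)) %*% c(1, 2)],
            separation = sqrt(rowSums(.SD^2, na.rm = TRUE))),
          .SDcols = x:y]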

Translation to dplyr is left to the reader.
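
For readers who want that translation, here is a rough sketch of how it might look in dplyr. It is not from the original answer. The mock data frame `df` and its values are made up for illustration, and `case_when()` and `coalesce()` are just one way to express the same logic. As in the data.table version, `separation` always comes out non-negative, whereas the `ifelse` code in the question keeps the sign of a lone `x` or `y`.

```r
library(dplyr)

# Hypothetical mock data in the same shape as the question's data frame.
df <- tibble(
  t0 = c(1, 2, 3, 4),
  t1 = c(2, 3, 4, 5),
  x  = c(NA, 0.04, -0.2, NA),
  y  = c(0.04, NA, -0.2, NA)
)

result <- df %>%
  # Drop rows where both x and y are missing.
  filter(!is.na(x) | !is.na(y)) %>%
  mutate(
    direction = case_when(
      is.na(x) ~ "Y",   # only y present
      is.na(y) ~ "X",   # only x present
      TRUE     ~ "D"    # both present (diagonal)
    ),
    # Treat a missing component as 0, then take the Euclidean norm.
    separation = sqrt(coalesce(x, 0)^2 + coalesce(y, 0)^2)
  ) %>%
  select(-x, -y)

print(result)
```

Whether this reads better than the nested `ifelse` is a matter of taste. The main gain is that the all-NA rows are filtered out explicitly rather than silently labelled "Y".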

eddi
  • I get this error trying to use it: ```Error in `[.tbl_df`(setDT(steps), , .(t0, t1, direction = c("X", "Y", : unused argument (.SDcols = x:y)``` – Graipher May 13 '16 at 16:29
  • sounds like it didn't become a `data.table` for some reason - I can suggest upgrading to latest version of `data.table`, and if that doesn't help trying `as.data.table` instead of `setDT` – eddi May 13 '16 at 16:34
  • Hm, I just tried it like this: ```> df <- as.data.table(steps)``` ```> setDT(df)[, .(t0, t1, direction = c('X', 'Y', 'D')[((!is.na(.SD)) %*% c(1, 2))], separation = sqrt(rowSums(.SD^2, na.rm = T))), .SDcols = x:y] Error in `[.tbl_df`(setDT(df), , .(t0, t1, direction = c("X", "Y", "D")[((!is.na(.SD)) %*% : unused argument (.SDcols = x:y)``` – Graipher May 13 '16 at 16:39
  • what's your `data.table` version? – eddi May 13 '16 at 16:41
  • I had version 1.9.4. I just did ```update.packages()``` and got 1.9.6. It still gives the same error, though. – Graipher May 13 '16 at 16:55
  • Unfortunately I don't know how I can help you without a reproducible example. I don't have any issues with my mock data. – eddi May 13 '16 at 16:58
  • [Here](https://gist.github.com/graipher/6a27924380b2dd40877f794c06f8d7d7) is my current dataframe. Read it with ```read.table("file.txt")``` – Graipher May 13 '16 at 17:04
  • ok - I had to make a small change to the code to filter out all NA rows (see edit), but I don't have any issues with `.SDcols` - it works just fine. I'm running out of suggestions, and they're all of type "have you tried turning it off and then back on". Make sure your R is at least 3.1, then restart it and try again in a clean session. – eddi May 13 '16 at 17:12
  • Hm, did try the restarting after the update. R version is 3.2.3. Tried an R session with nothing besides this and still getting this (different) error: ```Error in eval(expr, envir, enclos) : Could not find object 't0'``` – Graipher May 13 '16 at 17:15
  • Not sure what to tell you. You can try distilling it to a minimal example that fails and posting as a separate question. – eddi May 13 '16 at 17:32