
Hello, I am having an interesting issue with R.

When I do:

touchtimepairs = structure(list(v..length.v.. = structure(c(1543323677.254, 1543323678.137, 1543323679.181, 1543323679.918, 1543323680.729, 1543323681.803, 1543323682.523, 1543323682.977,1543323683.519, 1543323684.454), class = c("POSIXct", "POSIXt"), tzone = "CEST"),v.2.length.v.. = structure(c(1543323678.137, 1543323679.181, 1543323679.918, 1543323680.729, 1543323681.803, 1543323682.523, 1543323682.977, 1543323683.519, 1543323684.454, 1543323690.793), class = c("POSIXct", "POSIXt"), tzone = "CEST")), .Names = c("v..length.v..", "v.2.length.v.."), row.names = c(NA, 10L), class = "data.frame")

data = data.frame(a = seq(1,10), b = seq(21,30), posixtime = touchtimepairs[,1])



for(x in seq(nrow(touchtimepairs))){
    a = data[data$posixtime < touchtimepairs[x,2],]
}

it works without a problem and I get results back. But when I try to use apply:

a = apply(touchtimepairs, 1, 
          function(x) data[data$posixtime < x[2],])

it no longer works: I get an empty data frame. The same happens with subset(). Interestingly, when I use > instead of <, it works!

a = apply(touchtimepairs, 1, 
          function(x) data[data$posixtime > x[2],])

Then there is another issue: with the > comparison, apply gives a different result than the for loop.

On my full data set I get 1951 rows with apply and 1897 with the for loop.

Can anyone reproduce this behavior?

The POSIX times also include milliseconds, if that is of any interest.

Many thanks

hemi
  • Could you possibly post a small subset of your data together with intended output, ideally with intentions? That would really help! – 12b345b6b78 Nov 28 '18 at 01:02
  • Is `touchtimepairs` a `data.frame`? Are all the columns the same `class`? If not, then `apply(touchtimepairs,...)` always always always messes with your data in some form. The only time that will work as desired is if all of the columns in your frame are of the same class. Suggest you change your anon-func to be `function(x) {browser();...;}` and take a look at `str(x)`, perhaps it would be exactly what you think it is. – r2evans Nov 28 '18 at 01:09
  • both columns of touchtimepairs are "POSIXct" "POSIXt" classes, thanks for the browser tip and indeed , after calling apply x becomes a character class – hemi Nov 28 '18 at 01:13
  • I still cannot reproduce anything. Please make this question *reproducible*. This includes sample code (including listing non-base R packages), sample data (e.g., `dput(head(x))`), and expected output. Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. In this specific case, consider providing a small set of each frame, perhaps 3-5 rows each, where you know you'll have a fixed set of matches. Please don't paste into comments. Please use the output from `dput`. It makes a difference. – r2evans Nov 28 '18 at 01:14
  • I dont know exactly how to paste posixtime with ms into the code panel need a sec :-), no non base r code needed – hemi Nov 28 '18 at 01:20
  • *"dont know"* ... please read the link, it's there: `dput(head(touchtimepairs))` (and again for your other frame). BTW: *all* `POSIXt` objects have milliseconds, regardless of what it shows on the console. Realize that there is a difference between what is stored and what is shown. `pi` is a good example ... do you really want R to try to show you all of the digits? – r2evans Nov 28 '18 at 01:23
  • thanks for pointing me to the reproducible example some edits are needed i see – hemi Nov 28 '18 at 01:34
  • For me: `apply(touchtimepairs[1:2,], 1, function(x) {browser();x})` and then `class(x)` gives `character`. I'll expand my previous warning about using `apply` with frames of mixed-class: don't use `apply` with frames that have anything other than `numeric`/`integer`. (This might be related to the warning I get with your data: `unknown timezone 'CEST'`. Edit: nope, that's not it.) – r2evans Nov 28 '18 at 01:35
  • Yes i can reproduce it now, so i should reformat the data in lists and use lapply instead – hemi Nov 28 '18 at 01:55
  • What ultimately are you trying to do? You have two columns in `touchtimepairs`, it seems like you want to use them both for something ... ranges of times, perhaps? – r2evans Nov 28 '18 at 01:57
  • Thanks, as r2evans said apply and mixed dataframes do not mix, and the idea is to use `l = split(touchtimepairs, seq(nrow(touchtimepairs)))` and then `lapply(l , function(x){data[data$posixtime < x[[2]],]} )` – hemi Nov 28 '18 at 02:00
  • I want to filter my data which comes from a big xml file and is >20 mb by time intervals and save everything separated. – hemi Nov 28 '18 at 02:05

1 Answer


If you look at your data inside the apply anonymous function, you'll see the symptom that is causing your trouble.

apply(touchtimepairs, 1, class)
#           1           2           3           4           5           6           7           8           9          10 
# "character" "character" "character" "character" "character" "character" "character" "character" "character" "character" 

(It should be returning a 2-row matrix with POSIXct and POSIXt.) I should also note that I kept getting warnings about unknown timezone 'CEST'. I fixed it temporarily with attr(touchtimepairs[[1]], "tzone") <- "UTC", though that's just a kludge to stop the warnings on my console. It doesn't fix the problem and might just be my system. :-)
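
To see why the rows arrive as character, here is a minimal sketch on a hypothetical two-row frame (`tp`, `start`, and `end` are illustrative names, not from the question). `apply` coerces a data frame with `as.matrix` before it loops, and a matrix can hold only one type, so the POSIXct columns get formatted into strings:

```r
# Hypothetical frame with two POSIXct columns (names are illustrative only).
tp <- data.frame(
  start = as.POSIXct(c(1543323677.254, 1543323678.137),
                     origin = "1970-01-01", tz = "UTC"),
  end   = as.POSIXct(c(1543323678.137, 1543323679.181),
                     origin = "1970-01-01", tz = "UTC")
)

# This is the object apply() actually iterates over.
m <- as.matrix(tp)
is.character(m)

# Each "row" handed to the function is therefore a character vector.
row_classes <- apply(tp, 1, class)
row_classes

# Indexing the frame directly keeps the original class.
class(tp[1, 2])
```

With default options the formatted strings usually omit fractional seconds, which may be part of why the `apply` and `for` results disagree. That is a guess, not something I verified on the full data.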

If you are trying to use both columns of touchtimepairs, you have a few options:

  1. If you really only need one of touchtimepairs at a time, then lapply will work:

    lapply(touchtimepairs[[1]],
           function(x) subset(data, posixtime < x))
    
  2. If you need to use both columns at the same time, use an index on the rows:

    lapply(seq_len(nrow(touchtimepairs)),
           function(i) subset(data, posixtime < touchtimepairs[i,2]))
    

    (where you'd also reference touchtimepairs[i,1] somehow).

  3. Especially if you are trying to use both columns simultaneously, you can use Map:

    Map(function(a, b) subset(data, a < posixtime & posixtime <= b),
        touchtimepairs[[1]], touchtimepairs[[2]])
    

    (This does not return anything in your sample data, so either the data is not the best representative sample, or you are not intending to use it in this fashion. Most likely the latter, I'm just guessing :-)

    The biggest difference between Map and lapply is that Map (a thin wrapper around mapply) accepts one or more vectors/lists and zips them together. As an example of this "zipper" effect:

    Map(func, 1:3, 11:13)
    

    is effectively calling:

    func(1, 11)
    func(2, 12)
    func(3, 13)
    
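As a runnable sketch of the interval filter with Map, here is a small example on made-up data. `dat`, `starts`, and `ends` are hypothetical names and values, chosen so that each interval is known to contain two rows:

```r
# Illustrative data (not from the question): five timestamps one second apart.
t0  <- as.POSIXct("2018-11-27 12:00:00", tz = "UTC")
dat <- data.frame(a = 1:5, posixtime = t0 + 0:4)

# Hypothetical interval bounds, read as (start, end] pairs.
starts <- t0 + c(0, 2)
ends   <- t0 + c(2, 4)

# Map pairs starts[1] with ends[1], starts[2] with ends[2], and so on,
# returning one filtered data frame per interval.
res <- Map(function(s, e) dat[dat$posixtime > s & dat$posixtime <= e, ],
           starts, ends)

res[[1]]$a  # rows falling in the first interval
res[[2]]$a  # rows falling in the second interval
```

Because the result is a plain list with one data frame per interval, each piece can be saved separately, for example by looping over `seq_along(res)`.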
r2evans