I've written some code with help from a previously answered question. Initially I had this code:

getT <- function(df, ID, Number){
  df %>%
    group_by(ID, Number) %>% 
    mutate( Distance = finish - begin) %>% 
    select(-begin,-finish,-symbols) %>%
    nest() %>% 
    mutate( data = map( data, ~ filter(.x, Distance == max(Distance)))) %>% 
    unnest()
}

getallT <- as.data.frame(getT(df))

getTID <- function(df, ID) {
  subset(x = getallT, subset = (ID))
}

Which gave this output:

ID     Number     Time     Distance
33         1      2.00         870
33         2      1.98         859
33         3      0.82         305
33         4      2.02         651
33         5      2.53         502

I wanted to filter it by Time, so I used this code (thanks to a post below):

getHLN <- function(df, ID) {
  getallT %>% filter (ID ==id & !between(Time, 1.50, 2.10))
}

Which now gives this output:

  ID Number Time Distance
1 33      3 0.82      305
2 33      4 2.02      651
3 33      5 2.53      502

But now I've come across an issue, and I'm left wondering how to do one of the following: A. Filter out Number 4 & 5 so that I can create a separate function with a different Time filter for them, and later create another function that merges the two previous functions into one. OR B. Create a different Time filter specifically for Number 4 & 5 within the same function.

I tried doing A. by using filter (getallT, Number >= 3) %>% but it doesn't work. I would rather go with B if possible, though. So something like: for Number 1-3, filter(!between(Time, 1, 2)); for Number 4-5, filter(!between(Time, 1.5, 2.3)), all within the same function. I've been trying out a few things for the past day but keep getting error messages such as: Error in filter_impl(.data, quo) : Evaluation error: operations are possible only for numeric, logical or complex types.

I've been trying out what's on here but must not be doing something right, so I need some insight! http://genomicsclass.github.io/book/pages/dplyr_tutorial.html

Here is an example dataset

df <- data.frame(ID=rep(33,5),
                 Number=1:5,
                 Time=c(2.00,1.98,0.82,2.02,2.53),
                 Distance=c(870,859,305,651,502))

Any help would be much appreciated.

  • Could you provide a small reproducible example? – akrun Jul 08 '17 at 14:16
  • Please get into the habit, as you ask SO questions, of including a reproducible dataset. https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – cmaher Jul 08 '17 at 14:17
  • I do have the bad habit of forgetting to add that! Woops, thanks for the reminder! – Young Autobot Jul 08 '17 at 14:45

1 Answer

This function is somewhat confusing:

getHLN <- function(df, ID) {
  data_df1 <- getT(race_df)
  subset(x = getallT, subset = (ID)) %>%
    filter (!between(Time, 1.50, 2.10))
}

This is mainly because it takes a df argument that it never uses, and instead relies on two data.frames from the outer environment, race_df and getallT. Your calls to subset are also a tad mystifying. As it stands, the function will return whatever the expression beginning with subset returns, and will throw away data_df1.

getHLN <- function(df, ID) {
  # this gets locally assigned within the function and then 
  # becomes unreachable once the function ends
  data_df1 <- getT(df)
  # this expression would produce the last value of the function
  # and so the function would return its value
  subset(x = getallT, subset = (ID)) %>%
    filter (!between(Time, 1.50, 2.10))
}
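
For comparison, here is a minimal sketch of a version that relies only on its own arguments. It assumes the columns of the example dataset in the question (ID, Number, Time, Distance); the name get_by_id and the lower-case argument id are hypothetical, chosen so the value being matched does not share a name with the ID column.

```r
library(dplyr)

# Example dataset from the question
df <- data.frame(ID = rep(33, 5),
                 Number = 1:5,
                 Time = c(2.00, 1.98, 0.82, 2.02, 2.53),
                 Distance = c(870, 859, 305, 651, 502))

# Hypothetical helper: everything it needs is passed in as an argument,
# so it does not depend on data frames defined outside the function.
# `ID` is the column; `id` is the value to keep.
get_by_id <- function(df, id) {
  df %>%
    filter(ID == id, !between(Time, 1.50, 2.10))
}

res <- get_by_id(df, 33)
print(res)
```

Passing the data frame in explicitly means the function gives the same result no matter what happens to be defined in the calling environment.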

We can do the sort of filtering you described by creating Number %in% ... & !between() logic for each of the two sets of criteria, wrapping each in parentheses so that it is evaluated as a single "and" condition, and then joining the two inside filter with the | operator ("or"). filter will then evaluate this as "filter df where (criteria a AND b) OR (criteria c AND d)".

getHLN <- function(df) {
  df %>% filter(
    (Number %in% 1:3 & !between(Time, 1, 2)) |
      (Number %in% 4:5 & !between(Time, 1.50, 2.10))
    )
}
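
As a quick check, the function can be run against the example dataset from the question. This is only a sketch of a usage example: the data and the function are repeated so the snippet runs on its own, and the variable name out is an assumption.

```r
library(dplyr)

# Example dataset from the question
df <- data.frame(ID = rep(33, 5),
                 Number = 1:5,
                 Time = c(2.00, 1.98, 0.82, 2.02, 2.53),
                 Distance = c(870, 859, 305, 651, 502))

# Same function as above: one Time window per group of Number values
getHLN <- function(df) {
  df %>% filter(
    (Number %in% 1:3 & !between(Time, 1, 2)) |
      (Number %in% 4:5 & !between(Time, 1.50, 2.10))
  )
}

out <- getHLN(df)
print(out)
```

With these bounds only Number 3 and Number 5 should remain, since between() is inclusive at both ends and the Times for Number 1, 2 and 4 all fall inside their windows. To use the 1.5 to 2.3 window mentioned in the question for Number 4 and 5, change the second between() call accordingly.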
cmaher
  • While fiddling about further I realised my data_df1 is pointless but now you showed me an even sleeker code so for that I thank you. I am self-taught in R and new to programming in general too. I'm sure you can tell. Is there a way I can add another `filter` related to `Time` for just `Number` 4 and 5 though? I keep getting the same error message returned. – Young Autobot Jul 08 '17 at 15:35
  • Ah yes, I re-read your question and now see what you're hoping to do -- please see my revised answer. – cmaher Jul 08 '17 at 15:56
  • Thanks for your help! My code looks better and is functional now! – Young Autobot Jul 08 '17 at 21:23