Subset a dataframe using if-conditions inside a for loop

Question

I'm trying to use this basic structure to address a data reshaping problem:

for(i in 1:5) {                           # Head of for-loop
  if(i < 4) {                             # First if-condition 
    if(i %in% seq(2, 10, 2)) {            # Second if-condition 
      print(i)                            # Some output
    }
  }
}

Disclaimer: although I discuss "dates" in this code, they come from a Julian date system, so they are not in POSIXct format and behave as integers.

I want to use a vector of values ("dates") to find rows in a data frame "bydates" that meet 2 conditions, and write them to a new df. "bydates" has 2275 observations of 4 variables: NatalName, Jstart, JEnd, FAM (types chr, num, num, chr).

For each value in "dates" (i), I want to check whether Jstart < i and JEnd > i, and if both conditions are met, write the matching rows to the lists df in the format i, NatalName, FAM.

This is one of my attempts, that I keep coming back to (I also tried functions, and ifelse and if_else, without success).

lists <- c() # create a blank variable to store the result

for(i in dates) 
        {if(bydates$Jstart <= i) {
                if(JEnd > i) {
                        lists <- as.df(i, bydates$FAM, bydates$NatalName)
        }
}
}

This returns "Error in if (bydates$Jstart <= i) { : the condition has length > 1"

I think this means more than one value in my "bydates" df meets the condition, which is correct, but then does that mean I should be looping on "bydates" instead? I've spent more than a week researching this and I remain stuck. I'm also confused why I don't get the commonly reported "the condition has length >1 and only the first element will be used" error.

Any help very much appreciated.

EDIT: as requested by @Stefan, a snippet of the data using dput

> dput(dates[1:4])
c(744, 864, 984, 1224)
> dput(head(bydates))
structure(list(NatalName = c("AAN12", "AAN18", "AAN20", "ABI96", 
"ABR12", "ABR17"), Jstart = c(1113, 1178, 1203, 914, 1105, 1175
), JEnd = c(1158, 1180, -23053, 915, -23053, -23053), FAM = c("AA", 
"AA", "AA", "AA", "AA", "AA")), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

you might want to look at how to construct `if()` statements/have multiple conditions in the same `if()`. maybe this [answer](https://stackoverflow.com/questions/31261946/multiple-if-statements-in-r) will help. — D.J, Jan 25 '23 at 06:48
You are most likely having the issue that `if` does not work on vectors, try replacing them with `ifelse()`. See [here](https://www.statology.org/r-condition-has-length-1-only-first-element-will-be-used/) for more info. — Godrim, Jan 25 '23 at 07:06
Without seeing the actual code and data it's difficult to be sure, but I suspect you need neither a loop nor `if`. You probably only need to subset your data with a logical vector. — Roland, Jan 25 '23 at 07:09

score 0 · Answer 1 · answered Jan 25 '23 at 07:12

The issue is that `if` is not vectorized. It is meant to control the flow of your script, i.e. if a single condition is met, do A. Put differently, `if` requires a condition that evaluates to a logical vector of length 1, whereas your condition `bydates$Jstart <= i` returns a vector of length > 1, namely one element per row of your dataset. Hence, you get an error.
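To make the length mismatch concrete, here is a minimal sketch on a toy vector (the values of `Jstart` are copied from the question's `dput()` output; `i <- 1150` is a made-up date). It also bears on the question of why an error appears instead of the older "only the first element will be used" warning: since R 4.2.0 a condition of length greater than one is an error, whereas earlier versions only warned. The `tryCatch()` below records whichever of the two your R version signals.

```r
# Values taken from the question's dput() output; i is a hypothetical date
Jstart <- c(1113, 1178, 1203, 914)
i <- 1150

cond <- Jstart <= i   # one TRUE/FALSE per element of Jstart
length(cond)          # 4, so it cannot serve as an if() condition

# Record what R signals for a condition of length > 1:
# an error since R 4.2.0, a warning in earlier versions
signal <- tryCatch(
  { if (cond) "ran" },
  error   = function(e) "error",
  warning = function(w) "warning"
)
signal

# Collapsing to a single TRUE/FALSE makes if() valid again
if (any(cond)) message("at least one Jstart is on or before ", i)
```

`any()` and `all()` are the usual ways to reduce such a vector to a single value when flow control really is what you want; for picking rows, the logical vector itself is the more useful object.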

While there is the vectorized `ifelse`, an `if` is not needed here at all. What you are trying to achieve is to subset a dataset, which is more easily done with e.g.

bydates[bydates$Jstart <= i & bydates$JEnd > i,]
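If you would rather keep the `for` loop, the same logical subsetting can sit inside it. The following is only a sketch under some assumptions: `bydates` is rebuilt from the six rows in the question's `dput()` output, the two values in `dates` are invented so that some rows match (the four sample dates in the question match none of these six rows), and the output column name `Date` is arbitrary.

```r
# First rows of the question's dput() output
bydates <- data.frame(
  NatalName = c("AAN12", "AAN18", "AAN20", "ABI96", "ABR12", "ABR17"),
  Jstart    = c(1113, 1178, 1203, 914, 1105, 1175),
  JEnd      = c(1158, 1180, -23053, 915, -23053, -23053),
  FAM       = "AA"
)
dates <- c(1150, 1179)  # hypothetical dates, chosen so that some rows match

lists <- vector("list", length(dates))  # one slot per date
for (k in seq_along(dates)) {
  i <- dates[k]
  # Logical vector with one entry per row; no if() involved
  keep <- bydates$Jstart <= i & bydates$JEnd > i
  lists[[k]] <- data.frame(
    Date      = rep(i, sum(keep)),
    NatalName = bydates$NatalName[keep],
    FAM       = bydates$FAM[keep]
  )
}

# Stack the per-date pieces into a single data frame
result <- do.call(rbind, lists)
result
```

Note that rows whose `JEnd` is -23053 can never satisfy `JEnd > i` for a positive `i`, so if that value stands for something like "no end date yet" in your data, it would need separate handling.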

Also, as you want a list of datasets, you may consider using `lapply` instead of a `for` loop. A minimal reproducible example using `lapply` (and `subset` for the subsetting), based on some fake example data, may look like so:

bydates <- data.frame(
  JStart = seq(as.Date("2022-01-03"), as.Date("2022-01-12"), by = "day"),
  JEnd = seq(as.Date("2022-01-01"), as.Date("2022-01-10"), by = "day"),
  FAM = letters[1:10],
  NatalName = LETTERS[1:10]
)

dates <- c(as.Date("2022-01-05"), as.Date("2022-01-10"))

lapply(dates, function(i) {
  bydates$i <- i
  subset(bydates, JStart >= i & JEnd < i, c(i, FAM, NatalName))
})
#> [[1]]
#>            i FAM NatalName
#> 3 2022-01-05   c         C
#> 4 2022-01-05   d         D
#> 
#> [[2]]
#>            i FAM NatalName
#> 8 2022-01-10   h         H
#> 9 2022-01-10   i         I
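The fake data above uses `Date` columns in which `JStart` falls after `JEnd`, so its condition is the mirror image of the one in the question. As a rough adaptation to integer Julian dates with the question's column names, the sketch below selects columns by name and adds the date explicitly, which avoids relying on a column called `i` existing inside `subset()`. The two values in `dates` are made up so that some rows match.

```r
# First rows of the question's dput() output
bydates <- data.frame(
  NatalName = c("AAN12", "AAN18", "AAN20", "ABI96", "ABR12", "ABR17"),
  Jstart    = c(1113, 1178, 1203, 914, 1105, 1175),
  JEnd      = c(1158, 1180, -23053, 915, -23053, -23053),
  FAM       = "AA"
)
dates <- c(1150, 1179)  # hypothetical integer dates

res <- lapply(dates, function(i) {
  # Rows whose interval contains i, restricted to the columns of interest
  hits <- bydates[bydates$Jstart <= i & bydates$JEnd > i, c("FAM", "NatalName")]
  # Prepend the date; rep() keeps this valid when no row matches
  data.frame(i = rep(i, nrow(hits)), hits, row.names = NULL)
})
res
```

Each element of `res` is a data frame for one date; `do.call(rbind, res)` would stack them if a single table is preferred.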

thanks @stefan, This makes a lot of sense, and I do see what you mean. However, I'm not sure this quite works; using your code on my data, I get ; Error in `x[r, vars, drop = drop]`: ! Can't subset columns past the end. ℹ Location 744 doesn't exist. ℹ There are only 4 columns. Run `rlang::last_error()` to see where the error occurred. Warning message: In dates$i <- i : Coercing LHS to a list As I said in my original post, here my dates are actually in Julian format and 744 is the first value in the list. Even if I add dates<- as.list(dates) I still get the error less the last warning — V Fishlock, Jan 26 '23 at 06:29
Hm. Could you provide a snippet of `bydates` and `dates` via `dput()`, e.g. type `dput(head(bydates))` into the console and copy the output as an edit to your post, and similarly using e.g. `dput(dates[1:5])`. For more on `dput()` see [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — stefan, Jan 26 '23 at 06:39

score 0 · Answer 2 · answered Mar 20 '23 at 12:28

I worked out a solution using purrr and dplyr, with the help of a friend:

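    library(dplyr)  # filter(), mutate(), select(), arrange(), %>%
    library(purrr)  # map_df()

    # Return the rows of bydates whose interval contains the date y
    setdates <- function(y) {
      bydates %>%
        filter(Jstart <= y & JEnd > y) %>%
        mutate(Date = y) %>%
        select(NatalName, FAM, Date) %>%
        arrange(FAM)
    }

    # Apply setdates() to every date and row-bind the results
    famdat <- map_df(dates, setdates)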

My problem was that I was trying to integrate dates before setting up a function to handle the data manipulation. In case this helps anyone else.
