
I have a date-of-birth column in a data.frame. Based on this date, I want to determine each respondent's zodiac sign.

I've seen this solution:

Checking if Date is Between two Dates in R

But this approach would mean creating 24 vectors (a start date and an end date for each of the 12 zodiac signs) to check whether each date of birth falls between the two. Is there a more efficient way to do this?

So this is my data.frame:

data.frame(respondent = c(1,2,3,4,5), date_of_birth = seq(as.Date("2011-12-30"), as.Date("2012-04-30"), by="months") )

  respondent date_of_birth
1          1    2011-12-30
2          2    2012-01-30
3          3    2012-03-01
4          4    2012-03-30
5          5    2012-04-30 

and I want to get this:

  respondent date_of_birth    zodiac
1          1    2011-12-30 Capricorn
2          2    2012-01-30  Aquarius
3          3    2012-03-01    Pisces
4          4    2012-03-30     Aries
5          5    2012-04-30    Taurus
Astronaut
  • Non-equi joins are probably the easiest way to do this: you just need another data.frame with the start and end dates of the zodiac signs, and then use `fuzzyjoin`. You'll probably have to extend the zodiac table to cover all years that occur in the data, though. – hannes101 May 10 '19 at 12:59

2 Answers


I think the *apply functions are made for exactly this kind of work. You could try lapply on your first data frame (more precisely, on its date_of_birth column) together with a data frame that maps date ranges to zodiac signs, producing a vector zodiac whose length equals the number of rows in your data frame.
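A minimal sketch of such a lookup, assuming the commonly quoted tropical sign boundaries (sources differ by a day, so please check them). Instead of an explicit lapply it uses the vectorised findInterval on a month-day key, which avoids year handling altogether. The names sign_starts, sign_names and zodiac_sign are illustrative only and not from the question.

```r
# First day of each sign, encoded as month * 100 + day (e.g. 120 = 20 January).
# ASSUMPTION: commonly quoted boundaries; some sources shift them by a day.
sign_starts <- c(120, 219, 321, 420, 521, 621, 723, 823, 923, 1023, 1122, 1222)
sign_names  <- c("Aquarius", "Pisces", "Aries", "Taurus", "Gemini", "Cancer",
                 "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius", "Capricorn")

# Hypothetical helper: returns the sign for each element of a Date vector.
zodiac_sign <- function(d) {
  # Month-day key, ignoring the year
  md <- as.integer(format(d, "%m")) * 100L + as.integer(format(d, "%d"))
  # Index of the last boundary that is <= md; 0 for dates before 20 January
  idx <- findInterval(md, sign_starts)
  # 1-19 January belongs to the Capricorn period that started in December
  idx[idx == 0L] <- length(sign_names)
  sign_names[idx]
}

birth.days <- data.frame(
  respondent = c(1, 2, 3, 4, 5),
  date_of_birth = seq(as.Date("2011-12-30"), as.Date("2012-04-30"), by = "months")
)
birth.days$zodiac <- zodiac_sign(birth.days$date_of_birth)
# zodiac: Capricorn, Aquarius, Pisces, Aries, Taurus
```

Because the key ignores the year, the same twelve boundaries cover every date of birth, and the only special case is the wrap-around in early January.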

Elie Ker Arno

That would work, and with a fully populated zodiac table it should be fairly easy. By that I mean a table that lists each sign's start and end dates for every year, because otherwise it is difficult to compare dates across New Year. Also, please check that the date boundaries are correct; I don't know anything about zodiac signs.

library(fuzzyjoin)
birth.days <- data.frame(respondent = c(1,2,3,4,5), date_of_birth = seq(as.Date("2011-12-30"), as.Date("2012-04-30"), by="months") )

zodiacs <- data.frame(Zodiac = c("Capricorn")
                      , Start.Date = as.Date("2011-12-22")
                      , End.Date = as.Date("2012-01-20"))

fuzzy_left_join(birth.days, zodiacs, 
               by = c("date_of_birth" = "Start.Date", "date_of_birth" = "End.Date"), 
               match_fun = list(`>=`, `<`))
    respondent date_of_birth    Zodiac Start.Date   End.Date
1          1    2011-12-30 Capricorn 2011-12-22 2012-01-20
2          2    2012-01-30      <NA>       <NA>       <NA>
3          3    2012-03-01      <NA>       <NA>       <NA>
4          4    2012-03-30      <NA>       <NA>       <NA>
5          5    2012-04-30      <NA>       <NA>       <NA>

Just as an example on how to populate a database with the dates:

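# One row per year: Capricorn runs from 22 December up to (but not including) 20 January
Capricorn <- data.frame(
  Start.Date = seq.Date(from = as.Date("1900-12-22"), to = as.Date("2100-01-01"), by = "year"),
  End.Date   = seq.Date(from = as.Date("1901-01-20"), to = as.Date("2100-01-20"), by = "year"),
  Zodiac     = rep("Capricorn", 200)  # both sequences contain 200 dates
)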
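The same idea can be extended to all twelve signs. The following is a sketch only: the start dates are the commonly quoted ones and are an assumption to verify, and the helper names sign_start_md and grid are made up for illustration. Each period ends where the next one begins, which matches the `>=` / `<` conditions used in the join.

```r
# ASSUMPTION: commonly quoted "month-day" start of each sign; verify before use.
sign_start_md <- data.frame(
  Zodiac = c("Aquarius", "Pisces", "Aries", "Taurus", "Gemini", "Cancer",
             "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius", "Capricorn"),
  start  = c("01-20", "02-19", "03-21", "04-20", "05-21", "06-21",
             "07-23", "08-23", "09-23", "10-23", "11-22", "12-22"),
  stringsAsFactors = FALSE
)

# Every combination of sign and year
grid <- expand.grid(i = seq_len(nrow(sign_start_md)), year = 1900:2100)

zodiacs <- data.frame(
  Zodiac     = sign_start_md$Zodiac[grid$i],
  Start.Date = as.Date(paste(grid$year, sign_start_md$start[grid$i], sep = "-")),
  stringsAsFactors = FALSE
)
zodiacs <- zodiacs[order(zodiacs$Start.Date), ]

# Each period ends (exclusively) on the start date of the following period
zodiacs$End.Date <- c(zodiacs$Start.Date[-1], as.Date(NA))
# The very last period has no successor, so drop it
zodiacs <- zodiacs[-nrow(zodiacs), ]
```

The resulting zodiacs table has the same three columns as the one-row example, so it can be passed to the fuzzy_left_join call above unchanged.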
hannes101