I have defined the following function in R:

#A function that compares color and dates to determine if there is a match
getTagColor <- function(color, date){
    for (i in (1:nrow(TwistTieFix))){
        if ((color == TwistTieFix$color_match[i]) & 
            (date > TwistTieFix$color_match[i]) &       
            (date <= TwistTieFix$julian_cut_off_date[i])) {
          Data$color_code <- TwistTieFix$color_code[i]
          print(Data$color_code)
        }
    }
}

I then used apply() in an attempt to apply the function to each row.

#Apply the above function to the data set
testData <- apply(Data, 1, getTagColor(Data$tag_color,Data$julian_date))

The goal of the code is to use two variables in Data to look up a value in TwistTieFix and store it in a new column in Data (color_code). When I run the code, I get a list of warnings saying:

In if ((color == TwistTieFix$color_match[i]) & (date >  ... :
  the condition has length > 1 and only the first element will be used 

I cannot determine why the function does not take the date and color from each row (at least, that is what I think is going wrong here). Thanks!

Here are examples of the data frames being used:

TwistTieFix

color_name   date          color_code     cut_off_date      color_match       julian_start      julian_cut_off_date
yellow       2013-08-12    y1             2001-07-02        yellow            75                389
blue         2000-09-28    b1             2001-08-12        blue              112               430

Data

coll_date      julian_date    tag_color
2013-08-13     76             yellow
2013-08-14     76             yellow
2000-09-29     112            blue

Data has many more columns of different variables, but I am not allowed to include all of them. However, I have included the columns in Data that I reference in the function. The data sets are loaded into R using read.csv and come from Excel csv files.

mrob1052
  • Can you post a reproducible example? Note that apply will pass the row (or the column) as the first argument to the function, and extra arguments should be passed as extra parameters to apply. – nico Jun 29 '14 at 17:30
  • Sample input and desired output would be very helpful here. See [how to make a great R reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for tips on how to do that. You should really never need to `apply` over the rows of a `data.frame`. Most operations are vectorized or can be vectorized so that you can just pass in the columns. Certainly there is a better way to write the `getTagColor` function but it's hard to offer specific suggestions without seeing `TwistTieFix` or your input data. – MrFlick Jun 29 '14 at 18:09
  • It's a really bad idea to call an object (`TwistTieFix`) inside a function without passing it as an argument. Sooner or later something will happen in the parent environment that will make you sad. – Carl Witthoft Jun 29 '14 at 20:50
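
The points raised in the comments above can be sketched in base R. The warning most likely arises because `getTagColor(Data$tag_color, Data$julian_date)` is evaluated once, with the whole columns, before `apply()` ever runs, so the `if` condition receives vectors of length greater than 1. The sketch below is only one possible rewrite: `lookup`, `obs`, and `get_tag_color` are hypothetical names, the data is modeled on the samples in the question, and the lower bound is assumed to be `julian_start` (inclusive).

```r
# Hypothetical stand-ins for TwistTieFix and Data, modeled on the question's samples
lookup <- data.frame(
  color_match = c("yellow", "blue"),
  color_code = c("y1", "b1"),
  julian_start = c(75L, 112L),
  julian_cut_off_date = c(389L, 430L),
  stringsAsFactors = FALSE
)
obs <- data.frame(
  julian_date = c(76L, 76L, 112L),
  tag_color = c("yellow", "yellow", "blue"),
  stringsAsFactors = FALSE
)

# Takes ONE color and ONE date, plus the lookup table as an explicit argument,
# and returns the first matching code (NA if no row of the lookup matches)
get_tag_color <- function(color, date, lookup) {
  hit <- lookup$color_match == color &
    date >= lookup$julian_start &
    date <= lookup$julian_cut_off_date
  if (any(hit)) lookup$color_code[hit][1] else NA_character_
}

# mapply() walks the two columns in parallel, one element at a time;
# the lookup table is passed unchanged to every call via MoreArgs
obs$color_code <- mapply(get_tag_color, obs$tag_color, obs$julian_date,
                         MoreArgs = list(lookup = lookup), USE.NAMES = FALSE)
print(obs)
```

Returning the value instead of assigning to `Data` inside the function keeps the function free of side effects, and passing the lookup table as an argument avoids depending on an object in the parent environment.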

1 Answer

To me, it seems like you want to join Data and TwistTieFix where tag_color == color_match and julian_start <= julian_date <= julian_cut_off_date. Here are your sample data sets in dput form:

TwistTieFix <- structure(list(color_name = structure(c(2L, 1L), .Label = c("blue", 
"yellow"), class = "factor"), date = structure(c(2L, 1L), .Label = c("2000-09-28", 
"2013-08-12"), class = "factor"), color_code = structure(c(2L, 
1L), .Label = c("b1", "y1"), class = "factor"), cut_off_date = structure(1:2, .Label = c("2001-07-02", 
"2001-08-12"), class = "factor"), color_match = structure(c(2L, 
1L), .Label = c("blue", "yellow"), class = "factor"), julian_start = c(75L, 
112L), julian_cut_off_date = c(389L, 430L)), .Names = c("color_name", 
"date", "color_code", "cut_off_date", "color_match", "julian_start", 
"julian_cut_off_date"), class = "data.frame", row.names = c(NA, 
-2L))

Data <- structure(list(coll_date = structure(c(2L, 3L, 1L), .Label = c("2000-09-29", 
"2013-08-13", "2013-08-14"), class = "factor"), julian_date = c(76L, 
76L, 112L), tag_color = structure(c(2L, 2L, 1L), .Label = c("blue", 
"yellow"), class = "factor")), .Names = c("coll_date", "julian_date", 
"tag_color"), class = "data.frame", row.names = c(NA, -3L))

An easy way to perform this merge is with the data.table package. You can do:

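library(data.table)

#convert to data.table and set keys
ttf<-setDT(TwistTieFix)
setkey(ttf, color_match, julian_start)

dt<-setDT(Data)
setkey(dt, tag_color, julian_date)

#rolling join: each row of dt is matched to the latest julian_start at or
#before its julian_date; after the join, julian_start holds dt's julian_date
#keep rows on or before the cut-off and extract columns
ttf[dt, roll=TRUE][julian_start<=julian_cut_off_date,list(coll_date, 
    julian_date=julian_start, tag_color=color_match, color_code)]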

to get

    coll_date julian_date tag_color color_code
1: 2000-09-29         112      blue         b1
2: 2013-08-13          76    yellow         y1
3: 2013-08-14          76    yellow         y1
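
For comparison, the same join condition can be sketched in base R with `merge()` followed by a range filter. This is an alternative to the rolling join rather than a restatement of it, and `lookup`, `obs`, `merged`, `in_range`, and `result` are hypothetical names; the data is a trimmed copy of the samples above.

```r
# Hypothetical stand-ins for TwistTieFix and Data (only the columns used)
lookup <- data.frame(
  color_match = c("yellow", "blue"),
  color_code = c("y1", "b1"),
  julian_start = c(75L, 112L),
  julian_cut_off_date = c(389L, 430L),
  stringsAsFactors = FALSE
)
obs <- data.frame(
  coll_date = c("2013-08-13", "2013-08-14", "2000-09-29"),
  julian_date = c(76L, 76L, 112L),
  tag_color = c("yellow", "yellow", "blue"),
  stringsAsFactors = FALSE
)

# Step 1: equi-join on color (pairs every observation with every range of its color)
merged <- merge(obs, lookup, by.x = "tag_color", by.y = "color_match")

# Step 2: keep only the pairs whose date falls inside the range
in_range <- merged$julian_date >= merged$julian_start &
  merged$julian_date <= merged$julian_cut_off_date

# Step 3: keep the original columns plus the looked-up code
result <- merged[in_range, c(names(obs), "color_code")]
print(result)
```

Note that this is an inner join: observations with no matching color or date range are dropped, and an observation that falls inside several overlapping ranges would appear once per range.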
MrFlick
  • Thank you! If I want to include all of the columns in Data (there are more than listed since I am not allowed to post the complete data set), how would the above code be modified? Would I have to include all of the variable names in setkey() for dt? – mrob1052 Jun 29 '14 at 23:01
  • You would include them in the list() on the very last line. The list says which variables to return as columns. The keys would remain unchanged unless you need to change which values are matched during the merge. – MrFlick Jun 29 '14 at 23:24
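
As an alternative to listing every column by hand, a non-equi update join can add `color_code` to the data in place, which keeps all existing columns automatically. This is a different technique from the rolling join in the answer, and it assumes a data.table version that supports non-equi joins in `on=` (1.9.8 or later, as far as I know). The `site` column is a hypothetical stand-in for the extra columns that could not be posted.

```r
library(data.table)

# Hypothetical stand-ins for TwistTieFix and Data; `site` represents the
# additional columns in the real data set
ttf <- data.table(
  color_match = c("yellow", "blue"),
  color_code = c("y1", "b1"),
  julian_start = c(75L, 112L),
  julian_cut_off_date = c(389L, 430L)
)
dt <- data.table(
  coll_date = c("2013-08-13", "2013-08-14", "2000-09-29"),
  julian_date = c(76L, 76L, 112L),
  tag_color = c("yellow", "yellow", "blue"),
  site = c("A", "B", "C")
)

# For each range in ttf, find the rows of dt with the same color whose
# julian_date lies inside [julian_start, julian_cut_off_date], and write
# that range's color_code into them by reference
dt[ttf,
   on = .(tag_color = color_match,
          julian_date >= julian_start,
          julian_date <= julian_cut_off_date),
   color_code := i.color_code]
print(dt)
```

Rows of `dt` that match no range keep `NA` in `color_code`, and no keys need to be set because the matching columns are given in `on=`.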