
I have code like this:

today<-as.Date(Sys.Date())
spec<-as.Date(today-c(1:1000))
df<-data.frame(spec)
stage.dates<-as.Date(c('2015-05-31','2015-06-07','2015-07-01','2015-08-23','2015-09-15','2015-10-15','2015-11-03'))
stage.vals<-c(1:8)
stagedf<-data.frame(stage.dates,stage.vals)
df['IsMonthInStage']<-ifelse(format(df$spec,'%m')==(format(stagedf$stage.dates,'%m')),stagedf$stage.vals,0)

This produces incorrect output, e.g.

df.spec, df.IsMonthInStage
2013-05-01, 0
2013-05-02, 1
2013-05-03, 0
....
2013-05-10, 1

It seems to be looping around: stage.dates is 8 long, and the 'TRUE' match repeats every 8th row. How do I fix this so that it flags 1 for every day of a month that appears in stage.dates?

Or for bonus reputation - how do I set it up so that between different stage.dates, it will populate 1, 2, 3, etc of the most recent stage?

For example:

31st of May to 7th of June would be populated 1, 7th of June to 1st of July would be populated 2, etc, 3rd of November to 30th of May would be populated 8?

Thanks

Edit:

I appreciate the latter is functionally different to the former question. I am ultimately trying to arrive at both (for different reasons), so all answers appreciated
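The repetition described above is consistent with R's vector recycling: `==` compares element by element and reuses the shorter vector cyclically, so each date is only checked against one stage month. A minimal sketch of one possible fix for the month flag, using `%in%` to test each month against the whole set. The fixed reference date and the name `stage.months` are assumptions made for reproducibility, not part of the original code:

```r
# Assumption: a fixed reference date instead of Sys.Date(), so results are reproducible.
spec <- as.Date("2015-12-31") - 0:364   # every day of 2015
df <- data.frame(spec)

stage.dates <- as.Date(c('2015-05-31', '2015-06-07', '2015-07-01', '2015-08-23',
                         '2015-09-15', '2015-10-15', '2015-11-03'))

# Hypothetical helper: the set of months that contain a stage date.
stage.months <- unique(format(stage.dates, '%m'))

# %in% checks each row's month against all stage months (no recycling),
# so every day of a matching month is flagged 1.
df$IsMonthInStage <- as.integer(format(df$spec, '%m') %in% stage.months)
```

With these stage dates, every day from May to November is flagged and the remaining months get 0.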

Henry
  • I've partially solved this problem by using this: http://stackoverflow.com/a/23592345/5338754 However, it is not a full solution yet :). I can use it to populate the months until a stage change, but I don't yet know how to prevent the repetition issue mentioned in the first half of the question. – Henry Sep 30 '15 at 13:27

1 Answer


See if this works.

Cut and split your data based on stage.dates, treating them as bucket boundaries. You don't need stage.vals here, by the way.

Cut And Split

data<-split(df, cut(df$spec, stagedf$stage.dates, include.lowest=TRUE))

This should give you a list of data frames, split according to stage.dates.
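To see what the buckets look like before splitting, here is a small sketch of cut() on dates. The toy vectors `breaks` and `dates` are made up for illustration:

```r
# Hypothetical toy data: three stage boundaries and five dates to bucket.
breaks <- as.Date(c("2015-05-31", "2015-06-07", "2015-07-01"))
dates  <- as.Date(c("2015-05-31", "2015-06-03", "2015-06-07",
                    "2015-06-30", "2015-07-01"))

# For Dates, cut() uses left-closed intervals [start, end) by default;
# include.lowest = TRUE additionally keeps the final break date itself.
bucket <- cut(dates, breaks, include.lowest = TRUE)

# The integer codes 1, 2, ... give the stage each date falls in.
stage <- as.integer(bucket)
```

Dates that fall outside the first and last break get NA from cut(), and split() leaves those rows out of the resulting list, so only dates between the first and last stage date survive this step.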

Now mutate each data frame with its list index, which is what your stage.vals were going to be.

Mutate

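library(plyr)  # provides mutate(), ldply() and arrange() used in the steps here
# Tag every row of the index-th bucket with that index as its stage number
data<-lapply(seq_along(data), function(index) {mutate(data[[index]],
IsMonthInStage=index)})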

Now join the data frames in the list using ldply

Join

data=ldply(data)

This will, however, leave the dates out of order, which you can fix by sorting

Sort

data<-arrange(data,spec)  # assign the result; arrange() returns a sorted copy

Final Output

data[1:10,]
         spec IsMonthInStage
1  2015-05-31              1
2  2015-06-01              1
3  2015-06-02              1
4  2015-06-03              1
5  2015-06-04              1
6  2015-06-05              1
7  2015-06-06              1
8  2015-06-07              2
9  2015-06-08              2
10 2015-06-09              2
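As a possible alternative to the split/mutate/join steps, base R's findInterval() can assign the most recent stage in one vectorised call. This is a minimal sketch, not the method used above; the fixed date range and the column name `Stage` are assumptions for illustration:

```r
stage.dates <- as.Date(c('2015-05-31', '2015-06-07', '2015-07-01', '2015-08-23',
                         '2015-09-15', '2015-10-15', '2015-11-03'))

# Assumption: a fixed date range instead of Sys.Date(), so results are reproducible.
df <- data.frame(spec = as.Date("2015-12-01") - 0:299)

# findInterval() returns, for each date, how many stage dates are <= that date,
# which is the index of the most recent stage (0 before the first stage date).
df$Stage <- findInterval(as.numeric(df$spec), as.numeric(stage.dates))
```

Unlike the cut/split approach, rows outside the stage range are kept: dates before 2015-05-31 get 0 and dates after 2015-11-03 keep the last stage. The sketch treats stage.dates as absolute dates, so it does not wrap around across years.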
Dhawal Kapil