I'm working on a project where I need to build launch indicators for multiple devices. To build an indicator, the code needs to search a series of unique dates and subtract the dates that share the same year. For example, if SV_DATE is 2015/03/05, the code should look through the 'Launch Date' series for a date in the same year (2015/06/22, for example) and subtract the two dates. The between() function then checks whether the result is within the range of 0 to 30 days and returns a Boolean, and astype(int) converts that to 1 if True and 0 if False.
When I run the code I get two error messages. The first error has to do with the truth value being ambiguous due to my comparing two columns.
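import pandas as pd

def day_diff(end, start):
    # Convert both inputs to datetimes, then return the difference in whole days
    ed = pd.to_datetime(end)
    sd = pd.to_datetime(start)
    #if ed.dt.year == sd.year:
    return (ed - sd).dt.days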
data['AL030'] = day_diff(data['SV_DATE'],data_2.loc[(data_2['MFG'] == 'APPLE') & (pd.Series(pd.DatetimeIndex(data_2['Launch Date'])).dt.year == pd.Series(pd.DatetimeIndex(data['SV_DATE'])).dt.year), 'Launch Date']).between(0,30).astype(int)
For the code to run, I have to hard-code the year instead of having the code search through a column of dates. When I do this, the code works:
data['AL030'] = day_diff(data['SV_DATE'],data_2.loc[(data_2['MFG'] == 'APPLE') & (pd.Series(pd.DatetimeIndex(data_2['Launch Date'])).dt.year == 2017), 'Launch Date'].apply(lambda x:x.date().strftime('%Y-%m-%d'))).between(0,30).astype(int)
I get that error before I even add the unique() function. Adding unique() gives me a new error: 'ValueError: cannot add indices of unequal length'
data['AL030'] = day_diff(data['SV_DATE'],data_2.loc[(data_2['MFG'] == 'APPLE') & (pd.Series(pd.DatetimeIndex(data_2['Launch Date'])).dt.year == 2017), 'Launch Date'].apply(lambda x:x.date().strftime('%Y-%m-%d')).unique()).between(0,30).astype(int)
If I didn't want to compare the years between the columns, this piece of code would have sufficed:
data['AL030'] = day_diff(data['SV_DATE'],data_2.loc[(data_2['MFG'] == 'APPLE'), 'Launch Date']).between(0,60).astype(int)
Ultimately, I'm trying to optimize the R code below so that it returns the same values without a function like launch.ind, while also adding the year condition to cut down on run time:
day_diff = function(end,start){
  x = difftime(end,start,units=c("days"))
  return(x)
}
launch.ind = function(ship.date,launch.date,low,high){
  y = rep(0,length(data$SV_DATE))
  for (i in seq(length(data$SV_DATE))){
    y[i] = sum(ifelse((day_diff(ship.date[i],launch.date)>=low)&(day_diff(ship.date[i],launch.date)<=high),1,0))
    y[i] = ifelse(y[i] > 0, 1, 0)
  }
  return(y)
}
###############################
# Add launch indicators
data$AL030 = launch.ind(data$SV_DATE,unique(data_2$"Launch Date"[toupper(data_2$MFG)=="APPLE"]),0,30)
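To make the goal concrete, here is a minimal Python sketch of what I understand launch.ind to do, with the same-year condition added. It is only a sketch under assumptions: the name launch_indicator and its same_year flag are hypothetical, both columns are assumed to parse cleanly with pd.to_datetime, and the small frames at the bottom are made-up sample rows, not my real data. It compares every ship date against every unique launch date at once using NumPy broadcasting instead of a loop.

```python
import numpy as np
import pandas as pd


def launch_indicator(ship_dates, launch_dates, low=0, high=30, same_year=True):
    """Return 1 where a ship date is low..high days after any launch date, else 0."""
    # Hypothetical helper: parse both inputs and keep only unique launch dates.
    ship = pd.to_datetime(pd.Series(ship_dates)).to_numpy()
    launch = pd.to_datetime(pd.Series(launch_dates)).drop_duplicates().to_numpy()

    # Broadcast to an (n_ship, n_launch) matrix of day differences (ship - launch).
    diff_days = (ship[:, None] - launch[None, :]) / np.timedelta64(1, "D")
    hit = (diff_days >= low) & (diff_days <= high)

    if same_year:
        # Only count pairs whose calendar years match.
        ship_year = ship.astype("datetime64[Y]")
        launch_year = launch.astype("datetime64[Y]")
        hit &= ship_year[:, None] == launch_year[None, :]

    # 1 if at least one launch date qualifies for that ship date.
    return hit.any(axis=1).astype(int)


# Made-up sample rows, for illustration only.
data = pd.DataFrame({"SV_DATE": ["2015/03/05", "2015/07/01", "2017/09/30"]})
data_2 = pd.DataFrame({
    "MFG": ["APPLE", "APPLE", "SAMSUNG"],
    "Launch Date": ["2015/06/22", "2017/09/22", "2015/03/01"],
})

apple_launches = data_2.loc[data_2["MFG"].str.upper() == "APPLE", "Launch Date"]
data["AL030"] = launch_indicator(data["SV_DATE"], apple_launches, 0, 30)
# AL030 is 0, 1, 1 for the three sample rows
```

One thing I noticed while sketching this: the year condition is not purely a speed-up. A ship date of 2018/01/05 is 16 days after a launch on 2017/12/20, so it gets a 1 without the year filter and a 0 with it. The comparison matrix also holds n_ship x n_launch values in memory, which should be fine for a short list of launch dates but may not scale to a very long one.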
I appreciate anyone attempting to help, and I'm happy to clarify anything that is unclear.