This is my first time working with the Pandas library, so I apologize if this is a silly mistake.
For a Kaggle competition, I'm creating a new column, Date_Diff, in a data frame based on the difference between two columns. Both columns are initially strings. I convert them to datetime, so subtracting one from the other gives a timedelta. I then convert that to a float by taking the days of the timedelta object.
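For reference, the steps described above can be sketched without an explicit loop. This is only a minimal sketch: the frame, its index, and the column names `cur_date` and `iss_date` are made up for illustration and are not the competition data.

```python
import pandas as pd

# Hypothetical stand-in for the competition data; names and values are assumed.
df = pd.DataFrame(
    {
        "cur_date": ["2016-01-10", "2016-03-01", "2016-02-15"],
        "iss_date": ["2016-01-01", "2016-03-05", "2015-02-15"],
    },
    index=[10, 20, 30],  # deliberately not 0..n-1
)

# Convert the string columns to datetime.
cur = pd.to_datetime(df["cur_date"])
iss = pd.to_datetime(df["iss_date"])

# Subtracting two datetime Series gives a timedelta Series on the same index,
# so the result lines up with the original rows when assigned back.
df["Date_Diff"] = (cur - iss).abs().dt.days.astype(float)
```

Because the subtraction is done on Series that share the frame's index, the row count and the index stay as they were.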
All of the features of the dataset are thrown into a matrix in a function. I keep getting the following error when running that function:
ValueError: negative dimensions are not allowed
When assigning Date_Diff to the data frame, how can I keep the index in place so we can keep the number of rows (if this isn't the right terminology please let me know)?
I thought this post would answer my question, but unfortunately it didn't.
I've been looking up how to do this for two days now and still haven't found exactly what I'm looking for.
Converting to float and assigning the column:
for i in range(len(cur_date)):
    date_diff.append(abs(cur_date[i] - iss_date[i]))
    date_diff[i] = float(date_diff[i].days)
# The first three column assignments work fine, the Date_Diff does not
raw_data_set = raw_data_set.assign(Date_Diff = pd.Series(date_diff).values)
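To make the index question concrete, here is a small sketch of how `assign` treats a plain array versus a Series. The frame `frame` and the values in `date_diff` are invented for illustration; the real data may behave differently.

```python
import pandas as pd

# Hypothetical frame whose index is not 0..n-1 (e.g. after filtering rows).
frame = pd.DataFrame({"a": [1, 2, 3]}, index=[10, 20, 30])
date_diff = [9.0, 4.0, 365.0]

# .values strips the Series index, so the numbers are assigned by position.
positional = frame.assign(Date_Diff=pd.Series(date_diff).values)

# A Series with the default 0..n-1 index is aligned on labels instead;
# none of the labels 0, 1, 2 exist in the frame, so every value becomes NaN.
aligned = frame.assign(Date_Diff=pd.Series(date_diff))

# Reusing the frame's own index keeps labels and values together.
labelled = frame.assign(Date_Diff=pd.Series(date_diff, index=frame.index))
```

In all three cases the frame still has three rows and its original index. Only the values in Date_Diff differ.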
After tracing through the program, it looks like the matrix is getting negative dimensions because one of the parameters used to build it is set to the total number of rows in the feature matrix minus 1. The feature matrix isn't getting any rows, so this evaluates to 0 - 1, which is -1.
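The arithmetic above can be reproduced in isolation. This sketch assumes the starter code ends up calling something like NumPy's `np.zeros` with that computed size; the actual call in the competition code is not shown here, and `n_rows` and `n_features` are placeholder names.

```python
import numpy as np

n_rows = 0       # hypothetical: the feature matrix ended up with no rows
n_features = 3   # arbitrary placeholder

try:
    # 0 - 1 gives -1, which is not a valid array dimension.
    np.zeros((n_rows - 1, n_features))
except ValueError as err:
    message = str(err)
    print(message)
```

Checking the number of rows in the data frame (for example with `len(raw_data_set)`) right before the matrix is built would show whether the rows are already gone at that point.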
The function where the matrix is being created was in the starter code for the competition so I don't think it's wrong. I also added three columns to the data frame before Date_Diff and I had no issues so I really think it's with how I'm assigning Date_Diff.