This is my first time working with the Pandas library, so I apologize if this is a silly mistake.

For a Kaggle competition, I'm creating a new column, Date_Diff, in a data frame from the difference between two columns. Both columns start out as strings. I convert them to datetime, and subtracting one from the other gives a timedelta. I then convert each timedelta to a float by taking its number of days.
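A minimal sketch of the steps described above, done column-wise rather than in a loop. The sample data and the column names cur_date and iss_date are assumptions for illustration; the real competition data is not shown here.

```python
import pandas as pd

# Hypothetical sample data; the real column names and values may differ.
df = pd.DataFrame({
    "cur_date": ["2019-07-31", "2019-07-01", "2019-01-01"],
    "iss_date": ["2019-07-01", "2019-07-31", "2018-01-01"],
})

# Parse the strings into datetime64 columns.
cur = pd.to_datetime(df["cur_date"])
iss = pd.to_datetime(df["iss_date"])

# Subtracting gives a timedelta Series that shares df's index;
# take the absolute value, extract whole days, and store them as floats.
df["Date_Diff"] = (cur - iss).abs().dt.days.astype(float)

print(df)
```

Because the result is computed from columns of the same frame, it carries the same index, so the row count should not change.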

All of the features of the dataset are thrown into a matrix in a function. I keep getting the following error when running that function:

ValueError: negative dimensions are not allowed

When assigning Date_Diff to the data frame, how can I keep the existing index in place so that the number of rows is preserved (if this isn't the right terminology, please let me know)?

I thought this post would answer my question, but unfortunately it didn't.

I've been looking up how to do this for two days now and still haven't found exactly what I'm looking for.

Converting to float and assigning the column:

date_diff = []
for i in range(len(cur_date)):
    # Absolute difference between the two dates, in whole days, stored as a float
    date_diff.append(float(abs(cur_date[i] - iss_date[i]).days))

# The first three column assignments work fine, the Date_Diff does not
raw_data_set = raw_data_set.assign(Date_Diff = pd.Series(date_diff).values)
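For reference, a small sketch of how the index affects this kind of assignment. The frame and its index labels are made up; the point is only that pandas aligns a Series by index label, while a plain array (what `.values` returns) is assigned by position.

```python
import pandas as pd

# Hypothetical frame whose index is not 0..n-1 (for example, after filtering rows).
df = pd.DataFrame({"a": [10, 20, 30]}, index=[5, 7, 9])
date_diff = [1.0, 2.0, 3.0]

# A Series with its own default index (0, 1, 2) is aligned by label.
# None of those labels exist in df, so every value becomes NaN.
by_label = df.assign(Date_Diff=pd.Series(date_diff))

# A plain array is assigned by position; df's index is left untouched.
by_position = df.assign(Date_Diff=pd.Series(date_diff).values)

# Building the Series on df's own index also keeps everything lined up.
explicit = df.assign(Date_Diff=pd.Series(date_diff, index=df.index))

print(by_label)
print(by_position)
print(explicit)
```

In all three cases the number of rows stays the same; only the values in Date_Diff differ.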

After tracing through the program, it looks like the matrix is getting negative dimensions because one of the parameters used to build it is set to the total number of rows in the feature matrix minus 1. The feature matrix isn't getting any rows, so this evaluates to 0 - 1, which is -1.
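The failure described above can be reproduced in isolation. This is only a sketch: the starter code is not shown, so `np.zeros` stands in for whatever call actually builds the matrix, and the empty `features` frame is a made-up stand-in for the feature matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical empty feature frame: one column, zero rows.
features = pd.DataFrame({"Date_Diff": pd.Series(dtype=float)})

# With zero rows, "rows - 1" becomes -1.
n_rows = features.shape[0] - 1

message = None
try:
    # Stand-in for the matrix construction in the starter code (assumption).
    np.zeros((n_rows, features.shape[1]))
except ValueError as err:
    message = str(err)

print(n_rows, message)
```

If this matches what is happening, the thing to check is why the feature frame is empty before the matrix is built, for example by printing its shape right after each column is added.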

The function that creates the matrix was in the starter code for the competition, so I don't think it's wrong. I also added three columns to the data frame before Date_Diff without any issues, so I really think the problem is in how I'm assigning Date_Diff.

ml50
  • It looks too long for me to read. Can you provide an [mcve]? The title looks nice, but all these details seem irrelevant to the question. Or are they? – IMCoins Jul 31 '19 at 13:02
  • I'll go back and edit what code segments I use here and try to provide a small example. I mainly showed the second bit of code to show how the matrix was built out just in case anyone asked for that – ml50 Jul 31 '19 at 13:05
  • We still cannot launch your code after your edit ;) – IMCoins Jul 31 '19 at 13:15
  • In my experience, if you give your input (dataframe) and expected output (also a dataframe) in the question, you get better answers. There might actually be a simpler way to accomplish what you are doing. Try that approach. You can copy/paste the dataframe inside the HTML Snippet (icon next to Picture) directly. – moys Jul 31 '19 at 13:32
