
I have the following code.

I am trying to check whether each date-time value in the column numberofeachconditiononthatdate['Date'] also appears in the column luckonthatdate['Date'].

If it is, then I want that particular date-time value to be assigned to the variable 'value'.

If not, then I want the variable 'value' to equal 0.

In other words, I want to add a new column of values to the 'numberofeachconditiononthatdate' dataframe, indicating the number of 'luck' trials on a given date.

luckvalues = []

for idx in numberofeachconditiononthatdate.iterrows():
    if numberofeachconditiononthatdate['Date'][[idx]].isin(luckonthatdate['Date']):
       value = luckonthatdate['Date'][[idx]]
       luckvalues = luckvalues.append(value)
    else:
       value = 0
       luckvalues = luckvalues.append(value) 

print(luckvalues)

However, this gives me the error 'unhashable type: 'Series''.

I would be so grateful for a helping hand!

numberofeachconditiononthatdate['Date']

0   2020-04-06
1   2020-04-06
2   2020-04-06
3   2020-05-06
4   2020-05-06
5   2020-05-06
6   2020-06-06
7   2020-06-06
8   2020-06-06
9   2020-06-13

luckonthatdate['Date'].head(10)

0    2020-04-06
3    2020-05-06
6    2020-06-06
9    2020-06-13
16   2020-10-06
20   2020-11-06
23   2020-12-06
Caledonian26

3 Answers


Instead of an explicit for loop, you can optimise it using merge. You can do something like:

numberofeachconditiononthatdate = (numberofeachconditiononthatdate
                                  .merge(luckonthatdate[['Date', 'luck']], how='left', on='Date'))

numberofeachconditiononthatdate['luck'] = numberofeachconditiononthatdate['luck'].fillna(0)

This will add a new column luck to the numberofeachconditiononthatdate dataframe (assuming luckonthatdate has a column with that name), with 0 on dates that have no match. You can rename it afterwards as you want.
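To make the merge approach concrete, here is a minimal sketch on made-up data. The frame names `conditions` and `luck_counts`, the sample dates, and the `luck` column holding a trial count are all assumptions for illustration; the question does not show which columns luckonthatdate actually has.

```python
import pandas as pd

# Hypothetical stand-in for numberofeachconditiononthatdate.
conditions = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2020-04-06", "2020-04-06", "2020-05-06", "2020-07-01"]
    ),
})

# Hypothetical stand-in for luckonthatdate: one row per date,
# with an assumed 'luck' column holding the number of luck trials.
luck_counts = pd.DataFrame({
    "Date": pd.to_datetime(["2020-04-06", "2020-05-06"]),
    "luck": [5, 1],
})

# A left merge keeps every row of `conditions`; dates with no match get NaN.
merged = conditions.merge(luck_counts[["Date", "luck"]], how="left", on="Date")

# Replace the NaN placeholders with 0 and restore an integer dtype.
merged["luck"] = merged["luck"].fillna(0).astype(int)

print(merged)
```

Note that this relies on each date appearing at most once in the right-hand frame. If a date is repeated there, the merge would duplicate the matching rows on the left.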

YOLO
  • Thanks so much for the suggestion. However, this output gives me all conditions in the 'condition' column. In the for loop above, I wanted the 'condition' column to include just the 'luck' condition, and the column with the number of trials to have a '0' at any date where there are no trials of that type (using the dates in the 'numberofeachconditiononthatdate' dataframe) :) – Caledonian26 Nov 14 '22 at 10:15
  • In other words, I am adding a new column of values to the 'numberofeachconditiononthatdate' dataframe, indicating the number of 'luck' trials on a given date :) – Caledonian26 Nov 14 '22 at 10:18
  • I have now solved the issue - thank you for your support :) I have provided my solution above – Caledonian26 Nov 14 '22 at 11:17

If you want to add a column with the number of times each value occurs, you can use value_counts(), pass the result to map(), and finally call fillna(). For convenience, I am going to rename the dataframes:

df1 = numberofeachconditiononthatdate.copy()
df2 = luckonthatdate.copy()

And then create the luck column using:

df2['Luck'] = df2['Date'].map(df1['Date'].value_counts()).fillna(0)

Returning:

         Date  Luck
0  2020-04-06   3.0
1  2020-05-06   3.0
2  2020-06-06   3.0
3  2020-06-13   1.0
4  2020-10-06   0.0
5  2020-11-06   0.0
6  2020-12-06   0.0
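For anyone who wants to run this without the original data, here is a self-contained sketch of the same value_counts() and map() pattern. The two frames are rebuilt by hand from the dates printed in the question (with the second one cut short), so treat them as an assumption about the real data rather than a copy of it.

```python
import pandas as pd

# Rebuilt by hand from the dates shown in the question (an assumption).
df1 = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2020-04-06"] * 3 + ["2020-05-06"] * 3
        + ["2020-06-06"] * 3 + ["2020-06-13"]
    ),
})
df2 = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2020-04-06", "2020-05-06", "2020-06-06", "2020-06-13", "2020-10-06"]
    ),
})

# value_counts() returns a Series indexed by date: how often each date occurs in df1.
counts = df1["Date"].value_counts()

# map() looks each df2 date up in that Series; dates absent from df1 become NaN,
# which fillna(0) turns into 0 before casting back to integers.
df2["Luck"] = df2["Date"].map(counts).fillna(0).astype(int)

print(df2)
```

The extra astype(int) is optional. Without it the column stays float, as in the output shown above, because NaN forces a float dtype.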
Celius Stingher

I solved the issue by using a for loop with an 'if string is in row' command.

newvalues = []  # collects one value per row

for i in range(len(numberofeachconditiononthatdate)):
    # Keep the trial count for 'luck' rows; use 0 for every other condition.
    if 'luck' in numberofeachconditiononthatdate['condition'].iloc[i]:
        newvalues.append(numberofeachconditiononthatdate['numberoftrials'].iloc[i])
    else:
        newvalues.append(0)

print(newvalues)

[5, 0, 0, 1, 0, 0, 1, 0, 0, 3, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 6, 0, 0]

Thanks so much for your help :)

Caledonian26
  • This is very inefficient; looping through a dataframe is highly discouraged. – Celius Stingher Nov 14 '22 at 11:21
  • I am a first year PhD student - on a long journey - always open and ready to learn new things - please elaborate :) – Caledonian26 Nov 14 '22 at 11:28
  • I see, congratulations and best of luck with your studies! Please take some time to read through this question and its answers. I'm sharing the link to an answer that summarizes it well, but the top answer goes into a lot more detail. https://stackoverflow.com/questions/54028199/are-for-loops-in-pandas-really-bad-when-should-i-care/62252249#62252249 As you can see, iterating is the worst-performing approach. And if you think about it, when you loop with df['col'].iloc[i] you go back to the dataframe for every single element, rather than processing the whole column once with a vectorized operation – Celius Stingher Nov 14 '22 at 11:32
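
As a sketch of the vectorized approach the comment above recommends, the loop from the self-answer could be replaced with str.contains() and Series.where(). The dataframe `df` and its values are made up for illustration; only the column names 'condition' and 'numberoftrials' come from the loop above.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the loop above.
df = pd.DataFrame({
    "condition": ["luck", "skill", "control", "luck"],
    "numberoftrials": [5, 2, 4, 1],
})

# Boolean mask: True where the condition string contains 'luck'.
is_luck = df["condition"].str.contains("luck", regex=False)

# where() keeps values where the mask is True and substitutes 0 elsewhere.
newvalues = df["numberoftrials"].where(is_luck, 0).tolist()

# Same logic written as the explicit loop, for comparison.
looped = []
for i in range(len(df)):
    if "luck" in df["condition"].iloc[i]:
        looped.append(df["numberoftrials"].iloc[i])
    else:
        looped.append(0)

print(newvalues)
```

Both versions produce the same list. The vectorized one evaluates the whole column in a single call instead of indexing the dataframe once per row.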