1

I'm trying to add a feature column to my dataframe and match it to my existing dataframe rows by month and year (which I've stored in integer columns).

I've tried using .iloc[] to specify the row to add the new feature variable df['Price Level'] that is taken from i_df['CPIAUCNS'], but after reading a lot of Stack Overflow, it seems like np.where is a more appropriate function for a conditional statement.

bool_filter = ((df['Release Date Year'] == i_df['Release Date Year']) 
               & (df['Release Date Month'] == i_df['Release Date Month']))

df['Price Level'] = np.where(bool_filter, i_df['CPIAUCNS'])

I was hoping this would generate a new feature column in df with the value from i_df where Year and Month were equal, instead I receive:

ValueError: Can only compare identically-labeled Series objects

This error is thrown in the bool_filter so the np.where does not get to execute.

Would someone be able to explain why this conditional statement generates this error and how I might be able to rephrase it?

EDIT:

Trying to use .values() in the boolean filter results in the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-34-9b470b5aee2c> in <module>()
      5 # df[df['Release Date'].isna() == True]
      6 
----> 7 bool_filter = ((df['Release Date Year'].values() == i_df['Release Date Year'].values()) 
      8                & (df['Release Date Month'].values() == i_df['Release Date Month'].values()))
      9 

TypeError: 'numpy.ndarray' object is not callable

2 Answers2

0

SOLUTION #1

You should use df.merge()

df = df.merge(i_df, how='left', left_on=['Release Date Year', 'Release Date Month'],
right_on=['Release Date Year', 'Release Date Month'])

This will join you i_df df to your df dataframe. It will do a left join in this example, but feel free to change the join type.

You will end up with a new df with the column you desire.


SOLUTION #2

Another solution, would be to use your boolean filter to filter your i_df dataframe

bool_filter = ((df['Release Date Year'] == i_df['Release Date Year']) 
               & (df['Release Date Month'] == i_df['Release Date Month']))

df['Price Level'] = i_df[bool_filter == True].CPIAUCNS

Now, this will consider that indexes of both dataframe are aligned. Becarefully if you cannot guarantee that both indexes are aligned.

Teddy
  • 633
  • 1
  • 8
  • 22
  • For solution 2, do you know why when I write the bool_filter as below: `bool_filter = ((df['Release Date Year'] == i_df['Release Date Year']) & (df['Release Date Month'] == i_df['Release Date Month']))` I get `ValueError: Can only compare identically-labeled Series objects` – Salt_from_the_Sea Apr 17 '19 at 00:34
  • You may want to compare your data type from your df and i_df using df.info() and i_df.info(). You may have data type miss match between your 2 data frame or some null values. – Teddy Apr 17 '19 at 00:51
  • You may want to checkout this SO post for your question regarding the error. It may have to do with your index in both of your df. https://stackoverflow.com/questions/18548370/pandas-can-only-compare-identically-labeled-dataframe-objects-error – Teddy Apr 17 '19 at 15:30
0

Based on Teddy's answer I finally succeeded with the following merge statement:

df = df.merge(i_df[['Release Date Year', 'Release Date Month','CPIAUCNS']], 
              how='left', on=['Release Date Year', 'Release Date Month'])

I ended up with CPIAUCNS in my dataframe, which was my goal. Thanks Teddy!

However, I still don't understand the problem with my initial bool_filter.