I've used `.loc` throughout other code and it worked fine, but I've been stuck on this one for a couple of hours with no luck so far. My objective is to create a column in a dataframe and populate it from two other columns wherever a condition is met.

I read the post "What does `ValueError: cannot reindex from a duplicate axis` mean?" and tried these snippets

df = df[~df.index.duplicated()]
df.reindex()

but it didn't seem to work.

import pandas as pd
df2 = df2.append(df, sort=True)
condition = df2['Period'] == df2['projection_period_1']
df2.loc[condition, 'Projection'] = df2['Projected_A'] / df2['Weekly_A']

I expect the output to populate df2['Projection'] with df2['Projected_A'] / df2['Weekly_A'] everywhere that the condition is met.

Instead I get "ValueError: cannot reindex from a duplicate axis"

1 Answer

You're not applying your `condition` mask on the right-hand side of the last line. I'm assuming you just want to use the values from the rows where the condition holds. Otherwise you should most likely be grouping by period.

import pandas as pd
df2 = df2.append(df, sort=True)
condition = df2['Period'] == df2['projection_period_1']
df2.loc[condition, 'Projection'] = df2.loc[condition, 'Projected_A'] / df2.loc[condition, 'Weekly_A']
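
To see the difference on a small scale, here is a minimal sketch. The two toy frames and their values are made up for illustration; only the column names come from the question. `pd.concat` is used in place of `df2.append(df, sort=True)` because `DataFrame.append` was removed in pandas 2.0, and the sketch assumes both frames start with the default 0..n-1 index.

```python
import pandas as pd

# Hypothetical toy frames: column names follow the question, values are invented.
df = pd.DataFrame({
    "Period": [1, 2],
    "projection_period_1": [1, 3],
    "Projected_A": [10.0, 20.0],
    "Weekly_A": [2.0, 4.0],
})
df2 = pd.DataFrame({
    "Period": [3, 4],
    "projection_period_1": [3, 5],
    "Projected_A": [30.0, 40.0],
    "Weekly_A": [5.0, 8.0],
})

# Stand-in for df2.append(df, sort=True); the labels 0 and 1 now each appear twice.
df2 = pd.concat([df2, df], sort=True)
has_duplicate_labels = bool(df2.index.duplicated().any())

condition = df2["Period"] == df2["projection_period_1"]

# Full-length right-hand side: pandas has to align it to the selected rows
# by index label, which fails when labels are duplicated.
try:
    df2.loc[condition, "Projection"] = df2["Projected_A"] / df2["Weekly_A"]
    one_sided_failed = False
except ValueError:
    one_sided_failed = True

# Same mask on both sides: the labels on each side already match.
df2.loc[condition, "Projection"] = (
    df2.loc[condition, "Projected_A"] / df2.loc[condition, "Weekly_A"]
)
```

With this data, only the rows where `Period` equals `projection_period_1` get a value in `Projection`; the rest stay `NaN`.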
krewsayder
  • So this worked! Thanks, man. Can you point me to something to read so I can learn why it worked? I'm curious why the .loc is needed on the other side of the assignment. I have other examples where I don't do this and it works. I think I have a weak understanding of indexing and don't really know when/how/if to use them. But thank you! – Question_Mark Apr 17 '19 at 21:19
  • I can just explain it. When you put `condition` into the `.loc` on the left side of `=`, you are selecting only a subset of the rows, while the right side still covers every row. pandas then has to line the right side up with that subset by index label, and since the append left duplicate labels in the index, that reindexing step raises the error. Using the same condition on both sides means both sides already refer to the same rows. If you leave the condition off the left side, leave it off the right side too so both cover the whole dataframe, and apply the condition afterwards: `df2['Projection'] = df2['Projected_A'] / df2['Weekly_A']`, then `df2[condition]` will return only the rows that meet the condition. – krewsayder Apr 17 '19 at 22:34
  • Got it, and this makes sense. Thank you! – Question_Mark Apr 18 '19 at 12:35
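
On the point raised in the comments that the one-sided `.loc` assignment works in other code: that is likely because the index there has no repeated labels. As a sketch under that assumption, rebuilding the index while combining the frames (here with `ignore_index=True` on `pd.concat`, standing in for the original `append`) lets the original one-sided form run. The toy data is again hypothetical, and this only suits cases where the old index labels are not needed.

```python
import pandas as pd

# Hypothetical toy frames: column names follow the question, values are invented.
df = pd.DataFrame({
    "Period": [1, 2],
    "projection_period_1": [1, 3],
    "Projected_A": [10.0, 20.0],
    "Weekly_A": [2.0, 4.0],
})
df2 = pd.DataFrame({
    "Period": [3, 4],
    "projection_period_1": [3, 5],
    "Projected_A": [30.0, 40.0],
    "Weekly_A": [5.0, 8.0],
})

# ignore_index=True discards the old labels and numbers the rows 0..n-1,
# so every label in the combined frame is unique.
combined = pd.concat([df2, df], sort=True, ignore_index=True)
index_is_unique = combined.index.is_unique

condition = combined["Period"] == combined["projection_period_1"]

# With unique labels, pandas can align the full-length right-hand side
# to the rows selected on the left.
combined.loc[condition, "Projection"] = (
    combined["Projected_A"] / combined["Weekly_A"]
)
```

Note that the snippet from the question, `df = df[~df.index.duplicated()]`, is applied to `df` before the frames are combined, so it would not remove labels that only become duplicates once `df` and `df2` are joined.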