I'm trying to join two dataframes together so I can plot the result against another dataframe.

I tried:

genders2 = np.array(male_grades[['grade_difs']].join(female_grades[['grade_difs']], how='outer'))

and get the error:

AttributeError: 'Series' object has no attribute 'join'

I was able to use this type of code earlier in the program I'm writing:

genders = np.array(male[['MaleAge']].join(female[['FemaleAge']], how='outer'))

If I'm being too vague, let me know and I'll try to add more code so it makes more sense.

Code before:

data['grade_difs'] = (data['OGrade'] - data['IGrade'])

female_grades = data[data['Gender'] == 'F']['grade_difs']
male_grades = data[data['Gender'] == 'M']['grade_difs']
ksalerno
  • The error indicates that `male` is a `pd.Series` and when you slice it with `male[['MaleAge']]` you get another `pd.Series`... and `pd.Series` does not have a `join` method. You want to check your other code and see how `male` became a `pd.Series` if you didn't intend it to be. – piRSquared Apr 19 '17 at 15:21
  • You mean male_grades? I just edited the question to show the code right before it. – ksalerno Apr 19 '17 at 15:22
  • Maybe pd.concat will help in your case. http://pandas.pydata.org/pandas-docs/stable/merging.html – Wenlong Liu Apr 19 '17 at 15:25

1 Answer

Two points

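  1. You are definitely getting a pd.Series in male_grades, because selecting a single column with ['grade_difs'] returns a Series. Selecting with a list, [['grade_difs']], returns a DataFrame, which will fix it: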

    female_grades = data[data['Gender'] == 'F'][['grade_difs']]
    male_grades = data[data['Gender'] == 'M'][['grade_difs']]
    
  2. But I'd rather do it with .loc, which filters the rows and selects the column in one step:

    female_grades = data.loc[data['Gender'] == 'F', ['grade_difs']]
    male_grades = data.loc[data['Gender'] == 'M', ['grade_difs']]
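
The difference between the two selections can be checked directly. This is a minimal sketch: the sample values are made up for illustration, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({
    "Gender": ["F", "M", "F", "M"],
    "IGrade": [70, 65, 80, 90],
    "OGrade": [75, 60, 85, 95],
})
data["grade_difs"] = data["OGrade"] - data["IGrade"]

# A single column label returns a Series, which has no join method.
as_series = data[data["Gender"] == "M"]["grade_difs"]

# A list of column labels returns a DataFrame, which does.
as_frame = data.loc[data["Gender"] == "M", ["grade_difs"]]

print(type(as_series).__name__, hasattr(as_series, "join"))
print(type(as_frame).__name__, hasattr(as_frame, "join"))
```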
    

After that, make sure you specify suffixes (lsuffix and/or rsuffix) in your join in case the two frames have column names in common. It's often sufficient to specify a suffix on one side of the join.

male[['MaleAge']].join(female[['FemaleAge']], how='outer', rsuffix='_')
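
Applied to the frames from the question, where both sides have a column named grade_difs, the same idea might look like the sketch below. The sample values and the suffix names `_male` and `_female` are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({
    "Gender": ["F", "M", "F", "M"],
    "grade_difs": [5, -5, 5, 5],
})

female_grades = data.loc[data["Gender"] == "F", ["grade_difs"]]
male_grades = data.loc[data["Gender"] == "M", ["grade_difs"]]

# Both frames have a 'grade_difs' column, so the join needs suffixes
# to tell the two apart; without them pandas raises
# "ValueError: columns overlap but no suffix specified".
joined = male_grades.join(
    female_grades, how="outer", lsuffix="_male", rsuffix="_female"
)
print(joined)
```

Because join aligns on the index, and each filtered frame keeps its row labels from data, an outer join leaves NaN wherever a row exists on only one side. That is expected behaviour and separate from the overlapping-column error.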
piRSquared
  • Thank you; however, I'm now getting the error: ValueError: columns overlap but no suffix specified: Index(['grade_difs'], dtype='object'). Does this mean the indexes of the dataframes don't line up? – ksalerno Apr 19 '17 at 15:30
  • So you've gotten past the first error and on to another error. That's good. That's progress :-) This one is because you have column names in common. It's difficult to do this when you don't provide a complete example. You should always provide what we call a minimal, complete, and verifiable example or [***MCVE***](http://stackoverflow.com/help/mcve). I'll update my answer. – piRSquared Apr 19 '17 at 15:34
  • @ksalerno, please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and [edit](http://stackoverflow.com/posts/43499776/edit) your post accordingly – MaxU - stand with Ukraine Apr 19 '17 at 15:37
  • Well, the two columns that I'm joining are both called 'grade_difs', so that is the problem, no? Or is it stemming from something else? – ksalerno Apr 19 '17 at 15:39
  • @ksalerno you need to formulate your question better. Go back and think about your problem. Put together a complete example that demonstrates your problem. Then edit this question or ask a new one that is clear and asks a specific question. – piRSquared Apr 19 '17 at 15:46