1

Is there a way to merge a multidimensional dataframe with a series of different lengths? There are so many ways to combine df's. I've read about joining, concatenating, appending and merging. I don't know which one to use. Also, all have many optional parameters, what makes it even more difficult to understand. Can someone clarify the documentation (https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html), specifically on how to merge a dataframe and series of different lengths?

For example, I would like to merge the following multi-dimensional dataframe,

d = {'Name': ['Kitty', 'Harry', 'Bear', 'Sam', 'Max', 'Hunter', 'Fluffy'], 'Favloc': ['couch', 'windowsill', 'bed', 'basket', 'floor', 'carpet', 'haybale'], 'Pet': ['Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog', 'Hamster']}
df = pd.DataFrame(data=d)
df = df.set_index(['Pet', 'Name'])

print (df)
                 Favloc
Pet     Name           
Cat     Kitty     couch
        Harry   windows
        Bear        bed
        Sam      basket
Dog     Max       floor
        Hunter   carper

with the following Series, s1:

s1 = pd.Series([3,3,1], index=['Cat','Dog','Hamster'])

I would like the result to be:

                      Favloc
Pet     cnts Name           
Cat     3    Kitty     couch
             Harry   windows
             Bear        bed
             Sam      basket
Dog     3    Max       floor
             Hunter   carper
Hamster 1    Fluffy  Haybale

I already tried

result = df.join(s1)

But that throws an error:

Cannot join with no level specified and no overlapping names

I understand that I didn't specify a level, but I don't know how to specify it. Should I say level 1, because I'd like cnts to be on the 1 level index? (with Favloc being level 0? Also, I don't understand 'with no overlapping names', because cat, dog and hamster overlap, right?

I also tried

result = pd.concat([df, s1])

This resulted in a dataframe with NAN in every column where I would like to see the cnts.

Then I tried:

result = pd.merge(df, s1)

And I got: can not merge DataFrame with instance of type

I tried al of them with all sorts of arguments but I don't think it helps to show everything I tried? I think I tried to much, because I don't really understand how merging a multidimensional dataframe and series of different length works.

I've seen related questions, but all with dataframes with just one level, like: How to merge a Series and DataFrame So, how to merge a multidimensional dataframe and series of different lengths?

Hibisceae
  • 33
  • 2
  • 8

1 Answers1

2

You can use DataFrame.join with rename and parameter on, then DataFrame.set_index with DataFrame.reorder_levels:

result = (df.join(s1.rename('cnts'), on='Pet')
           .set_index('cnts', append=True)
           .reorder_levels([0,2,1]))
print (result)
                      Favloc
Pet     cnts Name           
Cat     3    Kitty     couch
             Harry   windows
             Bear        bed
             Sam      basket
Dog     3    Max       floor
             Hunter   carper
Hamster 1    Fluffy  Haybale

Or use Index.map:

idx = df.index.get_level_values('Pet').map(s1.rename('cnts').get)
result = df.set_index(idx, append=True).reorder_levels([0,2,1])
print (result)
                      Favloc
Pet     cnts Name           
Cat     3    Kitty     couch
             Harry   windows
             Bear        bed
             Sam      basket
Dog     3    Max       floor
             Hunter   carper
Hamster 1    Fluffy  Haybale
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks for your help. I tried both. The first, DataFrame.join, throws an error: KeyError: 'Pet'. Index.map throws another error: 'Series' object is not callable – Hibisceae Mar 14 '19 at 12:06
  • 1
    @Hibisceae - Change `idx = df.index.get_level_values('Pet').map(s1).rename('cnts')` to `idx = df.index.get_level_values('Pet').map(s1.get).rename('cnts')` – jezrael Mar 14 '19 at 12:07
  • 1
    @Hibisceae - It is problem some oldier pandas version, if use last `0.24+` it working nice with no `.get` too. – jezrael Mar 14 '19 at 12:08
  • 1
    That's it! It threw: 'numpy.ndarray' object has no attribute 'rename', Workaround: `s1 = s1.rename('cnts')` and then idx = df.index.get_level_values('Pet').map(s1.get) result = df.set_index(idx, append=True).reorder_levels([0,2,1]) Problem solved. Thanks! – Hibisceae Mar 14 '19 at 12:15
  • @Hibisceae - yes, it is same like `idx = df.index.get_level_values('Pet').map(s1.rename('cnts').get)`. Thanks, glad to help. Don't forget to accept the answer, if it suits you! :) – jezrael Mar 14 '19 at 12:17