0

I am attempting to join two json columns using Pandas.

import pandas as pd

students_df = pd.DataFrame([['Jay',16,'Soccer'],['Jay',26,'Soccer'],
                   ['Jack',19, 'FootBall'],
                   ['Dorsey',19, 'Dining'],
                   ['Mark',18, 'Swimming']],
                  columns = ['Name','Age', 'Sport'])
                  
courses_df = pd.DataFrame([['Jay','MS'],
                   ['Jay','Music'],
                   ['Dorsey','Music'],
                   ['Dorsey', 'Piano'],
                   ['Mark','MS']],
                  columns = ['Name','Course'])
                  
students_df_json = students_df.groupby(['Name']).apply(lambda x: x.drop(columns='Name').to_json(orient='records')).reset_index(name="students_json")          
courses_df_json = courses_df.groupby(['Name']).apply(lambda x: x.to_json(orient='records')).reset_index(name="courses_json")

I want to join the two dataframes and then concatenate the two json columns students_json and courses_json

pd.merge(students_df_json , courses_df_json, on="Name")

My try

pd.merge(students_df_json, courses_df_json, on="Name").apply(        
     lambda x: x["students_json"] + x["courses_json"]                 
 )  

                         

But this results in a key error

KeyError: 'students_json'

Also I have another question what is the variable x referring to in the lambda ?

jhon.smith
  • 1,963
  • 6
  • 30
  • 56

2 Answers2

1

Usually the lambda x refers to the function input variable x. In your case when you apply the function on the dataframe it is by default on the axis=0 which is row wise values.

For your requirement I believe you need to apply through axis=1 for column wise values as mentioned below.

pd.merge(students_df_json, courses_df_json, on="Name").apply(        
     lambda x: x["students_json"] + x["courses_json"], axis=1
 )
Abhi
  • 1,080
  • 1
  • 7
  • 21
  • Hi is there a way to annotate the lambda x: as a data frame so intellisense can help me?For example in python we can annotate a variable name:str – jhon.smith Mar 04 '22 at 06:04
  • In lambda x : x is not a dataframe type, it is pandas series instead and not sure how to annotate non native datatypes of Python in intellisense – Abhi Mar 04 '22 at 06:08
1

For vectorized join dont use apply, only use + with Series:

df = pd.merge(students_df_json, courses_df_json, on="Name", how='left')
s = df["students_json"] + df["courses_json"]                 
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252