0

I am trying to compare information into DataFrame. For example I've two bases like this :

data1={'name':['Paul','John','Mike','Sarah','Joe'],'class':[6,5,6,4,6],'sport':['tennis','baseball','basket','basket','tennis']}
base1 = pandas.DataFrame(data=data1)
data2={'name':['Paul','John','Lena','Sarah','Nicolas'],'class':[5,4,6,3,5],'sport':['tennis','baseball','basket','basket','tennis']}
base2 = pandas.DataFrame(data=data2)

I want to retrieve who is still present at next year (e.g. here Paul, John) and even set a condition (e.g who was starting at class 6 followed at class 5, ...)

Do I need to parse on base1 one by one entries to test if it's present in base 2 at next year ? Or is there a more efficient solution ? Best regards and many thanks.

CedM
  • 91
  • 1
  • 8

1 Answers1

0

you may try the following code (see post: pandas: merge (join) two data frames on multiple columns) import pandas as pd

data1={'name':['Paul','John','Mike','Sarah','Joe'],'class':[6,5,6,4,6],'sport':['tennis','baseball','basket','basket','tennis']}
base1 = pd.DataFrame(data=data1)
data2={'name':['Paul','John','Lena','Sarah','Nicolas'],'class':[5,4,6,3,5],'sport':['tennis','baseball','basket','basket','tennis']}
base2 = pd.DataFrame(data=data2)


new_df = pd.merge(base2, base1,  how='left', left_on=['name','sport'], right_on = ['name','sport'])

The result of data1, data2 and new_df will be (Paul, John and Sarah progress to the next year), you can choose the columns which you are interested and put in the new dataframe new_df and continue the filter based on the new_df to get the desired result. enter image description here

XYZ
  • 352
  • 5
  • 19
  • How to count in this merged DataFrame how many have passed from class_x to class_y (class 6 to 5 etc.) ? – CedM Jan 20 '21 at 14:04
  • @CedM, you can simply filter by what you want. e.g. `profrom5To6 = new_df[(new_df['class_y'].notnull()) & (new_df['class_x'] == 5)]` to get the entries who progress from 5 to 6. – XYZ Jan 20 '21 at 14:29