Row by row matching of two (1 large and 1 small) data frames using if else conditions and return the matched rows only

Question

I have two dfs: myteam and allplayers.

In myteam df, the names are mentioned:

myteam = pd.DataFrame({
                     "name": ["ryan", "bill", "saka", "Henry","Rooney"], 
                     "name1": ["ryan 112", "Bill Matt Cdevaca", "Bukayo Saka", "Super Henry","Rooney"], 
                     "name2": [np.nan, "XXVaca", "Bukayo", "Thierry","Rooney"], 
                     "name3": ["Ryan", "XXVaca", "Sak", "Thierr","Roon"]})

In allplayers df, we have goals and matches for all the league players.

allplayers = pd.DataFrame({
                     "player_name": ["RYAN 112", "xxVaca", "bukayo", "thierry", "RONALDO", "MESSI"], 
                     "goals": [0, 2, 5, 10, 100, 200], 
                     "matches": [22, 100, 200, 300, 100, 90]})

The name pattern for players does not match in the two dfs.

I want to select one row at a time from myteam df and iterate over "name3", "name1" and "name2" values in that order (if the prior name fails go to next one) to check if the value is equal (case-insensitive) to player_name col value in allplayers df and return the goals and matches from allplayers df and cbind the name value from myteam to the answerdf.

If name3 value is found then no need to check for name1 and so on...

If none of the values in name3, name1, name2 are equal, then return only the myteam player name with NAN in goals and matches columns.

answerdf = pd.DataFrame({
                     "player_name": ["RYAN 112", "xxVaca", "bukayo", "thierry", "Rooney"], 
                     "goals": [0, 2, 5, 10, "NaN"], 
                     "matches": [22, 100, 200, 300, "NaN"],
                    "name_from_myteam_df": ["ryan", "bill", "saka", "Henry", "Rooney"]})

Questions that ask for general guidance regarding a problem approach are typically are not a good fit for this site. Split your task into smaller ones and start working on them. And ask particular questions related to particular issues you encounter. Here's how: [how-to-ask](https://stackoverflow.com/help/how-to-ask) — itprorh66, Jul 23 '23 at 23:28

score 1 · Accepted Answer · answered Jul 23 '23 at 23:24

You can do it using apply, str.lower and iloc:

import pandas as pd
import numpy as np

# Given dataframes
myteam = pd.DataFrame({
    "name": ["ryan", "bill", "saka", "Henry", "Rooney"],
    "name1": ["ryan 112", "Bill Matt Cdevaca", "Bukayo Saka", "Super Henry", "Rooney"],
    "name2": [np.nan, "XXVaca", "Bukayo", "Thierry", "Rooney"],
    "name3": ["Ryan", "XXVaca", "Sak", "Thierr", "Roon"]
})

allplayers = pd.DataFrame({
    "player_name": ["RYAN 112", "xxVaca", "bukayo", "thierry", "RONALDO", "MESSI"],
    "goals": [0, 2, 5, 10, 100, 200],
    "matches": [22, 100, 200, 300, 100, 90]
})

# Convert player_name columns in both dataframes to lowercase for case-insensitive matching
myteam['name'] = myteam['name'].str.lower()
myteam['name1'] = myteam['name1'].str.lower()
myteam['name2'] = myteam['name2'].str.lower()
myteam['name3'] = myteam['name3'].str.lower()

allplayers['player_name'] = allplayers['player_name'].str.lower()

# Function to match names and extract goals and matches
def match_player(row):
    for col in ['name3', 'name1', 'name2']:
        player_name = row[col]
        if player_name in allplayers['player_name'].values:
            player_data = allplayers.loc[allplayers['player_name'] == player_name]
            return pd.Series([player_data['player_name'].values[0], player_data['goals'].values[0], player_data['matches'].values[0]])
    return pd.Series([np.nan, np.nan, np.nan])

# Apply the function row-wise to create the answerdf
answerdf = myteam.apply(match_player, axis=1)

# Concatenate the name_from_myteam_df column from myteam dataframe
answerdf['name_from_myteam_df'] = myteam['name']

# Rename the columns to match the desired output
answerdf.columns = ['player_name', 'goals', 'matches', 'name_from_myteam_df']

# Convert goals and matches columns to numeric type and replace NaN with "NaN" as a string
answerdf['goals'] = pd.to_numeric(answerdf['goals'], errors='coerce').fillna("NaN")
answerdf['matches'] = pd.to_numeric(answerdf['matches'], errors='coerce').fillna("NaN")

print(answerdf)

Output:

  player_name goals matches name_from_myteam_df
0    ryan 112     0      22                ryan
1      xxvaca     2     100                bill
2      bukayo     5     200                saka
3     thierry    10     300               henry
4         NaN   NaN     NaN              rooney

Note: The goals and matches columns in answerdf are of type object (string) because NaN was introduced as a string for missing values. If you intend to perform numerical operations on these columns, you can convert them back to numeric types using pd.to_numeric.

Row by row matching of two (1 large and 1 small) data frames using if else conditions and return the matched rows only

1 Answers1