Pandas multi-column match and then backward shift to find value

Question

I would really appreciate your help with this. I'm a Pandas python noob and have been thrown into the deepend with this problem.

I searched through around 100 different queries on this website and cannot find something befitting. The closest I got was applying boolean masks.

Please click on the text below to find my dataframe.

I would like to run a query on the dataset to find the previous row where the 'AwayTeam' string is found in the 'HomeTeam' column -> I would then like to pull in the value of 'home_form' for that matching incidence as an additional column

Date HomeTeam AwayTeam home_form away_form new_column
25/08/2019 Strasbourg Rennes 1.0 3.0 Nan (Row 25, just an example)
01/09/2019 Rennes Nice 3.0 3.0 3.0 (Row 37, just an example)

I want to pull in the previous 'away_form' value for the last row where a HomeTeam appeared in the AwayTeam column

Can you provide an example of a specific query you are trying to solve? It would also help to have an expected outcome, so we can reverse-engineer what you are looking for. — user_stack_overflow, Oct 21 '19 at 16:32

score 0 · Accepted Answer · answered Oct 21 '19 at 18:24

This is not a complete solution, but I think I found a way to help you make some progress.

Here the steps:

Create a sample dataframe, just for illustration
Convert the 'HomeTeam' column into a list … this is the target column.
Create an empty list to store the results of searching 'HomeTeam' column
Loop through the teams in the 'AwayTeam' column
Use Python's list.index() method to return the index of the match … but use a try-except just in case you don't find a match.
Store the result into list
When finished with the for-loop, add the list as a new column in the pandas dataframe.

import pandas as pd
import numpy as np

# create sample dataframe
df = pd.DataFrame({
    'Date': ['2019-08-18', '2019-08-25'], 
    'HomeTeam': ['Rennes', 'Strasbourg'],
    'AwayTeam': ['Paris SG', 'Rennes'],
    'home_form': [np.NaN, 1.0],
    'away_form': [np.NaN, 3.0],
})

# convert your 'HomeTeam' column into a Python list
list_HomeTeam = list(df['HomeTeam'])
print(list_HomeTeam)

# create an empty list to capture the index position of matches in 'HomeTeam'
list_results_in_home = []

# loop through each team in the 'AwayTeam column'
for each_team in df['AwayTeam']:

    # if you find a match in the list, store index as a result
    try:
        result = list_HomeTeam.index(each_team)

    # if you don't find a match, store a string
    except:
        result = 'team not in list'

    # add the result to the list that is capturing the index position in 'HomeTeam'
    list_results_in_home.append(result)

print(list_index_home)

# add column to dataframe with the index position
df['index_match_in_HomeTeam'] = list_results_in_home

However, I have the HomeTeam and AwayTeam appearing multiple times in the dataframe and so what I would really like is to get a match on the LAST index, using the same iterative method above. Any ideas? — Raz, Oct 22 '19 at 08:10
Use list.rindex(). https://stackoverflow.com/questions/6890170/how-to-find-the-last-occurrence-of-an-item-in-a-python-list — user_stack_overflow, Oct 22 '19 at 13:26

Pandas multi-column match and then backward shift to find value

1 Answers1