0

I would really appreciate your help with this. I'm a Pandas python noob and have been thrown into the deepend with this problem.

I searched through around 100 different queries on this website and cannot find something befitting. The closest I got was applying boolean masks.

Please click on the text below to find my dataframe.

I would like to run a query on the dataset to find the previous row where the 'AwayTeam' string is found in the 'HomeTeam' column -> I would then like to pull in the value of 'home_form' for that matching incidence as an additional column

Date HomeTeam AwayTeam home_form away_form new_column
25/08/2019 Strasbourg Rennes 1.0 3.0 Nan (Row 25, just an example)
01/09/2019 Rennes Nice 3.0 3.0 3.0 (Row 37, just an example)

I want to pull in the previous 'away_form' value for the last row where a HomeTeam appeared in the AwayTeam column

Raz
  • 3
  • 2

1 Answers1

0

This is not a complete solution, but I think I found a way to help you make some progress.

Here the steps:

  1. Create a sample dataframe, just for illustration
  2. Convert the 'HomeTeam' column into a list … this is the target column.

  3. Create an empty list to store the results of searching 'HomeTeam' column

  4. Loop through the teams in the 'AwayTeam' column

  5. Use Python's list.index() method to return the index of the match … but use a try-except just in case you don't find a match.

  6. Store the result into list

  7. When finished with the for-loop, add the list as a new column in the pandas dataframe.

import pandas as pd
import numpy as np

# create sample dataframe
df = pd.DataFrame({
    'Date': ['2019-08-18', '2019-08-25'], 
    'HomeTeam': ['Rennes', 'Strasbourg'],
    'AwayTeam': ['Paris SG', 'Rennes'],
    'home_form': [np.NaN, 1.0],
    'away_form': [np.NaN, 3.0],
})

# convert your 'HomeTeam' column into a Python list
list_HomeTeam = list(df['HomeTeam'])
print(list_HomeTeam)

# create an empty list to capture the index position of matches in 'HomeTeam'
list_results_in_home = []

# loop through each team in the 'AwayTeam column'
for each_team in df['AwayTeam']:

    # if you find a match in the list, store index as a result
    try:
        result = list_HomeTeam.index(each_team)

    # if you don't find a match, store a string
    except:
        result = 'team not in list'

    # add the result to the list that is capturing the index position in 'HomeTeam'
    list_results_in_home.append(result)

print(list_index_home)

# add column to dataframe with the index position
df['index_match_in_HomeTeam'] = list_results_in_home

  • However, I have the HomeTeam and AwayTeam appearing multiple times in the dataframe and so what I would really like is to get a match on the LAST index, using the same iterative method above. Any ideas? – Raz Oct 22 '19 at 08:10
  • Use list.rindex(). https://stackoverflow.com/questions/6890170/how-to-find-the-last-occurrence-of-an-item-in-a-python-list – user_stack_overflow Oct 22 '19 at 13:26