0

I have two data sets. They are different lengths, and are unordered, but describe the same events. Some events repeat, but on different dates. How do I add h_goals and a_goals data from new_data to my_data based on simultaneous match of three columns = 'dt', 'h_team', 'a_team'?

new_data = {
    'a_team' : [
        'Arsenal', 'Newcastle', 'Everton', 'Brighton',
        'Newcastle', 'Liverpool', 'Brighton'
    ],
    'dt'     : [
        '2018-01-02', '2018-01-02', '2018-02-03',
        '2018-02-04', '2018-02-04', '2018-02-05', '2018-01-02'
    ],
    'h_team' : [
        'Man Utd', 'Liverpool', 'Tottenham', 'Man City',
        'Liverpool', 'Newcastle', 'Man City',
    ],
    'h_goals': [2, 0, 3, 1, 2, 1, 1],
    'a_goals': [0, 0, 1, 2, 4, 1, 0],
}

my_data = {
    'dt'    : [
        '2018-01-02', '2018-01-02', '2018-02-03',
        '2018-02-04', '2018-02-05'
    ],
    'h_team': [
        'Man City', 'Liverpool', 'Tottenham', 'Man City', 'Newcastle'
    ],
    'a_team': [
        'Brighton', 'Newcastle', 'Everton', 'Brighton', 'Liverpool'
    ],
    'result': ['H', 'D', 'H', 'A', 'D']
}

df1 = pd.DataFrame(new_data)
df2 = pd.DataFrame(my_data)
Expected Output:
            dt     h_team     a_team result  h_goals  a_goals
 0  2018-01-02   Man City   Brighton      H        1        0
 1  2018-01-02  Liverpool  Newcastle      D        0        0
 2  2018-02-03  Tottenham    Everton      H        3        1
 3  2018-02-04   Man City   Brighton      A        1        2
 4  2018-02-05  Newcastle  Liverpool      D        1        1
  • 1
    solution is `df2.merge(df1, how='inner')` – moys Sep 14 '19 at 08:06
  • 1
    as marked by @jezrael , this is a clear duplicate. I put the answer only in the interest of time. Please go through the link pasted so that you are clear how it works. – moys Sep 14 '19 at 08:10

0 Answers0