I have two data sets. They are different lengths, and are unordered, but describe the same events. Some events repeat, but on different dates. How do I add h_goals and a_goals data from new_data to my_data based on simultaneous match of three columns = 'dt', 'h_team', 'a_team'?
new_data = {
'a_team' : [
'Arsenal', 'Newcastle', 'Everton', 'Brighton',
'Newcastle', 'Liverpool', 'Brighton'
],
'dt' : [
'2018-01-02', '2018-01-02', '2018-02-03',
'2018-02-04', '2018-02-04', '2018-02-05', '2018-01-02'
],
'h_team' : [
'Man Utd', 'Liverpool', 'Tottenham', 'Man City',
'Liverpool', 'Newcastle', 'Man City',
],
'h_goals': [2, 0, 3, 1, 2, 1, 1],
'a_goals': [0, 0, 1, 2, 4, 1, 0],
}
my_data = {
'dt' : [
'2018-01-02', '2018-01-02', '2018-02-03',
'2018-02-04', '2018-02-05'
],
'h_team': [
'Man City', 'Liverpool', 'Tottenham', 'Man City', 'Newcastle'
],
'a_team': [
'Brighton', 'Newcastle', 'Everton', 'Brighton', 'Liverpool'
],
'result': ['H', 'D', 'H', 'A', 'D']
}
df1 = pd.DataFrame(new_data)
df2 = pd.DataFrame(my_data)
Expected Output:
dt h_team a_team result h_goals a_goals
0 2018-01-02 Man City Brighton H 1 0
1 2018-01-02 Liverpool Newcastle D 0 0
2 2018-02-03 Tottenham Everton H 3 1
3 2018-02-04 Man City Brighton A 1 2
4 2018-02-05 Newcastle Liverpool D 1 1