0

I have a question about the following data on a multiple-day match where people collected points on different days of the match. I want to sum the points gotten by one person (with a unique key) on that person's first, second, and third day and I have two data frames for this (see frames A and B). The problem is that the code I wrote to obtain the result, takes a long time and is very inefficient. I can't seem to figure out how to apply/map works in the case where two values have to be matched (not only person key but also the day). Can anybody help me out with this? I am using Pandas in Python.

Example:

Dataframe A

Person key Start day Day Points
123 10 10 5
123 10 11 1
888 1 1 6
888 1 2 2
888 1 3 4

Dataframe B

Person key Day 1 Day 2 Day 3 points day 1 points day 2 points day 3
123 10 11 12 0 0 0
888 1 2 3 0 0 0

What I would like to get:

Person key Day 1 Day 2 Day 3 points day 1 points day 2 points day 3
123 10 11 12 5 1 0
888 1 2 3 6 2 4

The code I tried:

for i in range (0,len(dfB)):
    for t in range(0,len(dfA)): 
        if dfB['Day 1'].iloc[i] == dfA['Day'].iloc[t] and dfB['Person key'].iloc[i] == dfA['Person key'].iloc[t]:
            dfB['points day 1'].iloc[i] = dfA['Points'].iloc[t]
            continue
        if dfB['Day 2'].iloc[i] == dfA['Day'].iloc[t] and dfB['Person key'].iloc[i] == dfA['Person key'].iloc[t]:
            dfB['points day 2'].iloc[i] = dfA['Points'].iloc[t]
            continue
        if dfB['Day 3'].iloc[i] == dfA['Day'].iloc[t] and dfB['Person key'].iloc[i] == dfA['Person key'].iloc[t]:
            dfB['points day 3'].iloc[i] = dfA['Points'].iloc[t]
            continue
    
codester_09
  • 5,622
  • 2
  • 5
  • 27
oppugno
  • 1
  • 1
  • "two values have to be matched" means that you might need to consider [`pd.merge`](https://pandas.pydata.org/docs/reference/api/pandas.merge.html) – aaossa Mar 08 '22 at 14:03
  • @aaossa Thank you, but I don't see how pd.merge helps with filling in the columns for points day 1, points day 2. etc. I want to fill the right column based on whether the person key is the same and whether the day in Dataframe A is either day 1, 2 or 3 in Dataframe B. – oppugno Mar 08 '22 at 14:08
  • One way of solving it is using - ```df.pivot_table(index='person', columns=['day'], values='points').fillna(0)``` but that is not the format in which you want the new dataframe – Rajarshi Ghosh Mar 08 '22 at 14:25

1 Answers1

2

Try:

  1. melt dfA to match the structure of dfB
  2. merge both DataFrames
  3. pivot and rename columns to get the required format
melted = dfB.melt("Person key", ["Day 1", "Day 2", "Day 3"], value_name="Day")
output = melted.merge(dfA, on=["Person key", "Day"], how="left").drop("Start day", axis=1)
output = output.pivot("Person key", "variable", ["Day", "Points"]).fillna(0)
output.columns = output.columns.map(lambda x: x[1] if x[0]=="Day" else " ".join(x))

>>> output
            Day 1  Day 2  Day 3  Points Day 1  Points Day 2  Points Day 3
Person key                                                               
123          10.0   11.0   12.0           5.0           1.0           0.0
888           1.0    2.0    3.0           6.0           2.0           4.0
Inputs:
dfA = pd.DataFrame({'Person key': [123, 123, 888, 888, 888],
                    'Start day': [10, 10, 1, 1, 1],
                    'Day': [10, 11, 1, 2, 3],
                    'Points': [5, 1, 6, 2, 4]}

dfB = pd.DataFrame({'Person key': [123, 888],
                    'Day 1': [10, 1],
                    'Day 2': [11, 2],
                    'Day 3': [12, 3],
                    'points day 1': [0, 0],
                    'points day 2': [0, 0],
                    'points day 3': [0, 0]})
not_speshal
  • 22,093
  • 2
  • 15
  • 30
  • I say: yes_speshal – C. Pappy Mar 08 '22 at 14:23
  • @not_speshal Thanks a lot, I really appreciate it! However when I try exactly the same code I don't get the same output as you... Did you do anything to get the table as output? – oppugno Mar 08 '22 at 15:25
  • If your table is formatted *exactly* (including column names etc.) as in your post, my code works. – not_speshal Mar 08 '22 at 15:33
  • I think I am having issues with the .join. When I do that, it gives this error: "TypeError: sequence item 0: expected str instance, int found". So I tried .join(str(x)), and .join(str(x) for x in output)), but that also doesnt lead to a table. I get a dtype object with Index[(tuples)]. Strange! – oppugno Mar 08 '22 at 15:37
  • What exactly is the difference between what you posted in your question and what you have? – not_speshal Mar 08 '22 at 15:40
  • @not_speshal to get back on this, looked over it: the difference is that I have to use pivot_table instead of just pivot, as I have some overlapping indexes. I think that is giving the issues. – oppugno Mar 08 '22 at 17:35