Improving the efficiency of pandas nested for loops

Question

I have a question about the following data on a multiple-day match where people collected points on different days of the match. I want to sum the points gotten by one person (with a unique key) on that person's first, second, and third day and I have two data frames for this (see frames A and B). The problem is that the code I wrote to obtain the result, takes a long time and is very inefficient. I can't seem to figure out how to apply/map works in the case where two values have to be matched (not only person key but also the day). Can anybody help me out with this? I am using Pandas in Python.

Example:

Dataframe A

Person key	Start day	Day	Points
123	10	10	5
123	10	11	1
888	1	1	6
888	1	2	2
888	1	3	4

Dataframe B

Person key	Day 1	Day 2	Day 3	points day 1	points day 2	points day 3
123	10	11	12	0	0	0
888	1	2	3	0	0	0

What I would like to get:

Person key	Day 1	Day 2	Day 3	points day 1	points day 2	points day 3
123	10	11	12	5	1	0
888	1	2	3	6	2	4

The code I tried:

for i in range (0,len(dfB)):
    for t in range(0,len(dfA)): 
        if dfB['Day 1'].iloc[i] == dfA['Day'].iloc[t] and dfB['Person key'].iloc[i] == dfA['Person key'].iloc[t]:
            dfB['points day 1'].iloc[i] = dfA['Points'].iloc[t]
            continue
        if dfB['Day 2'].iloc[i] == dfA['Day'].iloc[t] and dfB['Person key'].iloc[i] == dfA['Person key'].iloc[t]:
            dfB['points day 2'].iloc[i] = dfA['Points'].iloc[t]
            continue
        if dfB['Day 3'].iloc[i] == dfA['Day'].iloc[t] and dfB['Person key'].iloc[i] == dfA['Person key'].iloc[t]:
            dfB['points day 3'].iloc[i] = dfA['Points'].iloc[t]
            continue

"two values have to be matched" means that you might need to consider [`pd.merge`](https://pandas.pydata.org/docs/reference/api/pandas.merge.html) — aaossa, Mar 08 '22 at 14:03
@aaossa Thank you, but I don't see how pd.merge helps with filling in the columns for points day 1, points day 2. etc. I want to fill the right column based on whether the person key is the same and whether the day in Dataframe A is either day 1, 2 or 3 in Dataframe B. — oppugno, Mar 08 '22 at 14:08
One way of solving it is using - ```df.pivot_table(index='person', columns=['day'], values='points').fillna(0)``` but that is not the format in which you want the new dataframe — Rajarshi Ghosh, Mar 08 '22 at 14:25

not_speshal · Answer 1 · 2022-03-08T15:48:17.130

2

Try:

melt dfA to match the structure of dfB
merge both DataFrames
pivot and rename columns to get the required format

melted = dfB.melt("Person key", ["Day 1", "Day 2", "Day 3"], value_name="Day")
output = melted.merge(dfA, on=["Person key", "Day"], how="left").drop("Start day", axis=1)
output = output.pivot("Person key", "variable", ["Day", "Points"]).fillna(0)
output.columns = output.columns.map(lambda x: x[1] if x[0]=="Day" else " ".join(x))

>>> output
            Day 1  Day 2  Day 3  Points Day 1  Points Day 2  Points Day 3
Person key                                                               
123          10.0   11.0   12.0           5.0           1.0           0.0
888           1.0    2.0    3.0           6.0           2.0           4.0

Inputs:

dfA = pd.DataFrame({'Person key': [123, 123, 888, 888, 888],
                    'Start day': [10, 10, 1, 1, 1],
                    'Day': [10, 11, 1, 2, 3],
                    'Points': [5, 1, 6, 2, 4]}

dfB = pd.DataFrame({'Person key': [123, 888],
                    'Day 1': [10, 1],
                    'Day 2': [11, 2],
                    'Day 3': [12, 3],
                    'points day 1': [0, 0],
                    'points day 2': [0, 0],
                    'points day 3': [0, 0]})

edited Mar 08 '22 at 15:48

answered Mar 08 '22 at 14:17

not_speshal

22,093
2
15
30

I say: yes_speshal – C. Pappy Mar 08 '22 at 14:23
@not_speshal Thanks a lot, I really appreciate it! However when I try exactly the same code I don't get the same output as you... Did you do anything to get the table as output? – oppugno Mar 08 '22 at 15:25
If your table is formatted *exactly* (including column names etc.) as in your post, my code works. – not_speshal Mar 08 '22 at 15:33
I think I am having issues with the .join. When I do that, it gives this error: "TypeError: sequence item 0: expected str instance, int found". So I tried .join(str(x)), and .join(str(x) for x in output)), but that also doesnt lead to a table. I get a dtype object with Index[(tuples)]. Strange! – oppugno Mar 08 '22 at 15:37
What exactly is the difference between what you posted in your question and what you have? – not_speshal Mar 08 '22 at 15:40
@not_speshal to get back on this, looked over it: the difference is that I have to use pivot_table instead of just pivot, as I have some overlapping indexes. I think that is giving the issues. – oppugno Mar 08 '22 at 17:35

Improving the efficiency of pandas nested for loops

1 Answers1

Inputs: