1

I am trying to perform a nested loop thorugh a dataframe and I am really really new in using python. Somehow looking through the google I found many examples but the final one which I need. I used iterrows to loop over the dataframe and index on the date using only data which has the same date. That works. Now I want the nested loop but don't know how that works with iterrows? The code looks like the folloiwng:

import pandas as pd

df = pd.read_csv('C:/Files_Employees.csv', encoding='cp1252', sep=';', index_col=0).dropna()

for current_date in df.index.unique():
    print('calculating date: ' +str(current_date))

    for index, row in df.iterrows():
        if index == current_date:
            print(row['Person']) 

I did it via a nested loop but here I am not sure how i could do the indexing as showed above and somehow the expected results are wrong. The code looks like the following:

import pandas as pd

df = pd.read_csv('C:/Files_Employees.csv', encoding='cp1252', sep=';', index_col=0).dropna()

df2 = pd.DataFrame([])

for i in range(0, len(df)):
        for j in range(i+1, len(df)):   

            if df.iloc[i]['Working Group'] == df.iloc[j]['Working Group']:

                working_hours = df.iloc[i]['Working Hours'] + df.iloc[j]['Working Hours']

                print(df.iloc[i]['Working Group'], working_hours)

If an example is needed I can include one.

The example file looks like the following:

working_date    Working Group   Person  Working Hours   Country
2017-07-14      1   Mike    59  USA
2017-07-14      2   Molly   60  USA
2017-07-14      3   Dennis  45  USA
2017-07-14      4   Pablo   45  USA
2017-07-14      1   Jeff    42  USA
2017-07-14      2   Emily   55  USA
2017-07-14      3   Sophia  46  USA
2017-07-14      4   Alice   41  USA
2017-07-14      1   Ethan   57  USA
2017-07-14      2   Alexander   59  USA
2017-07-14      3   Edward  41  USA
2017-07-14      4   Daniel  46  USA
2017-07-15      1   Mike    59  USA
2017-07-15      2   Molly   59  USA
2017-07-15      3   Dennis  61  USA
2017-07-15      4   Pablo   58  USA
2017-07-15      1   Jeff    58  USA
2017-07-15      2   Emily   51  USA
2017-07-15      3   Sophia  65  USA
2017-07-15      4   Alice   53  USA
2017-07-15      1   Ethan   49  USA
2017-07-15      2   Alexander   61  USA
2017-07-15      3   Edward  56  USA
2017-07-15      4   Daniel  65  USA

The final outpout should be the like the following, which summs in the nested loop every working group together, e.g. Working_Group one for working_date 2017-07-14 is 59+42+57 = 158:

working_date    Working Group   Working Hours   Country
2017-07-14      1               158             USA
2017-07-14      2               174             USA
2017-07-14      3               132             USA
2017-07-14      4               132             USA
2017-07-15      1               166             USA
2017-07-15      2               171             USA
2017-07-15      3               182             USA
2017-07-15      4               176             USA
NewNY1990
  • 107
  • 7
  • 2
    Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Sep 03 '18 at 09:03
  • 2
    You should not be seeking to use python loops on dataframes. The point of pandas/numpy is to try and vectorize operations to get a speed-up, sometimes orders of magnitude so – roganjosh Sep 03 '18 at 09:05
  • @jezrael, thanks for the answer. I extended the example. Hope it is clearer. – NewNY1990 Sep 03 '18 at 12:47
  • @roganjosh, thanks for the answer. I extended my example. Please show me how the best way would be getting the results which I expect as shown in my example – NewNY1990 Sep 03 '18 at 12:48
  • So use `df = df.groupby(['working_date','Working Group', 'Country'], as_index=False)['Working Hours'].sum()` – jezrael Sep 03 '18 at 12:51
  • @jezrael, Thanks a lot. That is simple :). With as_index=true I can index also based on the date. But one more question, how can I write my output out as I showed it above? – NewNY1990 Sep 03 '18 at 12:58
  • @jezrael, no why are you asking? – NewNY1990 Sep 03 '18 at 13:02
  • @NewNY1990 - ` write my output out` means to file? or `res = df.groupby(['working_date','Working Group', 'Country'], as_index=False)['Working Hours'].sum()` and then `print (res)` or to file `res.to_file('test.csv', index=False)` ? – jezrael Sep 03 '18 at 13:04
  • yes this is what I meant to wrote it out in an csv file. This is what you showed. thanks. – NewNY1990 Sep 03 '18 at 13:06
  • @jezrael, Can i also group non equal columns? – NewNY1990 Sep 03 '18 at 13:41

1 Answers1

1

With Pandas, you should use vectorised operations. Here you can simply use GroupBy + sum:

res = df.groupby(['working_date', 'WorkingGroup', 'Country']).sum().reset_index()
#alternative
res = (df.groupby(['working_date','Working Group', 'Country'], as_index=False)
         ['Working Hours'].sum())
print(res)

  working_date  WorkingGroup Country  WorkingHours
0   2017-07-14             1     USA           158
1   2017-07-14             2     USA           174
2   2017-07-14             3     USA           132
3   2017-07-14             4     USA           132
4   2017-07-15             1     USA           166
5   2017-07-15             2     USA           171
6   2017-07-15             3     USA           182
7   2017-07-15             4     USA           176
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
jpp
  • 159,742
  • 34
  • 281
  • 339
  • Thanks a lot. This is the way i was looking for. The second alternative resets the index but does not show the dates anymore. – NewNY1990 Sep 03 '18 at 13:02