apply vs nested for loops

Question

I'm trying to build a dataframe in Python that is filled with 1s and 0s, depending on the number in one column:

Date        Hour
2005-01-01  1
2005-01-01  2
2005-01-01  3
2005-01-01  4

I want to make new columns based on the numbers in "Hour", and fill each column with a 1 if its number equals the value in "Hour" for that row, or 0 if not.

Date        Hour HE1 HE2 HE3 HE4
2005-01-01  1    1   0   0   0
2005-01-01  2    0   1   0   0
2005-01-01  3    0   0   1   0
2005-01-01  4    0   0   0   1

I can do it with this code, but it takes a long time:

for x in range(1,5):
    _HE = 'HE' + str(x)
    for i in load.index:
        load.at[i, _HE] = 1 if load.at[i,'Hour']==x else 0

I feel like this is a great application (no pun intended) for .apply(), but I can't get it to work right.

How would you speed this up?

`.apply`, in general, is not faster than a for-loop. `.apply` is a Python interpreter level [for-loop under the hood](https://stackoverflow.com/questions/38938318/why-apply-sometimes-isnt-faster-than-for-loop-in-pandas-dataframe/38938507#38938507) — juanpa.arrivillaga, Sep 03 '18 at 06:02

jezrael · Accepted Answer · 2018-09-03T06:04:19.897

Loops are not recommended in pandas when a vectorized solution exists, because they are slow.

Note: apply also uses loops under the hood.
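To make the point about apply concrete, here is a minimal sketch, assuming a small frame named `load` like the one in the question. The helper `is_hour_one` and the list `call_log` are hypothetical names used only for illustration. It records how often pandas calls back into Python during apply, then builds the same columns with one whole-column comparison per hour instead of the inner row loop.

```python
import pandas as pd

load = pd.DataFrame({
    "Date": ["2005-01-01"] * 4,
    "Hour": [1, 2, 3, 4],
})

# Record every Python-level call that .apply makes (illustrative only).
call_log = []

def is_hour_one(hour):
    call_log.append(hour)
    return 1 if hour == 1 else 0

via_apply = load["Hour"].apply(is_hour_one)
print(len(call_log))  # number of Python-level calls made by apply

# Vectorized replacement for the inner loop: compare the whole column at once.
for x in range(1, 5):
    load["HE" + str(x)] = (load["Hour"] == x).astype(int)

print(load)
```

The outer loop over the four hours is still there, but the per-row work is now done inside pandas rather than in Python.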

So use pandas.get_dummies with DataFrame.add_prefix, and join the result to the original df:

df = df.join(pd.get_dummies(df['Hour'].astype(str)).add_prefix('HE'))
print(df)
         Date  Hour  HE1  HE2  HE3  HE4
0  2005-01-01     1    1    0    0    0
1  2005-01-01     2    0    1    0    0
2  2005-01-01     3    0    0    1    0
3  2005-01-01     4    0    0    0    1
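For anyone who wants to run this from scratch, here is a self-contained variant, offered as a sketch rather than the exact code above. It rebuilds the sample frame, and it passes `prefix`, `prefix_sep` and `dtype=int` to `pd.get_dummies` instead of calling `astype(str)` and `add_prefix`. The `dtype=int` part is my assumption about what is wanted: since pandas 2.0 the default result is boolean, so without it you get True/False rather than 1/0.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2005-01-01"] * 4,
    "Hour": [1, 2, 3, 4],
})

# dtype=int asks for 0/1 integers; newer pandas versions default to booleans.
# prefix="HE" with an empty separator yields column names HE1, HE2, ...
dummies = pd.get_dummies(df["Hour"], prefix="HE", prefix_sep="", dtype=int)

result = df.join(dummies)
print(result)
```

Keeping "Hour" as integers should also keep the new columns in numeric order. After converting to strings, hours such as 10 would sort before 2.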

Similar functions can have different performance:

df = pd.concat([df] * 1000, ignore_index=True)

In [62]: %timeit df.join(pd.get_dummies(df['Hour'].astype(str)).add_prefix('HE'))
3.54 ms ± 277 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#U9-Forward solution
In [63]: %timeit df.join(df['Hour'].astype(str).str.get_dummies().add_prefix('HE'))
61.6 ms ± 297 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
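`%timeit` only works inside IPython. Outside it, a rough equivalent can be sketched with the standard-library `timeit` module. The function names below are hypothetical wrappers around the two expressions timed above, and `dtype=int` is added to `pd.get_dummies` only so that both variants return integers. Absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame({"Date": ["2005-01-01"] * 4, "Hour": [1, 2, 3, 4]})
df = pd.concat([base] * 1000, ignore_index=True)

def with_pd_get_dummies():
    # Top-level function; dtype=int is an assumption to get 0/1 output.
    return df.join(pd.get_dummies(df["Hour"].astype(str), dtype=int).add_prefix("HE"))

def with_str_get_dummies():
    # String-accessor variant from the other answer.
    return df.join(df["Hour"].astype(str).str.get_dummies().add_prefix("HE"))

for func in (with_pd_get_dummies, with_str_get_dummies):
    runs = 10
    seconds = timeit.timeit(func, number=runs) / runs
    print(f"{func.__name__}: {seconds * 1e3:.2f} ms per call")
```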

score 3 · Answer 2 · answered Sep 03 '18 at 06:10

`pandas.factorize` and array slice assignment

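import numpy as np
import pandas as pd

# j: integer code of each row's hour; h: the unique hours
j, h = pd.factorize(df.Hour)
i = np.arange(len(df))

# one row per record, one column per unique hour
b = np.zeros((len(df), len(h)), dtype=h.dtype)
b[i, j] = 1  # in row k, set the column with code j[k] to 1

df.join(pd.DataFrame(b, df.index, h).add_prefix('HE'))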

         Date  Hour  HE1  HE2  HE3  HE4
0  2005-01-01     1    1    0    0    0
1  2005-01-01     2    0    1    0    0
2  2005-01-01     3    0    0    1    0
3  2005-01-01     4    0    0    0    1
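A small worked example may help show what `pd.factorize` returns and why the index assignment produces the indicator matrix. This is a sketch on a made-up series `hours` with repeated, unsorted values (not the data from the question). The `sort=True` call at the end is an optional variation, not part of the answer above.

```python
import numpy as np
import pandas as pd

hours = pd.Series([3, 1, 3, 2])

# codes[k] is the position of hours[k] within uniques;
# uniques lists each distinct value in order of first appearance.
codes, uniques = pd.factorize(hours)
print(codes)    # [0 1 0 2]
print(uniques)  # the distinct values 3, 1, 2

# Row k gets a 1 in column codes[k]; everything else stays 0.
onehot = np.zeros((len(hours), len(uniques)), dtype=int)
onehot[np.arange(len(hours)), codes] = 1
print(onehot)

# With sort=True the unique values (and hence the columns) come out sorted.
codes_sorted, uniques_sorted = pd.factorize(hours, sort=True)
```

Without `sort=True`, the columns follow the order in which hours first appear, which matters if the data is not already ordered by hour.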

U13-Forward · Answer 3 · 2018-09-03T06:10:25.623

Even though it's really similar to @jezrael's answer, this is also much better (it just uses the .str accessor for get_dummies):

print(df.join(df['Hour'].astype(str).str.get_dummies().add_prefix('HE')))

Output:

         Date  Hour  HE1  HE2  HE3  HE4
0  2005-01-01     1    1    0    0    0
1  2005-01-01     2    0    1    0    0
2  2005-01-01     3    0    0    1    0
3  2005-01-01     4    0    0    0    1
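For context, `Series.str.get_dummies` is aimed at strings that hold several labels joined by a separator, and it splits each string before building the indicator columns. The sketch below uses a made-up series `tags` to show that behaviour. The extra string handling may account for part of its cost on a plain single-value column like "Hour", though that is my assumption and not something measured here.

```python
import pandas as pd

# Each entry may contain several labels separated by "|".
tags = pd.Series(["a|b", "a", "b|c"])

# One indicator column per distinct label found after splitting.
multi = tags.str.get_dummies(sep="|")
print(multi)
```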

edited Sep 03 '18 at 06:10

answered Sep 03 '18 at 06:04


`this is also much better` ? Why? – jezrael Sep 03 '18 at 06:05
@jezrael I mean just better than his for loop and apply – U13-Forward Sep 03 '18 at 06:06
Then I absolutely agree ;) But it is slower than `pd.get_dummies` – jezrael Sep 03 '18 at 06:07
@jezrael That's true, but at least it's better than his – U13-Forward Sep 03 '18 at 06:08
Loop and `apply` are equivalent. I didn't downvote, btw – juanpa.arrivillaga Sep 03 '18 at 06:10
@juanpa.arrivillaga That's True – U13-Forward Sep 03 '18 at 06:10
@jezrael Yeah, did you downvote, Btw wow you got 65 rep on the top – U13-Forward Sep 03 '18 at 06:12
@U9-Forward - No, I didn't downvote, check the screenshot above. If I had downvoted, it would be colored orange. – jezrael Sep 03 '18 at 06:13
@jezrael That's true – U13-Forward Sep 03 '18 at 06:13
I downvoted. If your answer added some benefit to jez's I could understand the copy/paste, but using the string accessor is strictly worse here (and in most cases), so I don't think this answer is useful. But since it does get the desired output, dv removed – user3483203 Sep 03 '18 at 06:13
@user3483203 I understand but i actually didn't but yeah it's too similar tho – U13-Forward Sep 03 '18 at 06:14
