While Loop Alternative in Python

Question

I am working on a huge dataframe and trying to create a new column, based on a condition in another column. Right now, I have a big while-loop and this calculation takes too much time, is there an easier way to do it?

With lambda for example?:

def promo(dataframe, a):  
    i=0
    while i < len(dataframe)-1:
        i=i+1
        if dataframe.iloc[i-1,5] >= a:
            dataframe.iloc[i-1,6] = 1
        else:
            dataframe.iloc[i-1,6] = 0

    return dataframe

it's easier to understand if you provide some explanation of **trying to create a new column, based on a condition in another column** — Sociopath, Sep 11 '18 at 06:56
If you're using loops with dataframe rows, you're usually doing something wrong. Please show example input and output — OneCricketeer, Sep 11 '18 at 06:58
@cricket_007 particularly *indexing* for-loops. If you have to loop over rows, then you should at least use `itertuples`. — juanpa.arrivillaga, Sep 11 '18 at 07:02

score 2 · Accepted Answer · edited Sep 11 '18 at 08:07

2

Don't use loops in pandas, they are slow compared to a vectorized solution - convert boolean mask to integers by astype True, False are converted to 1, 0:

dataframe = pd.DataFrame({'A':list('abcdef'),
                   'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':list('aaabbb'),
                   'F':[5,3,6,9,2,4],
                   'G':[5,3,6,9,2,4]
})

a = 5
dataframe['new'] = (dataframe.iloc[:,5] >= a).astype(int)
print (dataframe)
   A  B  C  D  E  F  G  new
0  a  4  7  1  a  5  5    1
1  b  5  8  3  a  3  3    0
2  c  4  9  5  a  6  6    1
3  d  5  4  7  b  9  9    1
4  e  5  2  1  b  2  2    0
5  f  4  3  0  b  4  4    0

If you want to overwrite the 7th column:

a = 5
dataframe.iloc[:,6] = (dataframe.iloc[:,5] >= a).astype(int)
print (dataframe)
   A  B  C  D  E  F  G
0  a  4  7  1  a  5  1
1  b  5  8  3  a  3  0
2  c  4  9  5  a  6  1
3  d  5  4  7  b  9  1
4  e  5  2  1  b  2  0
5  f  4  3  0  b  4  0

edited Sep 11 '18 at 08:07

M.T

4,917
4
33
52

answered Sep 11 '18 at 07:02

jezrael

822,522
95
1,334
1,252

@M.T - Thank you, +3 – jezrael Sep 11 '18 at 08:08
I do not understand why you are using iloc[]. In this case I would use: df.loc[:, 5] or df[5]. Can you explain me if there is any difference in performance? – Antonio Andrés Sep 11 '18 at 08:14
@AntonioAndrés - It depends if want seelct `6th column` like in my solution, then need `dataframe.iloc[:,5]`. But if default columns names by RangeIndex then is possible use `df[5]` or `df.loc[:, 5]` – jezrael Sep 11 '18 at 08:16
Ok, .iloc[] is more correct because the column notation is: 1, 2, 3. If the column notation will be: 'A', 'B', 'C'. the answer will use .loc[]. That's right? – Antonio Andrés Sep 11 '18 at 08:18
@AntonioAndrés - It depends what need - if want select by label or select by positions, check [this](https://stackoverflow.com/q/31593201) for difference. – jezrael Sep 11 '18 at 08:20
1

Ok, I understand you, iloc[] is for position also in columns not only in rows. Thanks! – Antonio Andrés Sep 11 '18 at 08:21
Is there any oppurtunity if i do not want to have 1 or 0, instead I need another number e.g. 2? – PV8 Sep 12 '18 at 06:57
1

@PV8 - Then use `dataframe['new'] = np.where(dataframe.iloc[:,5] >= a, 1, 2)` – jezrael Sep 12 '18 at 06:59
thx, does this also work for multiple conditions, if i want to check the column with a and b, and assign another value? – PV8 Sep 13 '18 at 09:43
@PV8 - Yes, then use `dataframe['new'] = np.where((dataframe.iloc[:,5] >= a) & (dataframe.iloc[:,5] >= b), 1, 2)` – jezrael Sep 13 '18 at 10:12
@PV8 - sure, change `dataframe.iloc[:,5]` to `dataframe['colname']` – jezrael Oct 01 '18 at 11:40

While Loop Alternative in Python

1 Answers1

Linked