5

Disclaimer: This might be possible duplicate but I cannot find the exact solution. Please feel free to mark this question as duplicate and provide link to duplicate question in comments.

I am still learning python dataframe operations and this possibly has a very simple solution which I am not able to figure out.

I have a python dataframe with a single columns. Now I want to change value of each row to value of previous row if certain conditions are satisfied. I have created a loop solution to implement this but I was hoping for a more efficient solution.

Creation of initial data:

import numpy as np
import pandas as pd

data = np.random.randint(5,30,size=20)
df = pd.DataFrame(data, columns=['random_numbers'])

print(df)

    random_numbers
0                6
1               24
2               29
3               18
4               22
5               17
6               12
7                7
8                6
9               27
10              29
11              13
12              23
13               6
14              25
15              24
16              16
17              15
18              25
19              19

Now lets assume two condition are 1) value less than 10 and 2) value more than 20. In any of these cases, set row value to previous row value. This has been implement in loop format as follows:

for index,row in df.iterrows():
    if index == 0:
        continue;
    if(row.random_numbers<10):
        df.loc[index,'random_numbers']=df.loc[index-1,'random_numbers']
    if(row.random_numbers>20):
        df.loc[index,'random_numbers']=df.loc[index-1,'random_numbers']

    random_numbers
0                6
1                6
2                6
3               18
4               18
5               17
6               12
7               12
8               12
9               12
10              12
11              13
12              13
13              13
14              13
15              13
16              16
17              15
18              15
19              19

Please suggest a more efficient way to implement this logic as I am using large number of rows.

lonstud
  • 525
  • 2
  • 19

2 Answers2

5

You can replace the values less than 10 and values more than 20 with NaN then use pandas.DataFrame.ffill() to fill nan with previous row value.

mask = (df['random_numbers'] < 10) | (df['random_numbers'] > 20)

# Since you escape with `if index == 0:`
mask[df.index[0]] = False

df.loc[mask, 'random_numbers'] = np.nan

df['random_numbers'].ffill(inplace=True)
# Original

    random_numbers
0                7
1               28
2                8
3               14
4               12
5               20
6               21
7               11
8               16
9               27
10              19
11              23
12              18
13               5
14               6
15              11
16               6
17               8
18              17
19               8

# After replaced

    random_numbers
0              7.0
1              7.0
2              7.0
3             14.0
4             12.0
5             20.0
6             20.0
7             11.0
8             16.0
9             16.0
10            19.0
11            19.0
12            18.0
13            18.0
14            18.0
15            11.0
16            11.0
17            11.0
18            17.0
19            17.0
Ynjxsjmh
  • 28,441
  • 6
  • 34
  • 52
  • 2
    nice one, only change I would recommend is ditching the `inplace=True` and using variable assignment [see this thread](https://stackoverflow.com/questions/45570984/in-pandas-is-inplace-true-considered-harmful-or-not) – Umar.H May 07 '21 at 05:16
2

We can also do it in a simpler way by using .mask() together with .ffill() and slicing on [1:] as follows:

df['random_numbers'][1:] = df['random_numbers'][1:].mask((df['random_numbers'] < 10) | (df['random_numbers'] > 20))

df['random_numbers'] = df['random_numbers'].ffill(downcast='infer')

.mask() tests for the condition and replace with NaN when the condition is true (default to replace with NaN if the parameter other= is not supplied). Retains the original values for rows with condition not met.

Note that the resulting numbers are maintained as integer instead of transformed unexpectedly to float type by supplying the downcast='infer' in the call to .ffill().

We use [1:] on the first line to ensure the data on row 0 is untouched without transformation.

# Original data:  (reusing your sample data)

    random_numbers
0                6
1               24
2               29
3               18
4               22
5               17
6               12
7                7
8                6
9               27
10              29
11              13
12              23
13               6
14              25
15              24
16              16
17              15
18              25
19              19


# After transposition:

    random_numbers
0                6
1                6
2                6
3               18
4               18
5               17
6               12
7               12
8               12
9               12
10              12
11              13
12              13
13              13
14              13
15              13
16              16
17              15
18              15
19              19

SeaBean
  • 22,547
  • 3
  • 13
  • 25