Pandas replacing less than n consecutive values with neighboring consecutive value

Question

Suppose I have the following DataFrame df:

df = pd.DataFrame({
"a" : [8,8,0,8,8,8,8,8,8,8,4,1,4,4,4,4,4,4,4,4,4,4,7,7,4,4,4,4,4,4,4,4,5,5,5,5,5,5,1,1,5,5,5,5,5,5,1,5,1,5,5,5,5]})

I want to normalize my data: if a value occurs fewer than 3 times in a row, it should be replaced with the value of the neighboring run of consecutive values.

Expected result:
df = pd.DataFrame({
"a" : [8,8,8,8,8,8,8,8,8,8,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5]})

Currently I do this by iterating manually, but I suspect pandas has a dedicated function for it.

There is no special function for this. If you show your attempt, we may be able to optimize it for faster operation. A for loop can become very slow on a large dataset. — Rahul Vishwakarma, Aug 08 '20 at 13:52
I think the answer here can help you :) https://stackoverflow.com/questions/27626542/counting-consecutive-positive-value-in-python-array — Active_Learner, Aug 08 '20 at 15:02

Terry · Accepted Answer · 2020-08-08T15:21:26.393

This is a little tricky. Use diff() and cumsum() to label each run of consecutive equal values, and np.size to get the size of each run. Then use mask() to blank out runs shorter than 3, and fill the gaps with ffill() and bfill():

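# Label each run of equal values, then give every row the length of its run.
s = df.groupby((df['a'].diff() != 0).cumsum()).transform(np.size)
# Blank out runs shorter than 3; fill from the previous run, or from the next run at the start.
df['a'] = df[['a']].mask(s < 3).ffill().bfill()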

#result
[8., 8., 8., 8., 8., 8., 8., 8., 8., 8., 8., 8., 4., 4., 4., 4., 4.,
   4., 4., 4., 4., 4., 4., 4., 4., 4., 4., 4., 4., 4., 4., 4., 5., 5.,
   5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5.,
   5., 5.]
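The same idea can be wrapped in a small helper so that the threshold becomes a parameter. This is only a sketch and not part of the original answer: the name `smooth_short_runs` and its `n` argument are made up for illustration, and it passes the string "size" to transform() in place of np.size. Note that on the question's data the lone 4 and 1 at positions 10 and 11 are each runs of length 1, so they are forward-filled with 8. That is why the result above starts with twelve 8s, while the expected result in the question has ten.

```python
import pandas as pd


def smooth_short_runs(s: pd.Series, n: int = 3) -> pd.Series:
    """Replace runs shorter than n with the previous run's value.

    Short runs at the very start take the value of the next kept run.
    Hypothetical helper; integer input comes back as float because
    mask() introduces NaN.
    """
    # A new run starts wherever the value differs from the previous one.
    run_id = (s.diff() != 0).cumsum()
    # Length of the run each element belongs to.
    run_len = s.groupby(run_id).transform("size")
    # Blank out short runs, then fill forward, then backward.
    return s.mask(run_len < n).ffill().bfill()


example = pd.Series([8, 8, 8, 0, 8, 8, 8, 7, 7, 4, 4, 4, 4])
smoothed = smooth_short_runs(example)
print(smoothed.tolist())
```

Here the single 0 and the pair of 7s are both shorter than 3, so they take the preceding value 8, giving nine 8.0 values followed by four 4.0 values.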

score 1 · Answer 2 · answered Aug 08 '20 at 14:39

NumPy can also be used for this:

import numpy as np
import pandas as pd

df = pd.DataFrame({"a" : [8,8,0,8,8,8,8,8,8,8,
                          4,1,4,4,4,4,4,4,4,4,
                          4,4,7,7,4,4,4,4,4,4,
                          4,4,5,5,5,5,5,5,1,1,
                          5,5,5,5,5,5,1,5,4,5,
                          5,5,5]})

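arr = df.values.reshape(-1)   # flatten the single column to a 1-D array
sub = arr[1:]-arr[:-1]        # differences between adjacent elements
add2 = sub[1:]+sub[:-1]       # sub[i] + sub[i+1]: zero when arr[i+2] == arr[i]
add3 = sub[2:]+sub[:-2]       # sub[i] + sub[i+2]: zero for patterns like 4,7,7,4
# del2: positions of single values whose two neighbors are equal
del2 = np.where((sub[1:]!=0) & (add2*sub[1:]==0))[0]+1
# del3: first position of each two-element stretch such as the 7,7 in 4,7,7,4
del3 = np.where((sub[2:]!=0) & (add3*sub[2:]==0))[0]+1
arr[del2] = arr[del2-1]       # single outlier takes the value before it
arr[del3] = arr[del3-1]       # first of the pair takes the value before it
arr[del3+1] = arr[del3+2]     # second of the pair takes the value after it
df = pd.DataFrame({"a" : arr})
print(arr)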

'''
Output:
[8 8 8 8 8 8 8 8 8 8 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5
 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5]
'''
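The code above only looks for short stretches of length 1 or 2. For an arbitrary threshold, one possible NumPy-only variant is to run-length encode the array and rebuild it from the runs that are long enough. This is a sketch of a different technique, not this answer's method: the name `smooth_short_runs_np` is hypothetical, and it follows the accepted answer's fill rule (a short run takes the value of the previous kept run, or the next one at the start), so on the data above it would also turn the lone 4 at position 10 into an 8.

```python
import numpy as np


def smooth_short_runs_np(values, n=3):
    """Replace runs shorter than n using run-length encoding (hypothetical helper)."""
    arr = np.asarray(values)
    if arr.size == 0:
        return arr.copy()
    # Start index of every run of equal values.
    starts = np.flatnonzero(np.r_[True, arr[1:] != arr[:-1]])
    # Length and value of every run.
    lengths = np.diff(np.r_[starts, arr.size])
    run_vals = arr[starts]
    keep = lengths >= n
    if not keep.any():
        # No run is long enough to serve as a replacement value.
        return arr.copy()
    # For each run, the index of the nearest kept run at or before it.
    src = np.where(keep, np.arange(starts.size), -1)
    src = np.maximum.accumulate(src)
    # Runs before the first kept run take the first kept run instead.
    src[src < 0] = np.flatnonzero(keep)[0]
    # Expand the chosen run values back to the original length.
    return np.repeat(run_vals[src], lengths)


example = [8, 8, 8, 0, 8, 8, 8, 7, 7, 4, 4, 4, 4]
smoothed = smooth_short_runs_np(example)
print(smoothed.tolist())
```

Unlike the mask()/ffill() approach, this keeps the integer dtype, since no NaN is ever introduced.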
