I import data from an Excel file, but merged cells in the Excel file are not preserved in Python: only the first row of each merged range holds the value, and the remaining rows are NaN. Therefore, I have to fix the data in Python.

For example, the data I import in Python looks like:

0   aa
1   NaN
2   NaN
3   NaN
4   b
5   NaN
6   NaN
7   NaN
8   NaN
9   ccc
10  NaN
11  NaN
12  NaN
13  dd
14  NaN
15  NaN
16  NaN

The result I want is:

0   aa
1   aa
2   aa
3   aa
4   b
5   b
6   b
7   b
8   b
9   ccc
10  ccc
11  ccc
12  ccc
13  dd
14  dd
15  dd
16  dd

I tried using a for loop to fix the problem, but it takes a long time because I have a huge dataset. Is there a faster way to do this?

Itamar Mushkin
Waynexu

3 Answers

Looks like .fillna() is your friend. Quoting the documentation:

We can also propagate non-null values forward or backward.

>>> df
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
3  NaN  3.0 NaN  4
>>> df.fillna(method='ffill')
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  3.0  4.0 NaN  5
3  3.0  3.0 NaN  4
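
Applied to the column from the question, a minimal sketch might look like the following. It assumes the data is already loaded into a pandas Series (the name `s` is made up for illustration) and uses `Series.ffill()`, since recent pandas releases (2.1 and later) deprecate the `method=` argument of `fillna` in favor of `ffill()` and `bfill()`.

```python
import numpy as np
import pandas as pd

# Rebuild the column from the question: each value is followed by NaN rows
# where the merged Excel cell used to span.
s = pd.Series(
    ['aa'] + [np.nan] * 3
    + ['b'] + [np.nan] * 4
    + ['ccc'] + [np.nan] * 3
    + ['dd'] + [np.nan] * 3
)

# Forward fill: every NaN takes the last non-missing value above it.
filled = s.ffill()
print(filled)
```

The fill is vectorized inside pandas, so it should scale to large columns far better than a Python-level loop. A NaN at the very top of the column has nothing above it to copy and stays NaN.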
AKX

This is exactly what the .fillna() function in pandas is for.

Itamar Mushkin

You can get your desired result with the help of the apply and fillna methods:

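import pandas as pd
import numpy as np

df = pd.DataFrame(data={'A': ['a', np.nan, np.nan, 'b', np.nan]})

SENTINEL = "bhale"   # placeholder string standing in for NaN
seen = []            # non-missing values encountered so far

def change(value):
    # Replace the placeholder with the most recent real value, if there is one.
    if value == SENTINEL:
        return seen[-1] if seen else np.nan
    seen.append(value)
    return value

# First convert NaN values into a placeholder string (`bhale` here)
df['A'] = df['A'].fillna(SENTINEL)
df['A'] = df['A'].apply(change)   # apply the function element by element
df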
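
For comparison, here is a minimal sketch of the same fill without a placeholder string or a Python-level function, assuming the `ffill` method of a pandas Series. The second column, `value`, is hypothetical and only there to show that other columns are left untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['a', np.nan, np.nan, 'b', np.nan],
    'value': [1.0, np.nan, 3.0, 4.0, 5.0],   # hypothetical extra column
})

# Forward-fill only column A; NaN values in other columns are preserved.
df['A'] = df['A'].ffill()
print(df)
```

Restricting the fill to the affected column avoids overwriting genuinely missing values elsewhere in the frame, and it cannot collide with real data the way a placeholder string could.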

I hope this helps.

Rahul charan