33

I want to fill empty cells with with previous row value if they start with number. For example, I have

    Text    Text    
    30      Text    Text    
            Text    Text    
            Text    Text    
    31      Text    Text
    Text    Text    
    31      Text    Text    
            Text    Text    
            Text    Text    
    32      Text    Text
    Text    Text    
            Text    Text    
            Text    Text    
            Text    Text    
            Text    Text

I however, want to have

Text    Text    
30      Text    Text    
30      Text    Text    
30      Text    Text    
31      Text    Text
Text    Text    
31      Text    Text    
31      Text    Text    
31      Text    Text    
32      Text    Text
Text    Text    
        Text    Text    
        Text    Text    
        Text    Text    
        Text    Text

I tried to reach this by using this code:

data = pd.read_csv('DATA.csv',sep='\t', dtype=object, error_bad_lines=False)
data = data.fillna(method='ffill', inplace=True)
print(data)

but it did not work.

Is there anyway to do this?

i2_
  • 665
  • 2
  • 7
  • 14
  • What are the values of the "empty" cells? `fillna` only fills cells whose value is `NaN`. To use `fillna`, first replace "" with `numpy.nan`. – DYZ Dec 18 '16 at 20:01
  • Thank you. I changed "empty" values to NaN with regex. But it did not work. – i2_ Dec 18 '16 at 21:13

3 Answers3

70

First, replace your empty cells with NaNs:

df[df[0]==""] = np.NaN

Now, Use ffill():

df.fillna(method='ffill')
#       0
#0  Text
#1    30
#2    30
#3    30
#4    31
#5  Text
#6    31
#7    31
#8    31
#9    32
DYZ
  • 55,249
  • 10
  • 64
  • 93
  • 6
    Does it work with string values? Because I'm trying to fill my empty cells with previous row value that is a string, and it still returns `NaN`. – joasa Sep 19 '17 at 09:13
  • 1
    Is it correct to say df[df["column_name"]==""] = np.NaN ? – The Dan Nov 18 '20 at 16:29
  • Not sure that would work as I'm on working on a similar issue. ffill would fill everything. – alexmathios Jul 21 '21 at 21:26
  • 4
    @joasa setting the Flag "inplace" allowed me to do the same with string values. Example: ```frame[column_name].fillna(method='ffill', inplace=True)``` – Pm740 Aug 26 '21 at 13:47
  • I think it should be `np.nan` rathar than `np.NaN` in the first line. For some reason my code treated `np.Nan` as a string and `np.nan` as true nan. – DavidWalker Nov 10 '21 at 10:37
  • Could anyone explain what the parts of the `df[df[0]==""] = np.NaN` statement? `df` is obviously the `DataFrame` and `np.NaN` is the special value "Not a Number". But the rest seems cryptic. Is `0` the column? – Anoduck Aug 22 '23 at 02:40
13

I think you can first get NaN instead whitespaces:

df.Text = df.Text[df.Text.str.strip() != '']

print (df)
    Text Text.1      
0     30   Text  Text
1    NaN   Text  Text
2    NaN   Text  Text
3     31   Text  Text
4   Text   Text   NaN
5     31   Text  Text
6    NaN   Text  Text
7    NaN   Text  Text
8     32   Text  Text
9   Text   Text   NaN
10   NaN   Text  Text
11   NaN   Text  Text
12   NaN   Text  Text
13   NaN   Text  Text

Then use ffill (same as fillna with parameter ffill), get to_numeric for where for replace NaN if not numeric forward filling NaN, last replace NaN by empty string by fillna:

orig = df.Text.copy()
df.Text = df.Text.ffill()
mask1 = pd.to_numeric(df.Text, errors='coerce')
df.Text = df.Text.where(mask1, orig).fillna('')
print (df)
    Text Text.1      
0     30   Text  Text
1     30   Text  Text
2     30   Text  Text
3     31   Text  Text
4   Text   Text   NaN
5     31   Text  Text
6     31   Text  Text
7     31   Text  Text
8     32   Text  Text
9   Text   Text   NaN
10         Text  Text
11         Text  Text
12         Text  Text
13         Text  Text
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • thank you so much. However when I run your code I get this: line 2672, in __getattr__ return object.__getattribute__(self, name) AttributeError: 'DataFrame' object has no attribute 'Text' I could not understand what .Text means. (I wrote Text but actually there are some string values). – i2_ Dec 18 '16 at 21:13
  • Text is column name, try use `df.col1` or `df['col 1']` instead. – jezrael Dec 18 '16 at 21:23
3

In general, if you want to fill empty cells with the previous row value, you can just use a recursive function like:

def same_as_upper(col:pd.Series)-> pd.Series:
    '''
    Recursively fill NaN rows with the previous value
    '''
    if any(pd.Series(col).isna()):
        col=pd.Series(np.where(col.isna(), col.shift(1), col))
        return same_as_upper(col)
    else:
        return col

Applying the function using df['A']=same_as_upper(df['A']), this simple case with your data would render:

A B C
Text Text nan
30 Text Text
30 Text Text
30 Text Text
31 Text Text
Text Text nan
31 Text Text
31 Text Text
31 Text Text
32 Text Text
Text Text nan
Text Text Text
Text Text Text
Text Text Text
Text Text Text

However, you have the additional constraint of not replicating the previous value if this value is not an integer. In that case, one solution is add a sample value in the appropriate location, use the same function and then replace the sample value for nan:

df.loc[11,'A']=999
df['A']=same_as_upper(df['A'])
df['A']=df['A'].replace(999,np.nan)

Result:

A B C
Text Text nan
30 Text Text
30 Text Text
30 Text Text
31 Text Text
Text Text nan
31 Text Text
31 Text Text
31 Text Text
32 Text Text
Text Text nan
nan Text Text
nan Text Text
nan Text Text
nan Text Text
Lucas
  • 1,166
  • 2
  • 14
  • 34
  • For the small amount of data I'll be working on, this is perfect. And it easily gives the user the option to assign the values to a new column! – tim.rohrer Feb 01 '23 at 13:37