Here is the sample DataFrame:

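import numpy as np
import pandas as pd

# np.matrix coerces every value, including np.nan, to a string
data = np.matrix([[4,3,6,4,1,7,5,5], [1,2,3,6,4,2,4,9], ['a',np.nan, np.nan, 'b', np.nan, 'c', np.nan, 'd'],[1,np.nan, np.nan, 2, np.nan, 2, np.nan, 2]]).T
data = pd.DataFrame(data)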

>>> data

   0  1    2    3
0  4  1    a    1
1  3  2  nan  nan
2  6  3  nan  nan
3  4  6    b    2
4  1  4  nan  nan
5  7  2    c    2
6  5  4  nan  nan
7  5  9    d    2

>>> data.dtypes

0    object
1    object
2    object
3    object
dtype: object

As you can see, the dtype of every column is object, not float or int.

If I type data.ffill() into the console, it doesn't do anything. But if I try data[3] = data[3].astype(float).ffill()

it changes data to:

   0  1    2    3
0  4  1    a  1.0
1  3  2  nan  1.0
2  6  3  nan  1.0
3  4  6    b  2.0
4  1  4  nan  2.0
5  7  2    c  2.0
6  5  4  nan  2.0
7  5  9    d  2.0

Apparently ffill() works only on numeric columns, not on string columns. data[2] = data[2].astype(str).ffill() didn't change anything. How can I forward fill columns with dtype=object?

Here is the output I want:

   0  1    2    3
0  4  1    a  1.0
1  3  2    a  1.0
2  6  3    a  1.0
3  4  6    b  2.0
4  1  4    b  2.0
5  7  2    c  2.0
6  5  4    c  2.0
7  5  9    d  2.0
  • I extracted the data from a csv file using pd.read_csv(). In the original csv file, some columns hold numeric values and some hold strings.
Eric Kim
  • The object `nan` is literally the string `'nan'`, which pandas does not recognize as a null value. When you first do `astype(float)`, all of those values become true `np.nan` null values, so `ffill` recognizes them appropriately: the string `'nan'` converts to NaN, while the string `'1'` can unambiguously be cast to a number. – ALollz Apr 07 '18 at 23:43
  • @ALollz Is there a neat way to replace the string 'nan' with previous values, other than reading the data with a non-str dtype in the first place? – Eric Kim Apr 07 '18 at 23:54
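
The point made in the first comment can be checked directly. The following is only a minimal sketch on two small made-up Series (the names `s` and `t` are illustrative, not from the question), assuming the missing entries are stored as the literal string `'nan'`:

```python
import pandas as pd

# Object column holding the literal string 'nan', as in the question
s = pd.Series(['a', 'nan', 'nan', 'b'], dtype=object)

# The string 'nan' is not a null value, so ffill has nothing to fill
string_nan_is_null = bool(s.isna().any())
unchanged = s.ffill().tolist()

# Casting numeric-looking strings to float turns 'nan' into a real NaN,
# which ffill then recognizes and fills
t = pd.Series(['1', 'nan', 'nan', '2'], dtype=object).astype(float)
filled = t.ffill().tolist()

print(string_nan_is_null, unchanged, filled)
```

Here `isna()` reports no nulls in `s`, which is why `ffill()` appears to do nothing on the string column.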

1 Answer

If all of the missing values are just being stored as the string 'nan', then you can fill the entire DataFrame in one line. None is a recognized null value that works for the object dtype.

data.mask(data=='nan', None).ffill()

#0    4    1    a    1
#1    3    2    a    1
#2    6    3    a    1
#3    4    6    b    2
#4    1    4    b    2
#5    7    2    c    2
#6    5    4    c    2
#7    5    9    d    2
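
As a possible alternative to `mask`, the same idea can be sketched with `replace`, swapping the string `'nan'` for a real `np.nan` before forward filling. This is an illustrative variant rather than the method above, and the two-column frame below is a cut-down stand-in for the question's data, with all values stored as strings:

```python
import numpy as np
import pandas as pd

# Cut-down version of the question's frame: every value is a string,
# and missing entries are the literal string 'nan'
data = pd.DataFrame({
    2: ['a', 'nan', 'nan', 'b', 'nan', 'c', 'nan', 'd'],
    3: ['1', 'nan', 'nan', '2', 'nan', '2', 'nan', '2'],
}, dtype=object)

# Turn the placeholder strings into real nulls, then forward fill
filled = data.replace('nan', np.nan).ffill()

print(filled)
```

Note that this replaces only cells whose entire value is exactly `'nan'`; other placeholders such as `'NaN'` or an empty string would need to be listed explicitly.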
ALollz
  • Didn't work for my case but this one did: https://stackoverflow.com/a/74292112/10789707 – Lod Aug 14 '23 at 13:23