How to remove missing data and 0s whilst keeping the dataframe the same shape using Pandas?

Question

I have a dataframe and I want to reformat it in order it remove the instances of whether a missing value or a zero occurs before the first non-zero value appears across a row. However I do not want to delete any rows or columns and I do not want to remove any 0s or missing values which appear after the non-zeroes.

Below is the dataframe I am working with:

> data =[['Adam',2.55,4.53,3.45,2.12,3.14],['Bill',np.NaN,2.14,3.65,4.12],['Chris',np.NaN,0,2.82,0,6.04],['David',np.NaN,0,7.42,3.52]]

> df = pd.DataFrame(data, columns = ['Name', 'A','B','C','D','E'])

Moreover, here is the expected outcome:

> data1 =[['Adam',2.55,4.53,3.45,2.12,3.14],['Bill',2.14,3.65,4.12],['Chris',2.82,0,6.04],['David',7.42,3.52]]

> df1 = pd.DataFrame(data1, columns = ['Name', 'A','B','C','D','E'])

That's a strange task you've got here. So you don't care that the column's values will shift? — Akaisteph7, Jul 06 '19 at 08:55
What do you mean by "I dont want to remove any rows or columns". Can you make the question more clear? — Victor Ruiz, Jul 06 '19 at 08:57
I want the values to shift so that every person starts off at the same time with a non-zero value. And in regards to "I dont want to remove any rows or columns" I want to keep the overall table the same shape. Sorry if I did not make it clear. — Quincy, Jul 06 '19 at 10:14

anky · Accepted Answer · 2019-07-06T09:05:36.413

This is not a trivial problem. Here is the solution:

m=df.set_index('Name')
m=m[m.isin(m.mask(m.le(0)).bfill(axis=1).iloc[:,0]).cumsum(axis=1).astype(bool)]
print(m)

         A     B     C     D     E
Name                               
Adam   2.55  4.53  3.45  2.12  3.14
Bill    NaN  2.14  3.65  4.12   NaN
Chris   NaN   NaN  2.82  0.00  6.04
David   NaN   NaN  7.42  3.52   NaN

Then using justify:

pd.DataFrame(justify(m.values,np.nan),columns=m.columns,index=m.index).reset_index()

    Name     A     B     C     D     E
0   Adam  2.55  4.53  3.45  2.12  3.14
1   Bill  2.14  3.65  4.12   NaN   NaN
2  Chris  2.82  0.00  6.04   NaN   NaN
3  David  7.42  3.52   NaN   NaN   NaN

Explanation:

Step1: Set the Name column as index so we can deal with numeric values only. Step2: m.mask(m.le(0)).bfill(axis=1).iloc[:,0] gives the first value which is greater than 0. Step3: Then using isin() to return True wherever the value appears in each row. Step4: cumsum(axis=1).astype(bool) makes all the remaining elements as True so we can filter only those values, other values becomes NaN. Then use justify function from the linked post.

How to remove missing data and 0s whilst keeping the dataframe the same shape using Pandas?

1 Answers1