1

I have a dataframe and I want to reformat it in order it remove the instances of whether a missing value or a zero occurs before the first non-zero value appears across a row. However I do not want to delete any rows or columns and I do not want to remove any 0s or missing values which appear after the non-zeroes.

Below is the dataframe I am working with:

> data =[['Adam',2.55,4.53,3.45,2.12,3.14],['Bill',np.NaN,2.14,3.65,4.12],['Chris',np.NaN,0,2.82,0,6.04],['David',np.NaN,0,7.42,3.52]]

> df = pd.DataFrame(data, columns = ['Name', 'A','B','C','D','E'])

Moreover, here is the expected outcome:

> data1 =[['Adam',2.55,4.53,3.45,2.12,3.14],['Bill',2.14,3.65,4.12],['Chris',2.82,0,6.04],['David',7.42,3.52]]

> df1 = pd.DataFrame(data1, columns = ['Name', 'A','B','C','D','E']) 
Quincy
  • 143
  • 6
  • That's a strange task you've got here. So you don't care that the column's values will shift? – Akaisteph7 Jul 06 '19 at 08:55
  • What do you mean by "I dont want to remove any rows or columns". Can you make the question more clear? – Victor Ruiz Jul 06 '19 at 08:57
  • I want the values to shift so that every person starts off at the same time with a non-zero value. And in regards to "I dont want to remove any rows or columns" I want to keep the overall table the same shape. Sorry if I did not make it clear. – Quincy Jul 06 '19 at 10:14

1 Answers1

1

This is not a trivial problem. Here is the solution:

m=df.set_index('Name')
m=m[m.isin(m.mask(m.le(0)).bfill(axis=1).iloc[:,0]).cumsum(axis=1).astype(bool)]
print(m)

         A     B     C     D     E
Name                               
Adam   2.55  4.53  3.45  2.12  3.14
Bill    NaN  2.14  3.65  4.12   NaN
Chris   NaN   NaN  2.82  0.00  6.04
David   NaN   NaN  7.42  3.52   NaN

Then using justify:

pd.DataFrame(justify(m.values,np.nan),columns=m.columns,index=m.index).reset_index()

    Name     A     B     C     D     E
0   Adam  2.55  4.53  3.45  2.12  3.14
1   Bill  2.14  3.65  4.12   NaN   NaN
2  Chris  2.82  0.00  6.04   NaN   NaN
3  David  7.42  3.52   NaN   NaN   NaN

Explanation:

Step1: Set the Name column as index so we can deal with numeric values only. Step2: m.mask(m.le(0)).bfill(axis=1).iloc[:,0] gives the first value which is greater than 0. Step3: Then using isin() to return True wherever the value appears in each row. Step4: cumsum(axis=1).astype(bool) makes all the remaining elements as True so we can filter only those values, other values becomes NaN. Then use justify function from the linked post.

anky
  • 74,114
  • 11
  • 41
  • 70