Keep first element for column in a DataFrame

Question

I have a DataFrame like this.

>>> df = pd.DataFrame([[3., 0, 0], [0, 3., 0], [0, 0, 0], [0, 6., 6.], [1., 0, 0], [2., 5., 0]]).T
>>> df
     0    1    2    3    4    5
0  3.0  0.0  0.0  0.0  1.0  2.0
1  0.0  3.0  0.0  6.0  0.0  5.0
2  0.0  0.0  0.0  6.0  0.0  0.0

For each column, I want to keep only the first non-zero element and replace every other non-zero value with zero.

>>> expected
     0    1    2    3    4    5
0  3.0  0.0  0.0  0.0  1.0  2.0
1  0.0  3.0  0.0  6.0  0.0  0.0
2  0.0  0.0  0.0  0.0  0.0  0.0

My goal is to get a Series holding the first non-zero element of each column. I thought of doing this via sum(), which is why I need the other elements in each column to be zero.

>>> expected.sum()
0    3.0
1    3.0
2    0.0
3    6.0
4    1.0
5    2.0
dtype: float64

Thank you very much in advance.

score 4 · Accepted Answer · answered Jul 05 '21 at 09:11

Mask the zeros, then bfill and select the first row using iloc:

df[df != 0].bfill().iloc[0].fillna(0)

0    3.0
1    3.0
2    0.0
3    6.0
4    1.0
5    2.0
Name: 0, dtype: float64
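
For readers who want to see why the one-liner works, here is a minimal sketch that splits it into its three steps. The intermediate names (`masked`, `filled`, `first_non_zero`) are illustrative and not part of the original answer. Note that this approach would also treat pre-existing NaN values as zeros.

```python
import pandas as pd

df = pd.DataFrame([[3., 0, 0], [0, 3., 0], [0, 0, 0], [0, 6., 6.], [1., 0, 0], [2., 5., 0]]).T

# Step 1: boolean indexing turns every zero into NaN and keeps non-zero values
masked = df[df != 0]

# Step 2: back-fill copies the next valid value upwards, so row 0 ends up
# holding the first non-NaN (i.e. first non-zero) value of each column
filled = masked.bfill()

# Step 3: take row 0; an all-zero column is still NaN there, so fill it with 0
first_non_zero = filled.iloc[0].fillna(0)

print(first_non_zero.tolist())  # [3.0, 3.0, 0.0, 6.0, 1.0, 2.0]
```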

answered Jul 05 '21 at 09:11

Shubham Sharma

Umar.H · Answer 2 · 2021-07-05T09:27:50.373

2

Another way is to first create your target DataFrame using a boolean mask, then sum along the desired axis.

df_new = df.mask(~df.ne(0).cumsum(0).cumsum(0).eq(1)).fillna(0)

     0    1    2    3    4    5
0  3.0  0.0  0.0  0.0  1.0  2.0
1  0.0  3.0  0.0  6.0  0.0  0.0
2  0.0  0.0  0.0  0.0  0.0  0.0

Then sum along axis 0:

df_new.sum(0)

0    3.0
1    3.0
2    0.0
3    6.0
4    1.0
5    2.0
dtype: float64
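
The double `cumsum` can look opaque, so here is a hedged step-by-step sketch of the same idea. The intermediate names (`running_count`, `running_total`, `keep`) are illustrative, and `df.where(keep, 0)` is used as an equivalent of `mask(~keep).fillna(0)`. The first cumulative sum counts the non-zero values seen so far in each column. The second one is 0 before the first non-zero value, exactly 1 on its row, and at least 2 on every later row, so comparing with 1 selects that single row.

```python
import pandas as pd

df = pd.DataFrame([[3., 0, 0], [0, 3., 0], [0, 0, 0], [0, 6., 6.], [1., 0, 0], [2., 5., 0]]).T

non_zero = df.ne(0)

# Number of non-zero values seen so far, going down each column
running_count = non_zero.cumsum(axis=0)

# Summing the counts again: 0 before the first non-zero value,
# 1 exactly on its row, and >= 2 on every row after it
running_total = running_count.cumsum(axis=0)

# True only at the position of the first non-zero value of each column
keep = running_total.eq(1)

# Keep those positions, replace everything else with 0
df_new = df.where(keep, 0)

print(df_new.sum(axis=0).tolist())  # [3.0, 3.0, 0.0, 6.0, 1.0, 2.0]
```

The result matches the `expected.sum()` shown in the question.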

edited Jul 05 '21 at 09:27

answered Jul 05 '21 at 09:18

Umar.H

It might be more inclusive to replace `gt(1)` by `ne(0)`. Otherwise, this is a much cleaner answer than mine. – SpaceBurger Jul 05 '21 at 09:26
@Umar.H Seems interesting logic! – Shubham Sharma Jul 05 '21 at 09:34
@ShubhamSharma i was interested in creating a new df with just the first non 0 element, couldn't think of a new way without some reshaping. might be easier to turn into a numpy array. – Umar.H Jul 05 '21 at 09:41
@Umar.H here is one idea using reshaping operation with `unstack` `df[df != 0].unstack().groupby(level=0).first()` – Shubham Sharma Jul 05 '21 at 09:47
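
A runnable sketch of the `unstack` idea from the comment above, for reference. The trailing `fillna(0)` is an addition that is not in the comment; it assumes an all-zero column should yield 0 rather than NaN, as in the question's expected output.

```python
import pandas as pd

df = pd.DataFrame([[3., 0, 0], [0, 3., 0], [0, 0, 0], [0, 6., 6.], [1., 0, 0], [2., 5., 0]]).T

# Zeros become NaN; unstack() then yields a Series indexed by (column, row)
stacked = df[df != 0].unstack()

# first() returns the first non-null value per group (here: per column);
# an all-NaN group gives NaN, which is replaced with 0 (added assumption)
first_non_zero = stacked.groupby(level=0).first().fillna(0)

print(first_non_zero.tolist())  # [3.0, 3.0, 0.0, 6.0, 1.0, 2.0]
```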

SpaceBurger · Answer 3 · 2021-07-05T09:16:39.373

0

You could do something like:

import pandas as pd

# initialize table
df = pd.DataFrame([[3., 0, 0], [0, 3., 0], [0, 0, 0], [0, 6., 6.], [1., 0, 0], [2., 5., 0]]).T

# detect first non-zero value
# see https://stackoverflow.com/questions/50586146/find-first-non-zero-value-in-each-column-of-pandas-dataframe for details
non_zero_indexes = list(df.ne(0).idxmax()) # [0, 1, 0, 1, 0, 0]

for col_id in df.columns:
  # process every column, including those whose first non-zero value is in row 0
  col_start = list(df[col_id][:non_zero_indexes[col_id]+1]) # e.g. [0.0, 6.0]
  col_end   = [0.0] * (len(df) - len(col_start)) # e.g. [0.0], i.e. fill with zeros
  df[col_id] = col_start + col_end # merge and get [0.0, 6.0, 0.0]

That way, you get the following output:

>>> df
     0    1    2    3    4    5
0  3.0  0.0  0.0  0.0  1.0  2.0
1  0.0  3.0  0.0  6.0  0.0  0.0
2  0.0  0.0  0.0  0.0  0.0  0.0
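
The same table can also be built without a Python-level loop. The following is a minimal sketch using NumPy fancy indexing, not the answer's own method; the names `first_row`, `cols`, `out` and `result` are illustrative. It relies on `argmax` returning 0 for an all-zero column, where the value copied over is itself 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[3., 0, 0], [0, 3., 0], [0, 0, 0], [0, 6., 6.], [1., 0, 0], [2., 5., 0]]).T

values = df.to_numpy()

# Row position of the first non-zero value in each column
# (argmax returns 0 when a column has no non-zero value at all)
first_row = (values != 0).argmax(axis=0)
cols = np.arange(values.shape[1])

# Start from all zeros and copy over only the first non-zero value per column
out = np.zeros_like(values)
out[first_row, cols] = values[first_row, cols]

result = pd.DataFrame(out, index=df.index, columns=df.columns)
print(result.sum().tolist())  # [3.0, 3.0, 0.0, 6.0, 1.0, 2.0]
```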

edited Jul 05 '21 at 09:16

answered Jul 05 '21 at 08:55

SpaceBurger

For single-value access, [at instead of *loc](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.at.html) is preferred, and faster – Glauco Jul 05 '21 at 08:57
This output however is different from my expected. My first non-zero value can be anywhere from first to last row - or be missing at all. – crissal Jul 05 '21 at 09:00
Also, I just saw my answer is incorrect. For column 3 for example, I need `(0.0, 6.0, 0.0)` instead of `(0.0, 6.0, 6.0)`. – SpaceBurger Jul 05 '21 at 09:00
There is a partial answer to the question [here](https://stackoverflow.com/questions/50586146/find-first-non-zero-value-in-each-column-of-pandas-dataframe), I'll try to use that in my answer – SpaceBurger Jul 05 '21 at 09:02
