I am attempting to replace every non-blank value in a row (length > 0) with the row's first nonzero value. If a cell is blank (length 0), it should be replaced with the float 0.0.

This is the expected input:

    VOL1    VOL2    D
    0       1       3
    21      21      
    19      0       0
    18      0       

This is the expected output:

    VOL1    VOL2    D
    1       1       1
    21      21      0.0
    19      19      19  
    18      18      0.0

Thus far, this is what I have attempted:

import pandas as pd
import numpy as np

data = {
    'VOL1': [0, 21, 19, 18],
    'VOL2': [1, 21, 0, 0],
}

# Create DataFrame
df = pd.DataFrame(data)
df['D'] = [3, "", 0, ""]

# Get first nonzero
first_nonzero_df = df[df != 0].cumsum(axis=1).min(axis=1)
if df.isnull().any(axis=1):
    df.any(axis=1).replace(df, first_nonzero_df)

It's unclear to me what I'm doing wrong here. Any help is appreciated. Thanks!

silvercoder
  • What is column D? – not_speshal Sep 28 '21 at 20:22
  • Column D contains cells that are supposed to get replaced with 0.0 – silvercoder Sep 28 '21 at 20:24
  • So it's always going to be 0 values? Is it needed? – not_speshal Sep 28 '21 at 20:27
  • I suppose I could've set up a better example. There are other columns that have numbers as well as blanks. – silvercoder Sep 28 '21 at 20:39
  • Blanks and ``None`` are different. I guess you were trying to have ``None``, right? – Karina Sep 28 '21 at 20:42
  • The data I'm sourcing actually has blanks, which is why my initial thought process was to do a replace if there's a length > 0. I've updated column D to represent what that column should look like. Apologies for the lack of clarity. – silvercoder Sep 28 '21 at 20:53
  • How did the 3 in row 1 column D change to 1 in your output? – not_speshal Sep 28 '21 at 20:57
  • Because the first non-zero value discovered was 1. In the same way, 19 is written to both the second and third columns in row 3. – silvercoder Sep 28 '21 at 20:58
  • But 3 is a non-zero value. Why should it be updated? And if it *is* updated, shouldn't column D in row 2 also be updated to 21? – not_speshal Sep 28 '21 at 21:00
  • Everything in a row gets updated with the first non-zero value unless it's a blank, in which case it gets updated to 0. Column D row 2 is blank; that's why it gets updated to 0, not 21. Sorry if my setup for this wasn't clear. – silvercoder Sep 28 '21 at 22:11
  • @silver - So you have the same value in every column in every row except for blanks? What about a row like `[1, 0, 2, ""]`? – not_speshal Sep 28 '21 at 23:30
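
The rule as clarified in the comments above (blank cells become 0.0, every other cell takes the row's first nonzero value) can be sketched as follows. This is only one possible reading, not a confirmed solution: the helper name `fill_first_nonzero` is hypothetical, blanks are assumed to be empty strings, and a row with no nonzero value at all is assumed to fall back to 0.0.

```python
import numpy as np
import pandas as pd


def fill_first_nonzero(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: blanks -> 0.0, other cells -> the row's first nonzero value."""
    # Coerce to numbers; empty strings (and any other non-numeric cell) become NaN.
    numeric = df.apply(pd.to_numeric, errors="coerce")
    blank = numeric.isna()
    # Hide zeros, then back-fill along each row so the first column
    # ends up holding the first nonzero value of that row.
    first_nonzero = numeric.where(numeric != 0).bfill(axis=1).iloc[:, 0]
    # Assumption: rows without any nonzero value fall back to 0.0.
    first_nonzero = first_nonzero.fillna(0.0)
    # Broadcast the per-row value across the columns, except where the cell was blank.
    values = np.where(blank.to_numpy(), 0.0, first_nonzero.to_numpy()[:, None])
    return pd.DataFrame(values, index=df.index, columns=df.columns)


df = pd.DataFrame({"VOL1": [0, 21, 19, 18], "VOL2": [1, 21, 0, 0]})
df["D"] = [3, "", 0, ""]
result = fill_first_nonzero(df)
print(result)
```

Under this reading, a row like `[1, 0, 2, ""]` would become `[1.0, 1.0, 1.0, 0.0]`; whether that is the intended result is not confirmed anywhere in the thread.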

2 Answers

IIUC, try:

>>> df.where(df!=0, df[df!=0].ffill(axis=1).bfill(axis=1)).replace("",0)
   VOL1  VOL2     D
0     1     1   3.0
1    21    21   0.0
2    19    19  19.0
3    18    18   0.0
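
For readers unfamiliar with the chained calls, the one-liner can be unpacked into named steps. This is a sketch of the same expression on the question's sample data; the intermediate names `nonzero` and `filled` are introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({"VOL1": [0, 21, 19, 18], "VOL2": [1, 21, 0, 0]})
df["D"] = [3, "", 0, ""]

# Step 1: hide the zeros by turning them into NaN (blanks stay as "").
nonzero = df[df != 0]

# Step 2: fill each NaN from its left neighbour, then from its right
# neighbour for any leading NaN that has nothing to its left.
filled = nonzero.ffill(axis=1).bfill(axis=1)

# Step 3: keep original cells that were not zero, take the filled value
# where the original was zero, and finally turn blanks into 0.
result = df.where(df != 0, filled).replace("", 0)
print(result)
```

Note that only cells equal to 0 are overwritten, so the 3 in column D of the first row is left unchanged.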

not_speshal

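import pandas as pd

data = {
    'VOL1': [0, 21, 19, 18],
    'VOL2': [1, 21, 0, 0],
}

# Create DataFrame
df = pd.DataFrame(data)
df['D'] = [None] * len(df)

# First nonzero value per row: the smallest cumulative sum of the nonzero
# entries (this relies on the values being positive)
first_nonzero_df = df[df != 0].cumsum(axis=1).min(axis=1)

# Replace each zero with the first nonzero value of its row.
# df.loc avoids chained assignment, which may silently write to a copy.
for i in df.index:
    for col in df.columns:
        if df.loc[i, col] == 0:
            df.loc[i, col] = first_nonzero_df[i]
df = df.fillna(0)
df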


Karina
  • It is generally not a good idea to iterate over DataFrames, especially when there are vectorized solutions available. See [here](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas/55557758#55557758). – not_speshal Sep 28 '21 at 20:30
  • I didn't know that. Thanks! Your one-liner seems very concise and elegant! – Karina Sep 28 '21 at 20:33
  • I always learn a lot from comments on my code too! Thank you for taking it well - I definitely wasn't trying to criticize. – not_speshal Sep 28 '21 at 20:34