
I have a set of data where I predict the amount of fuel I need about 10 weeks ahead. It is all set up in a single dataframe presented as staircase data: the closer I get to the last entry for a week, the more accurate the values become. I want to cut all the missing values and ignore the exact dates, so I can look at my predictions in relation to the distance from the predicted week.

Input dataframe:

Index  2020-01  2020-02  2020-03  2020-04  2020-05  2020-06
1.          10       10        5        0        0        0
2.           0        5        5       10        0        0
3.           0        0       10        4        3        0
4.           0        0        0        1        7        6

Outcome should be:

Index  W1  W2  W3
1.     10  10   5
2.      5   5  10
3.     10   4   3
4.      1   7   6
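For reproducibility, the example input can be rebuilt like this (assuming the column labels are plain strings and the index holds floats, as the printed outputs below suggest):

```python
import pandas as pd

# reconstruction of the example input; string column labels are an assumption
df = pd.DataFrame(
    {'2020-01': [10, 0, 0, 0],
     '2020-02': [10, 5, 0, 0],
     '2020-03': [5, 5, 10, 0],
     '2020-04': [0, 10, 4, 1],
     '2020-05': [0, 0, 3, 7],
     '2020-06': [0, 0, 0, 6]},
    index=pd.Index([1.0, 2.0, 3.0, 4.0], name='Index'),
)
```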

Many Thanks in advance


4 Answers


Use the justify function to left-shift the non-zero values, then drop the columns that contain only zeros:

import pandas as pd

# rename columns to W1..Wn and left-justify the non-zero values (justify is defined below)
c = [f'W{x + 1}' for x, _ in enumerate(df.columns)]
df = pd.DataFrame(justify(df.to_numpy()), index=df.index, columns=c)
# drop columns that contain only zeros
df = df.loc[:, df.ne(0).any()]
print(df)
       W1  W2  W3
Index            
1.0    10  10   5
2.0     5   5  10
3.0    10   4   3
4.0     1   7   6

# https://stackoverflow.com/a/44559180/2901002
import numpy as np

def justify(a, invalid_val=0, axis=1, side='left'):
    """
    Justifies a 2D array

    Parameters
    ----------
    a : ndarray
        Input array to be justified
    invalid_val : scalar
        Value treated as invalid and shifted out
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'.
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.
    """

    if invalid_val is np.nan:
        mask = ~np.isnan(a)
    else:
        mask = a!=invalid_val
    justified_mask = np.sort(mask,axis=axis)
    if (side=='up') | (side=='left'):
        justified_mask = np.flip(justified_mask,axis=axis)
    out = np.full(a.shape, invalid_val) 
    if axis==1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out

You could replace the zeros with NaNs and reset the Series per row:

df2 = (
 df.replace(0,float('nan'))
   .apply(lambda s: s.dropna().reset_index(drop=True), axis=1)
   .astype(int) 
)

df2.columns = df2.columns.map(lambda x: f'W{x+1}')

output:

     W1  W2  W3
1.0  10  10   5
2.0   5   5  10
3.0  10   4   3
4.0   1   7   6

To ensure that only the outer 0s are removed (for this example I set 2./2020-03 to 0), you can use ffill + bfill to compute a mask of the values lying between the first and last non-zero entry of each row:

m = df.ne(0)
m1 = m.where(m).bfill(axis=1)
m2 = m.where(m).ffill(axis=1)

df2 = (
 df.where(m1&m2)  # internal 0s are those True for both ffill/bfill
   .apply(lambda s: s.dropna().reset_index(drop=True), axis=1)
   .astype(int) 
)

df2.columns = df2.columns.map(lambda x: f'W{x+1}')

output:

     W1  W2  W3
1.0  10  10   5
2.0   5   0  10
3.0  10   4   3
4.0   1   7   6
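As a quick sanity check of the mask on a single row with an internal zero (toy values, not the question's data): the leading and trailing zeros are dropped, while the zero between two non-zero values survives.

```python
import pandas as pd

# one row with an internal zero: outer zeros should go, the middle one should stay
row = pd.DataFrame([[0, 5, 0, 10, 0, 0]])
m = row.ne(0)
# positions that are True in both the backward and forward fill lie between
# the first and last non-zero value
keep = m.where(m).bfill(axis=1) & m.where(m).ffill(axis=1)
out = row.where(keep).apply(lambda s: s.dropna().reset_index(drop=True), axis=1)
# out.iloc[0] is [5.0, 0.0, 10.0]: the internal zero is preserved
```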
  • hey thanks for your answer, this looks promising. But I have a few cases where the set looks like this: 4 0 4, and I would miss the middle value. Do you have any idea how to handle those cases? – Moritz Backhaus Jan 27 '22 at 09:55
  • @MoritzBackhaus yes, you need to compute a mask; I updated the answer – mozway Jan 27 '22 at 09:59
  • Many thanks for your help so far. Something did happen to my data indeed; sadly it is not working perfectly. I can see that your solution will do the intended thing if my input is correct. I am reading the df as a string from Excel and I think not all zeros get identified as zeros, but as a value. I will try to fix that and come back later. – Moritz Backhaus Jan 27 '22 at 10:28
  • @MoritzBackhaus maybe start a new question to solve the "imperfect data" issue, then this one should work as expected ;) – mozway Jan 27 '22 at 10:29
  • Hey, I was able to use your solution and it is working perfectly. Many many thanks. I have just one thing to add: I had to delete the `.astype(int)` because it caused an error message, but after removing it I was ready to go. Thanks – Moritz Backhaus Jan 27 '22 at 10:50

Using a custom function and apply would be the most straightforward and easily understood way:

def merge_row(row):
    # keep the non-zero values and relabel them W1..Wn
    vals = [v for v in row.values if v != 0]
    return pd.Series({f'W{i}': v for i, v in enumerate(vals, 1)})

df.apply(merge_row, axis=1)

Result:

      W1  W2  W3
Index
1.0   10  10   5
2.0    5   5  10
3.0   10   4   3
4.0    1   7   6

You can use numpy to sort by the 0/non-0 state and rebuild a DataFrame:

import numpy as np

a = df.to_numpy()
b = a == 0
# a stable sort keeps the non-zero values in their original order
idx = np.argsort(b, axis=1, kind='stable')
n_cols = (~b).sum(1).max()
pd.DataFrame(np.take_along_axis(a, idx, axis=1)[:, :n_cols],
             columns=[f'W{i}' for i in np.arange(n_cols) + 1],
             index=df.index
             )

output:

     W1  W2  W3
1.0  10  10   5
2.0   5   5  10
3.0  10   4   3
4.0   1   7   6
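A note on the sort: `np.argsort` with `kind='stable'` guarantees that the non-zero entries keep their relative order while the zeros move to the end (the default quicksort makes no such guarantee for equal keys). A quick standalone check on a single row taken from the example:

```python
import numpy as np

a = np.array([[0, 5, 5, 10, 0, 0]])
b = a == 0
# non-zero positions sort first, in their original order
idx = np.argsort(b, axis=1, kind='stable')
print(idx)                                 # [[1 2 3 0 4 5]]
print(np.take_along_axis(a, idx, axis=1))  # [[ 5  5 10  0  0  0]]
```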

To handle internal zeros (removing only the leading and trailing zeros):

import numpy as np

a = df.to_numpy()

# True between the first and last non-zero value of each row
# (assumes non-negative values, so the cumulative sum stays non-zero)
m1 = a.cumsum(1) != 0
m2 = np.fliplr(np.fliplr(a).cumsum(1)) != 0
m = m1 & m2

# a stable sort keeps the masked values in their original order
idx = np.argsort(~m, axis=1, kind='stable')
n_cols = m.sum(1).max()
pd.DataFrame(np.take_along_axis(a, idx, axis=1)[:, :n_cols],
             columns=[f'W{i}' for i in np.arange(n_cols) + 1],
             index=df.index
             )
  • Hello again, thanks for your help, but this solution is not working because I have three leading columns with some data, like the week the prediction is from. What parameter do I have to change? – Moritz Backhaus Jan 27 '22 at 10:06
  • @MoritzBackhaus set those columns as index temporarily, or split off the needed columns, perform the numpy part, and join back after processing. Let me know if you need an example – mozway Jan 27 '22 at 10:08