
I have data on client sales for 12 months. I need to calculate the average per client, but with a condition: only from the first month in which the client ordered something. Months with zero sales that come after the client's first purchase must be included.


Using the `.mean` function directly doesn't give the correct result.

mozway
  • Have you given it a try? Also could you add a [minimal example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) **as code** in your question instead of an image? – mozway May 12 '23 at 11:14

1 Answer


Use a mask (`where` combined with `cummax`, which propagates the booleans to the right) to keep only the values from the first non-zero onward:

df.where(df.ne(0).cummax(axis=1)).mean(axis=1)

Example:

import pandas as pd

df = pd.DataFrame([[0, 1, 2, 3],  # mean = 2
                   [0, 0, 1, 2],  # mean = 1.5
                   [1, 0, 0, 0]]) # mean = 0.25

# mask everything before the first non-zero value in each row, then average
df.where(df.ne(0).cummax(axis=1)).mean(axis=1)

Output:

0    2.00
1    1.50
2    0.25
dtype: float64
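
To see why this works, the one-liner can be split into its intermediate steps. The following is only a sketch on the same toy data; the variable names (`nonzero`, `mask`, `masked`, `result`) and the extra all-zero row are assumptions added for illustration, not part of the original question.

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 3],
                   [0, 0, 1, 2],
                   [1, 0, 0, 0],
                   [0, 0, 0, 0]])  # hypothetical client who never ordered

# True wherever the client had non-zero sales in that month
nonzero = df.ne(0)

# Row-wise cumulative max: once a True appears, all later columns stay True
mask = nonzero.cummax(axis=1)

# Months before the first order become NaN; mean() skips NaN by default
masked = df.where(mask)

result = masked.mean(axis=1)
print(result)
```

A client with no orders at all ends up with every month masked, so the mean for that row is NaN rather than 0. If 0 is preferred in that case, chaining `.fillna(0)` on the result is one option.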
mozway