Python- Create a counter column based on changes of another column and start the counter again

Question

I have a dataset with the Moon Phases. I want to create a new column that counts the days from Fase One to Fase Four. Starting again from 1 once a Fase One starts. So at the end, my counter would have from 1 to 27 or 28 days.

I checked this link but I haven't manages to start the counting from 1 again Counter Column in Pandas DataFrame where it changes based on other column. I tried with a for but is not giving me the result I expected

I tried with a for and with out the for

for i in Moon.phaseIdM:
    Moon['phaseMDay'] = (Moon.phaseIdM.eq(1) 
        & Moon.phaseIdM.shift().eq(4)).cumsum() + 1

I expect:

phaseM | phaseMday

  1    |    1

  1    |    2

  2    |    3

  2    |    4

  3    |    5

  4    |    6

  4    |    7

  1    |    1  

  2    |    2 ...

What I get:

phaseM | phaseMday

  1    |    1

  1    |    1

  2    |    1

  2    |    1

  3    |    1

  4    |    1

  4    |    1

  1    |    2  

  2    |    2 ...

Thanks in advance for your help

Sounds a bit like [this](https://stackoverflow.com/questions/29353096/add-a-sequence-number-to-each-element-in-a-group-using-python) — josemz, Mar 28 '19 at 15:48

score 3 · Answer 1 · answered Mar 28 '19 at 16:05

3

You can create a Phase 1-Phase 4 grouper as follows:

df['phase_group'] = ((df['phaseM']==1) & (df['phaseM'].shift()==4)).cumsum()

Then group and do do cumulative count:

df['phaseMday'] = df.groupby('phase_group').cumcount()+1

answered Mar 28 '19 at 16:05

ags29

2,621
1
8
14

Ben.T · Accepted Answer · 2019-03-28T16:24:03.350

1

You can do it by first assigning incremental values to the all column, and then remove the value of the last starting row Face One to all the rows under by checking where the condition of a new start is and using cummax such as:

Moon['phaseMDay'] = np.arange(len(Moon))
Moon['phaseMDay'] -= (Moon['phaseMDay']*(Moon.phaseIdM.eq(1) 
                                         & Moon.phaseIdM.shift().eq(4)).values).cummax()-1

And you get:

print (Moon)
   phaseIdM  phaseMDay
0         1          1
1         1          2
2         2          3
3         2          4
4         3          5
5         4          6
6         4          7
7         1          1
8         2          2

edited Mar 28 '19 at 16:24

answered Mar 28 '19 at 16:00

Ben.T

29,160
6
32
54

1

Thank you @Ben. T. At one point this code is working but then it does start counting negative and big numbers. (I have 2559 rows) and in the middle it starts down counting phaseMDay -107284, 107285... – John Perez Mar 28 '19 at 16:20
1

@Monitotier yes sorry, replace `cumsum` by `cummax` it should be better – Ben.T Mar 28 '19 at 16:23
1

yeah, that changed it everything ;) Thanks, it worked perfectly – John Perez Mar 28 '19 at 16:27

CAPSLOCK · Answer 3 · 2019-03-28T15:55:44.720

0

This should do but I am pretty sure that there is a vectorized way to do the same.

df=pd.DataFrame([1,1,2,2,3,3,4,4,1,2],columns=['phaseM'])
df['phaseMday']=np.linspace(1,len(df),len(df))
for i in range(1,len(df1)):
    if df['phaseM'].iloc[i]<df['phaseM'].iloc[i-1]:
        df['phaseMday'].iloc[i]=1
    else:
        df['phaseMday'].iloc[i]=df1['phaseMday'].iloc[i-1]+1

edited Mar 28 '19 at 15:55

answered Mar 28 '19 at 15:50

CAPSLOCK

6,243
3
33
56

Python- Create a counter column based on changes of another column and start the counter again

3 Answers3