1

I have a dataset with the Moon Phases. I want to create a new column that counts the days from Fase One to Fase Four. Starting again from 1 once a Fase One starts. So at the end, my counter would have from 1 to 27 or 28 days.

I checked this link but I haven't manages to start the counting from 1 again Counter Column in Pandas DataFrame where it changes based on other column. I tried with a for but is not giving me the result I expected

I tried with a for and with out the for

for i in Moon.phaseIdM:
    Moon['phaseMDay'] = (Moon.phaseIdM.eq(1) 
        & Moon.phaseIdM.shift().eq(4)).cumsum() + 1

I expect:

phaseM | phaseMday

  1    |    1

  1    |    2

  2    |    3

  2    |    4

  3    |    5

  4    |    6

  4    |    7

  1    |    1  

  2    |    2 ...

What I get:

phaseM | phaseMday

  1    |    1

  1    |    1

  2    |    1

  2    |    1

  3    |    1

  4    |    1

  4    |    1

  1    |    2  

  2    |    2 ...

Thanks in advance for your help

DYZ
  • 55,249
  • 10
  • 64
  • 93
John Perez
  • 117
  • 10

3 Answers3

3

You can create a Phase 1-Phase 4 grouper as follows:

df['phase_group'] = ((df['phaseM']==1) & (df['phaseM'].shift()==4)).cumsum()

Then group and do do cumulative count:

df['phaseMday'] = df.groupby('phase_group').cumcount()+1
ags29
  • 2,621
  • 1
  • 8
  • 14
1

You can do it by first assigning incremental values to the all column, and then remove the value of the last starting row Face One to all the rows under by checking where the condition of a new start is and using cummax such as:

Moon['phaseMDay'] = np.arange(len(Moon))
Moon['phaseMDay'] -= (Moon['phaseMDay']*(Moon.phaseIdM.eq(1) 
                                         & Moon.phaseIdM.shift().eq(4)).values).cummax()-1 

And you get:

print (Moon)
   phaseIdM  phaseMDay
0         1          1
1         1          2
2         2          3
3         2          4
4         3          5
5         4          6
6         4          7
7         1          1
8         2          2
Ben.T
  • 29,160
  • 6
  • 32
  • 54
  • 1
    Thank you @Ben. T. At one point this code is working but then it does start counting negative and big numbers. (I have 2559 rows) and in the middle it starts down counting phaseMDay -107284, 107285... – John Perez Mar 28 '19 at 16:20
  • 1
    @Monitotier yes sorry, replace `cumsum` by `cummax` it should be better – Ben.T Mar 28 '19 at 16:23
  • 1
    yeah, that changed it everything ;) Thanks, it worked perfectly – John Perez Mar 28 '19 at 16:27
0

This should do but I am pretty sure that there is a vectorized way to do the same.

df=pd.DataFrame([1,1,2,2,3,3,4,4,1,2],columns=['phaseM'])
df['phaseMday']=np.linspace(1,len(df),len(df))
for i in range(1,len(df1)):
    if df['phaseM'].iloc[i]<df['phaseM'].iloc[i-1]:
        df['phaseMday'].iloc[i]=1
    else:
        df['phaseMday'].iloc[i]=df1['phaseMday'].iloc[i-1]+1
CAPSLOCK
  • 6,243
  • 3
  • 33
  • 56