2

For contextualization, in this dataframe each card validation in a bus transit system of a city is a row. In this city, there's a rule: if the next validation have been done within 90 minutes, it'll get 50% fare discount. That's what we call INTEGRACAO = True. However, it's valid just for group of two validations, in the third validation you can't get any discount.

Anyhow, if the fourth validation have also been done within 90 minutes in reference to the third, it'll get 50% fare discount too (INTEGRACAO = True), following this idea in the next validations.

Initial df

       DATAHORA_UTILIZACAO  CARTAO_USUARIO
314388 2020-03-04 08:06               1
322718 2020-03-04 08:12               1
324216 2020-03-04 08:13               1
431669 2020-03-04 10:22               1
457794 2020-03-04 11:03               1
483567 2020-03-04 11:37               1
534746 2020-03-04 12:31               1
586949 2020-03-04 13:31               1
602406 2020-03-04 13:53               1
604226 2020-03-04 13:55               1
691412 2020-03-04 15:50               1
698074 2020-03-04 15:58               1
699002 2020-03-04 15:59               1
771140 2020-03-04 17:02               1
838434 2020-03-04 17:43               1
956096 2020-03-04 18:59               1

Code

df_cartoes = df_cartoes.sort_values(by=['DATAHORA_UTILIZACAO', 'CARTAO_USUARIO'])
df_cartoes = df_cartoes.reset_index(drop=True)
df_cartoes.loc[:,'SEQ_VALIDACAO'] = df_cartoes.groupby('CARTAO_USUARIO').cumcount()
df_cartoes.loc[:,'INTERVALO_INTEGRACAO_SEG'] = df_cartoes.groupby('CARTAO_USUARIO')['DATAHORA_UTILIZACAO'].diff()
df_cartoes.loc[:,'INTERVALO_INTEGRACAO_SEG'] = df_cartoes['INTERVALO_INTEGRACAO_SEG'].apply(lambda x: x.total_seconds())
df_cartoes.loc[:,'INTERVALO_INTEGRACAO_MIN'] = df_cartoes['INTERVALO_INTEGRACAO_SEG'].div(60)
df_cartoes.loc[((df_cartoes['INTERVALO_INTEGRACAO_SEG'] > 300) & (df_cartoes['INTERVALO_INTEGRACAO_SEG'] <= 5400)), 'INTEGRACAO'] = True
df_cartoes.loc[:,'INTEGRACAO'] = df_cartoes['INTEGRACAO'].fillna(False)
for i in range(0,5):
    df_cartoes.loc[((df_cartoes['INTEGRACAO'] == True) & ((df_cartoes.shift(1)['INTEGRACAO'] == True) & (df_cartoes.shift(1)['CARTAO_USUARIO'] == df_cartoes['CARTAO_USUARIO'])) & ((df_cartoes.shift(2)['INTEGRACAO'] == False) & (df_cartoes.shift(2)['CARTAO_USUARIO'] == df_cartoes['CARTAO_USUARIO']))), 'INTEGRACAO'] = False

Output df

       DATAHORA_UTILIZACAO  CARTAO_USUARIO  SEQ_VALIDACAO  INTERVALO_INTEGRACAO_SEG  INTERVALO_INTEGRACAO_MIN  INTEGRACAO
314388 2020-03-04 08:06               1              0                       NaN                       NaN       False
322718 2020-03-04 08:12               1              1                     348.0                  5.800000        True
324216 2020-03-04 08:13               1              2                      63.0                  1.050000       False
431669 2020-03-04 10:22               1              3                    7784.0                129.733333       False
457794 2020-03-04 11:03               1              4                    2442.0                 40.700000        True
483567 2020-03-04 11:37               1              5                    2048.0                 34.133333        True
534746 2020-03-04 12:31               1              6                    3240.0                 54.000000        True
586949 2020-03-04 13:31               1              7                    3591.0                 59.850000        True
602406 2020-03-04 13:53               1              8                    1297.0                 21.616667        True
604226 2020-03-04 13:55               1              9                     163.0                  2.716667       False
691412 2020-03-04 15:50               1             10                    6863.0                114.383333       False
698074 2020-03-04 15:58               1             11                     468.0                  7.800000        True
699002 2020-03-04 15:59               1             12                      62.0                  1.033333       False
771140 2020-03-04 17:02               1             13                    3821.0                 63.683333        True
838434 2020-03-04 17:43               1             14                    2464.0                 41.066667        True
956096 2020-03-04 18:59               1             15                    4529.0                 75.483333        True

Expected output df

       DATAHORA_UTILIZACAO  CARTAO_USUARIO  SEQ_VALIDACAO  INTERVALO_INTEGRACAO_SEG  INTERVALO_INTEGRACAO_MIN  INTEGRACAO
314388 2020-03-04 08:06               1              0                       NaN                       NaN       False
322718 2020-03-04 08:12               1              1                     348.0                  5.800000        True
324216 2020-03-04 08:13               1              2                      63.0                  1.050000       False
431669 2020-03-04 10:22               1              3                    7784.0                129.733333       False
457794 2020-03-04 11:03               1              4                    2442.0                 40.700000        True
483567 2020-03-04 11:37               1              5                    2048.0                 34.133333       False
534746 2020-03-04 12:31               1              6                    3240.0                 54.000000        True
586949 2020-03-04 13:31               1              7                    3591.0                 59.850000       False
602406 2020-03-04 13:53               1              8                    1297.0                 21.616667        True
604226 2020-03-04 13:55               1              9                     163.0                  2.716667       False
691412 2020-03-04 15:50               1             10                    6863.0                114.383333       False
698074 2020-03-04 15:58               1             11                     468.0                  7.800000        True
699002 2020-03-04 15:59               1             12                      62.0                  1.033333       False
771140 2020-03-04 17:02               1             13                    3821.0                 63.683333        True
838434 2020-03-04 17:43               1             14                    2464.0                 41.066667       False
956096 2020-03-04 18:59               1             15                    4529.0                 75.483333        True

Using a unique CARTAO_USUARIO in dataframe with this part of code I could get the expected output. However, with a lot of CARTAO_USUARIO I can't get my expected output.

for i in range(0,5):
    df_cartoes.loc[((df_cartoes['INTEGRACAO'] == True) & ((df_cartoes.shift(1)['INTEGRACAO'] == True)) & ((df_cartoes.shift(2)['INTEGRACAO'] == False))), 'INTEGRACAO'] = False

Basically, the rule is that I can't get two True in sequence in the same CARTAO_USUARIO. I tried all the possibilities here in stackoverflow, but I couldn't find a solution.

  • I've found a solution based on [this question](https://stackoverflow.com/questions/23330654/update-a-dataframe-in-pandas-while-iterating-row-by-row). Unfortunately, stackoverflow doesn't let me answer my own question with my code by now. – Rodrigo Cunha Jul 05 '22 at 17:20

0 Answers0