0

I am trying to read the information in a CSV file using pandas, the CSV has two columns, a timestamp (example: 31/12/2022 21:30) and a float value. This file is generate by an intrument that measures energy compsuntion per quater of an hour. This device sometimes can malfuntion and it would not register some quarters, and i will like to fill that gaps with the time and a 0 value (or the mean of the next value by all the skip quarter plus 1). For that I try to used the solution given here:

https://stackoverflow.com/questions/69205577/fill-gaps-in-time-series-pandas-dataframe

But I could not convert to an index because of duplicated values the day 30-10-2022 due to daylight savings. enter image description here

I tried to convert to the timezone, but it doesn´t seems to work:

telemedida_df=pd.read_csv("filename.csv",delimiter=";",decimal=",")
telemedida_df["Fecha/hora"]=pd.to_datetime(telemedida_df["Fecha/hora"],format='%d/%m/%Y %H:%M',utc=True)
telemedida_df["Fecha/hora"] = telemedida_df["Fecha/hora"].dt.tz_convert('Europe/Madrid')
df_new = (telemedida_df.assign(date=telemedida_df["Fecha/hora"].dt.date)   #create new col 'date' from the timestamp
            .set_index("Fecha/hora")   #Error here          
            .groupby('date')                    
            .apply(lambda x: x.resample('15Min')  
                   .ffill())                    
            .reset_index('date', drop=True)      
            .drop('date',1)                     
            .reset_index()                       
         ) 

The following error shows "cannot reindex a non-unique index with a method or limit". Before the error the Dataframe looks like: Dataframe before error

As you can see I am not getting +01 and +02 in the "Fecha/hora" column.

How could I take transfort datetime to index here taking into account the daylight savings?

Example of CSV DATA (It uses Spanish comma notation insted of US dot for decimal) :

Fecha/hora;Consumo
30/10/2022 0:15;6,9
30/10/2022 0:30;6,6
30/10/2022 0:45;6,7
30/10/2022 1:00;6,7
30/10/2022 1:15;6,6
30/10/2022 1:30;6,9
30/10/2022 1:45;6,8
30/10/2022 2:00;6,7
30/10/2022 2:15;6,6
30/10/2022 2:30;6,7
30/10/2022 2:45;6,7
30/10/2022 2:00;6,7
30/10/2022 2:15;6,7
30/10/2022 2:30;6,8
30/10/2022 2:45;6,7
30/10/2022 3:00;6,8
30/10/2022 3:15;6,8
30/10/2022 3:30;6,6
30/10/2022 3:45;7
30/10/2022 4:00;6,6
30/10/2022 4:15;6,8
30/10/2022 4:30;6,6
30/10/2022 4:45;6,7
30/10/2022 5:00;6,6

I tried different ways of converting to the timezone without any result

Estraviz
  • 28
  • 5
  • well, an index cannot have a duplicate. So you would have to manufacture a different label system to make the index unique... – D.L Aug 22 '23 at 10:55

1 Answers1

1

The code you provided will execute the timezone conversion as programmed. However, you first set up the WHOLE Series as utc and then convert it into tz 'Europe/Madrid'.

This will not help you to remove any duplicates in the original data set, as you just treated the whole data set as utc to then convert it.

Duplicates before the conversion will stay duplicates after the conversion.

  1. If the measured data would really be in utc, you would not have duplicates due to daylight savings in it
  2. If the measured data actually is in Europe/Madrid timezone (what I assume?), you will need to to handle it accordingly. For this, use pd.Series.dt.tz_localize with the ambiguous argument.
dates = pd.Series(['2022-10-29 22:00:00', '2022-10-29 23:00:00',
                   '2022-10-30 00:00:00', '2022-10-30 01:00:00',
                   '2022-10-30 02:00:00',
                   '2022-10-30 02:00:00',
                   '2022-10-30 03:00:00', '2022-10-30 04:00:00',
                   '2022-10-30 05:00:00', '2022-10-30 06:00:00'])

converted = pd.to_datetime(dates).dt.tz_localize('Europe/Madrid',
                                                 ambiguous='infer')

This will result in

0   2022-10-29 22:00:00+02:00
1   2022-10-29 23:00:00+02:00
2   2022-10-30 00:00:00+02:00
3   2022-10-30 01:00:00+02:00
4   2022-10-30 02:00:00+02:00
5   2022-10-30 02:00:00+01:00
6   2022-10-30 03:00:00+01:00
7   2022-10-30 04:00:00+01:00
8   2022-10-30 05:00:00+01:00
9   2022-10-30 06:00:00+01:00
dtype: datetime64[ns, Europe/Madrid]
xArbisRox
  • 91
  • 5