I'm trying to import some timeseries data and convert it into UTC so I can merge it with another dataset. This data seems to have 24 hour data and doesn't have DST adjustments. This post gives a similar answer, but they simply drop the line. I need to shift it so I can merge it with my other data.
When I run my code:
df = pd.read_csv('http://rredc.nrel.gov/solar/old_data/nsrdb/1991-2010/data/hourly/{}/{}_{}_solar.csv'.format(723898,723898,1998), usecols=["YYYY-MM-DD", "HH:MM (LST)","Meas Glo (Wh/m^2)","Meas Dir (Wh/m^2)","Meas Dif (Wh/m^2)"])
def clean_time(obj):
hour = int(obj[0:-3])
hour = str(hour - 1)
if len(str(hour)) == 2:
return hour+":00"
else:
return "0" + hour + ":00"
df['HH:MM (LST)'] = df['HH:MM (LST)'].apply(clean_time)
df['DateTime'] = df['YYYY-MM-DD'] + " " + df['HH:MM (LST)']
df = df.set_index(pd.DatetimeIndex(df['DateTime']))
df.drop(["YYYY-MM-DD", "HH:MM (LST)",'DateTime'],axis=1,inplace=True)
df.index = df.index.tz_localize('US/Pacific', ambiguous='infer')
I get:
pytz.exceptions.AmbiguousTimeError: Cannot infer dst time from 1998-10-25 01:00:00 as there are no repeated times
If I leave ambiguous='raise' (the default), it gives me:
pytz.exceptions.NonExistentTimeError: 1998-04-05 02:00:00
So I'm stuck on either the start, or end of daylight savings time.
There's quite a few of these datasets (multiple sites over multiple years) I need to merge, so I'd prefer not to hand code specific hours to shift, but I'm still a novice and can't quite figure out my next steps here.
Appreciate the help!