I have a data frame that looks like this, with timestamps in UTC seconds:

               open     high      low    close    volumeto
time                                                      
1530169200  6112.81  6120.62  6108.65  6111.63  2212255.01
1530170100  6111.63  6119.12  6106.45  6113.59  1572299.36
1530171000  6113.59  6116.44  6104.34  6110.23  2792660.45
1530171900  6110.23  6123.71  6106.49  6123.71  2314140.04
1530172800  6121.33  6133.24  6121.18  6129.52  2037071.96

When I try to write this to CSV, this is what I get. I guess pandas is assuming that the supplied time is local time and offsetting it by 5 hours 30 minutes, but I have supplied UTC time:

1530149400,6112.81,6120.62,6108.65,6111.63,2212255.01:
1530150300,6111.63,6119.12,6106.45,6113.59,1572299.36:
1530151200,6113.59,6116.44,6104.34,6110.23,2792660.45:
1530152100,6110.23,6123.71,6106.49,6123.71,2314140.04:
1530153000,6121.33,6133.24,6121.18,6129.52,2037071.96:

My code is shown below:

import io

csv_string = io.StringIO()
df.to_csv(csv_string, line_terminator=':', header=False, date_format='%s')
print(csv_string.getvalue())

How do I tell Pandas that I have supplied UTC time and do not wish to offset it while converting?

PirateApp

1 Answer

One way to do this is to first make the time index timezone-aware with tz_localize(). Assuming your DataFrame is called df:

df.index = df.index.tz_localize(tz='UTC')

Now the index is timezone-aware. However, I am not sure whether this is the cause of the time difference.

EDIT: If the index already has a timezone attached, you can change it in much the same way as adding one, but with tz_convert, as your error indicated. The code becomes:

df.index = df.index.tz_convert(tz='UTC')

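However, this also shifts the displayed clock time, because tz_convert keeps the underlying instant and only changes the zone it is shown in. To replace the timezone with UTC while leaving the clock time as it is, you need to do the following: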

import pytz
df.index = [t.replace(tzinfo=pytz.utc) for t in df.index]
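To make the difference between the two approaches concrete, here is a minimal sketch. The sample timestamp and the Asia/Kolkata zone are my assumptions (picked only because that zone has a 5:30 offset), and it relabels the zone with tz_localize(None) followed by tz_localize('UTC') instead of pytz, which I would expect to give the same result here:

```python
import pandas as pd

# Hypothetical one-element index, assumed to be tagged with a +05:30 zone.
idx = pd.DatetimeIndex(["2018-06-28 12:30:00"]).tz_localize("Asia/Kolkata")

# tz_convert: same instant, clock time moves from 12:30 to 07:00.
converted = idx.tz_convert("UTC")

# Drop the zone, then attach UTC: clock time stays 12:30, instant changes.
relabelled = idx.tz_localize(None).tz_localize("UTC")

print(converted[0])
print(relabelled[0])
print(relabelled[0] - converted[0])
```

The two results differ by exactly the zone offset, which is the kind of shift described in the question.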

Before you do this, though, it is worth checking what the time zone is and whether it corresponds to the 5 hours 30 minutes difference. Also be aware that date_format='%s' ignores the timezone information and usually assumes the system's timezone. For more information, see the following accepted answer:

Python - Setting a datetime in a specific timezone (without UTC conversions)
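Note that the gap between the expected and the written values, 1530169200 - 1530149400 = 19800 seconds, is exactly 5 hours 30 minutes, which is consistent with '%s' applying the system's offset. One way to sidestep '%s' altogether is to compute the epoch seconds yourself before writing. This is only a sketch: the two-row frame is a made-up sample, and it assumes the index is a timezone-naive DatetimeIndex whose values already represent UTC. I left out the line terminator argument because, as far as I know, its name differs between pandas versions (line_terminator in older releases, lineterminator in newer ones).

```python
import io
import pandas as pd

# Hypothetical two-row sample mirroring the question's data; the naive
# DatetimeIndex is assumed to hold UTC times.
df = pd.DataFrame(
    {"close": [6111.63, 6113.59]},
    index=pd.to_datetime([1530169200, 1530170100], unit="s"),
)

out = df.copy()
# Integer arithmetic against the Unix epoch: no strftime, so the system
# timezone never enters the calculation.
out.index = (out.index - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)

csv_string = io.StringIO()
out.to_csv(csv_string, header=False)
print(csv_string.getvalue())
```

Since the index is now plain integers, date_format is no longer needed.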

BTW, if I just copy-paste your DataFrame to my machine and write it with to_csv, it works as expected.

westr