0

I'm using pandas.read_rsq_query() method to convert my SQL query shown below to a dataframe.

This is the query:

SELECT begin_ts, process_name, avg(count) FROM ecn_stats_2019_06_21 WHERE process_name LIKE 'matching_%_gw' and name = 'raw_msg_count' and begin_ts = '2019-06-21 12:55:00' GROUP BY begin_ts, process_name ORDER BY begin_ts, process_name;

When I run the SQL query in my terminal I get the output below:

begin_ts        |  process_name  |         avg
------------------------+----------------+----------------------
 2019-06-21 12:55:00-04 | matching_01_gw |  252722.250000000000
 2019-06-21 12:55:00-04 | matching_02_gw |  233463.000000000000
 2019-06-21 12:55:00-04 | matching_03_gw |  287673.666666666667
 2019-06-21 12:55:00-04 | matching_04_gw |  201417.000000000000
 2019-06-21 12:55:00-04 | matching_05_gw |  243640.500000000000
 2019-06-21 12:55:00-04 | matching_06_gw |  235529.333333333333
 2019-06-21 12:55:00-04 | matching_07_gw |  203518.666666666667
 2019-06-21 12:55:00-04 | matching_08_gw |  266112.666666666667
 2019-06-21 12:55:00-04 | matching_09_gw | 1066127.000000000000
 2019-06-21 12:55:00-04 | matching_10_gw |  734972.000000000000
 2019-06-21 12:55:00-04 | matching_11_gw |  237903.000000000000
 2019-06-21 12:55:00-04 | matching_12_gw |  238116.000000000000
(12 rows)

But the resulting dataframe looks like this:

                    begin_ts    process_name           avg
0  2019-06-21 16:55:00+00:00  matching_01_gw  2.527222e+05
1  2019-06-21 16:55:00+00:00  matching_02_gw  2.334630e+05
2  2019-06-21 16:55:00+00:00  matching_03_gw  2.876737e+05
3  2019-06-21 16:55:00+00:00  matching_04_gw  2.014170e+05
4  2019-06-21 16:55:00+00:00  matching_05_gw  2.436405e+05
5  2019-06-21 16:55:00+00:00  matching_06_gw  2.355293e+05
6  2019-06-21 16:55:00+00:00  matching_07_gw  2.035187e+05
7  2019-06-21 16:55:00+00:00  matching_08_gw  2.661127e+05
8  2019-06-21 16:55:00+00:00  matching_09_gw  1.066127e+06
9  2019-06-21 16:55:00+00:00  matching_10_gw  7.349720e+05
10 2019-06-21 16:55:00+00:00  matching_11_gw  2.379030e+05
11 2019-06-21 16:55:00+00:00  matching_12_gw  2.381160e+05

When I print the self.endTime variable from the SQL query I get

 self.endTime: 2019-06-21 12:55:00

How can I stop the dataframe from converting my datetime object to UTC? I tried removing the '-04' from the end of the datetime object as reflected above, but no luck.

Edit: Using df['begin_ts'] = df['begin_ts'].dt.tz_convert('US/Eastern') solved my problem

Follow Up Question:

How can I tell the dataframe to not use scientific notation?

1 Answers1

0

One option is to try setting the parse_dates parameter with a format string to prevent from setting the timezone.

On the other hand, after obtaining the dataframe you could just remove the timezone with:

df['begin_ts'] = df['begin_ts'].dt.tz_localize(None)

Hope this helps!

89f3a1c
  • 1,430
  • 1
  • 14
  • 24
  • @89f3a1c- For some reason that didn't change anything. The dataframe time is still ahead of the actual variable by 4 hours. Even though the variable doesn't have timezone element, it's of the format '%H:%M:%S'. –  Jun 21 '19 at 17:27
  • I can't find a way to not add the timezone when reading. But later when adjusting the timezone, instead of None you could try setting some of [this values](https://stackoverflow.com/questions/13866926/is-there-a-list-of-pytz-timezones) to "move back" times to the real values. – 89f3a1c Jun 21 '19 at 17:50
  • Great! If my answer helped to resolve your issue, please mark it as the accepted answer. As of the follow-up question, it really should be posted separately since the original question addressed time-related data which has nothing to do with the follow-up question. Btw, I'd recommend mentioning what output exactly you expect to have if scientific notation is not what you want. – 89f3a1c Jun 21 '19 at 18:43