I have a pandas dataframe with a column in the time format HH:MM:SS, e.g. 05:00:00 (hours, minutes, seconds). These values are pulled from my SQL database via pandas with the following query:

df = pd.read_sql("SELECT .....'" \
, con=mariadb_connection)

I would like to convert these time values into minutes (or seconds) because they are my time-duration variable and I want to run a linear regression on them. How can I do that? Is there a way to convert the values into plain numbers without the colons (e.g. for 14:30:00, multiply 14 by 60 and add 30 to get a total duration of 870 minutes) and then place them back into the same rows and column of the dataframe for further analysis? I cannot work with the current time format because scikit-learn's linear regression function does not accept it as an argument.
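For reference, the arithmetic described above can be sketched in plain Python. The helper name `hms_to_minutes` is hypothetical, and the sketch assumes every value is a well-formed `HH:MM:SS` string:

```python
def hms_to_minutes(value: str) -> float:
    """Convert an 'HH:MM:SS' string to a duration in minutes (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    # Hours contribute 60 minutes each; seconds contribute a fraction of a minute.
    return hours * 60 + minutes + seconds / 60


print(hms_to_minutes("14:30:00"))  # 870.0
```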

Example dataframe:

    voltage valueA  Duration  valueB
0   12.45   0.86    14:04:30   9.157145
1   12.52   0.61    14:02:32  10.010095
2   12.24   0.17    14:00:33  10.010095
Craver2000
  • Please read about [how to ask a question](https://stackoverflow.com/help/how-to-ask) (particularly [how to create a good example](https://stackoverflow.com/help/mcve)) in order to get good responses. In your case an example dataframe and expected output would go a long way. – Alex Feb 03 '18 at 17:18
  • Is your DB column a string or an actual TIME type? – Jon Clements Feb 03 '18 at 17:21
  • Possible duplicate of [How to convert a Python datetime object to seconds](https://stackoverflow.com/questions/7852855/how-to-convert-a-python-datetime-object-to-seconds) – serkef Feb 03 '18 at 17:21

1 Answer

Demo:

In [143]: df
Out[143]:
       time
0  14:30:00
1  16:27:29

In [144]: df.dtypes
Out[144]:
time    object
dtype: object

In [145]: df['seconds'] = pd.to_timedelta(df['time']).dt.seconds

In [146]: df
Out[146]:
       time  seconds
0  14:30:00    52200
1  16:27:29    59249

In [147]: df.dtypes
Out[147]:
time       object
seconds     int64
dtype: object
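One caveat, sketched below under the assumption that some durations might exceed 24 hours: `.dt.seconds` returns only the seconds component within a day, while `.dt.total_seconds()` returns the full duration as a float. The third row is a made-up value added purely to show the difference; it is not from the question's data.

```python
import pandas as pd

# The first two rows match the demo; the third is a hypothetical duration over one day.
df = pd.DataFrame({"time": ["14:30:00", "16:27:29", "1 days 02:00:00"]})

td = pd.to_timedelta(df["time"])
df["seconds"] = td.dt.seconds                 # seconds component only (days are dropped)
df["total_seconds"] = td.dt.total_seconds()   # whole duration, as float

print(df)
```

For values under 24 hours, as in the question, both columns hold the same numbers, so either should work as a regression input.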

UPDATE: @JonClements has proposed another interesting idea to convert time (if Duration column has time data type on the MariaDB side) to seconds on the SQL side:

df = pd.read_sql("select voltage, valuea, valueb, time_to_sec(duration) as duration from some_table" ,
                  con=mariadb_connection)
MaxU - stand with Ukraine
  • Possibly this or let the DB handle it if the OP has an actual TIME type and not a string on the server, eg - change the query to `select voltage, valuea, valueb, time_to_sec(duration) as duration from some_table`... – Jon Clements Feb 03 '18 at 17:26
  • @JonClements, Pandas doesn't have a corresponding dtype - on the Pandas side it can be either `datetime` or `object` (string)... – MaxU - stand with Ukraine Feb 03 '18 at 17:27
  • From what I remember - `time_to_sec` returns an integer.... which is a usable type (the same as you'll get from `.dt.seconds` - depends where you want the lifting to go... seems it makes some sense to just get the DB to do it before retrieving it...) – Jon Clements Feb 03 '18 at 17:28
  • @JonClements, yeah, now I've noticed `mariadb_connection` (MariaDB is a MySQL fork), so your SQL should work. At first I was confused, because AFAIK `time_to_sec` doesn't belong to the ANSI SQL Standard... – MaxU - stand with Ukraine Feb 03 '18 at 17:30
  • Just a possible idea... but since we don't know if it's a string or an actual time - it might well be not possible... just seems a plausible option but not worth me posting an answer regarding it if you want to cover both bases there... – Jon Clements Feb 03 '18 at 17:31
  • @Jon Clements: I've just checked with `SELECT DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'mytable' AND COLUMN_NAME = 'column1';` It appears to be a time data type. – Craver2000 Feb 03 '18 at 17:36
  • Modified my query as suggested above to `...column name(duration) as duration from mytable...` but got a `(42000): FUNCTION mytable.column1 does not exist` error – Craver2000 Feb 03 '18 at 17:38
  • @MaxU: Your method to convert to seconds works. I tried to see if this could work for minutes as well with `df['minutes'] = pd.to_timedelta(df['column1']).dt.minutes`, but it seems there is no such attribute for minutes: `AttributeError: 'TimedeltaProperties' object has no attribute 'minutes'`. Thanks anyway, I might use seconds instead. – Craver2000 Feb 03 '18 at 17:40
  • @Craver2000, for minutes use: `pd.to_timedelta(df['time']).dt.seconds//60` – MaxU - stand with Ukraine Feb 03 '18 at 17:41
  • @Craver2000 it appears your "Duration" has a capital D though... SQL can be a little funny like that on how it does it - can't remember the exact rules.... you might want to escape the column such as ``time_to_sec(`Duration`) as duration`` or similar... Check it against your statement using the `show create table table_name` statement... (PS - I haven't used mysql/variants in ages - always been more a postgresql puppy - so - include a pinch of salt - just stuff off the top of my head that I think I might remember correctly :p) – Jon Clements Feb 03 '18 at 17:44
  • @JonClements: Got it! My initial query was incorrect, I used my column name instead of `time_to_sec`. I verified that both your method and MaxU's work. I suppose the SQL method is the most direct, since I do not have to delete the initial column and the duration column retains its position in the dataframe. But the other method might be useful if minutes are needed for the duration, or if an additional column is desired. – Craver2000 Feb 03 '18 at 17:53
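Pulling the comments together, here is a minimal sketch of the pandas-side conversion to minutes, using the example dataframe from the question. It assumes `Duration` arrives as `HH:MM:SS` strings; assigning back to the same column name keeps the column in its original position, so nothing has to be deleted afterwards. The `whole_minutes` name is only illustrative.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "voltage": [12.45, 12.52, 12.24],
    "valueA": [0.86, 0.61, 0.17],
    "Duration": ["14:04:30", "14:02:32", "14:00:33"],
    "valueB": [9.157145, 10.010095, 10.010095],
})

td = pd.to_timedelta(df["Duration"])

# Whole minutes, as suggested in the comments (seconds are truncated).
whole_minutes = td.dt.seconds // 60

# Fractional minutes, written back in place so the column order is unchanged.
df["Duration"] = td.dt.total_seconds() / 60

print(whole_minutes.tolist())  # [844, 842, 840]
print(df)
```

Whether truncated or fractional minutes are more appropriate depends on the resolution the regression needs.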