2

I have some values in a pandas df that are positive and negative ints, and I want to convert them to timedeltas so I can put them into a DurationField in a Django model.

             date  dep_time dep_delay  arr_time arr_delay cancelled carrier  \
103992 2014-05-11  10:13:00        -2  12:47:00       -13         0      B6   
103993 2014-05-11  19:29:00        -1  22:15:00       -24         0      B6   
103994 2014-05-11  11:17:00         5  13:55:00         9         0      B6   
103995 2014-05-11  07:36:00       -10  09:24:00       -18         0      B6   
103996 2014-05-11  13:40:00         0  16:47:00        10         0      B6   

       tailnum flight origin dest air_time distance duration  
103992  N630JB    925    JFK  TPA      137     1005     1013  
103993  N632JB    225    JFK  TPA      137     1005     1929  
103994  N635JB    127    EWR  MCO      126      937     1117  
103995  N637JB   1273    JFK  CHS       92      636     0736  
103996  N637JB    213    JFK  LGB      352     2465     1340  

With this data, I want to express dep_delay, arr_delay, air_time and duration as timedeltas, but I keep getting zeroed-out values? I'm using

data['air_time'] = pd.to_timedelta(data['air_time'], errors='coerce')
Davtho1983
  • 3,827
  • 8
  • 54
  • 105

1 Answers1

3

If you are getting all 00:00:00.000000 values, then your air_time values might be strings. (You can check the data type of the air_time column by inspecting data.info(). If the dtype says object then the values are Python objects (such as strs) instead of a NumPy integer data type. You can then confirm they are strings by inspecting set(map(type, data['air_time'])).)

If they are strings, you can convert them to ints first by using:

data['air_time'] = data['air_time'].astype(int)

If 137 means 137 minutes then use

data['air_time'] = pd.to_timedelta(data['air_time'], unit='m', errors='coerce')

If, on the other hand, 137 means 1 hour and 37 minutes, then use

data['air_time'] = pd.to_timedelta(
    (data['air_time']//100)*60 + (data['air_time'] % 100), unit='m', 
    errors='coerce')

The unit='m' argument tells pd.to_timedelta to interpret the values as minutes.

For example,

import pandas as pd

data = pd.DataFrame({'air_time':['137','137','126','92','352']})
data['air_time'] = data['air_time'].astype(int)
data['air_time'] = pd.to_timedelta(data['air_time'], unit='m', errors='coerce')

yields

  air_time
0 02:17:00
1 02:17:00
2 02:06:00
3 01:32:00
4 05:52:00

Note that pd.to_timedelta can also accepts strings as input if the strings contain the desired units. For example,

import pandas as pd

data = pd.DataFrame({'air_time':['137','137','126','92','352']})
data['air_time'] = data['air_time'] + ' minutes'
#       air_time
# 0  137 minutes
# 1  137 minutes
# 2  126 minutes
# 3   92 minutes
# 4  352 minutes

data['air_time'] = pd.to_timedelta(data['air_time'], errors='coerce')

yields the same result.

unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • I'm still getting 00:00:00.000000 in my air-time col with data['air_time'] = pd.to_timedelta(data['air_time'], unit='m', errors='coerce') ? – Davtho1983 May 04 '18 at 14:00
  • 1
    Ah, then your values are probably strings. Use `df['air_time'] = `df['air_time'].astype(int)` to convert the strings to ints first. – unutbu May 04 '18 at 14:01
  • With my minus values I'm still getting -1 days +23:58:00 which is correct, but a bit confusing - can I make it format it as just - minutes? – Davtho1983 May 04 '18 at 14:15
  • 1
    Unfortunately, once you make the values `timedelta64`s, you lose control over the way the values are displayed (but you gain computational benefits). So for computation, use `timedelta64`s, but for display purposes, [convert (back) to strings](https://stackoverflow.com/q/538666/190597). – unutbu May 04 '18 at 14:26
  • Cool - I guess I'll handle it in my Django forms if I need to – Davtho1983 May 04 '18 at 14:34
  • 1
    Or, to avoid having to convert timedeltas back into strings/ints, you could keep `air_time` as it is (for display purposes), and assign the timedeltas to a new column: `data['air_timedelta'] = pd.to_timedelta(...)`. – unutbu May 05 '18 at 09:24