I'm doing a study on some flight data. It is supposed to be an exploratory analysis in which statistical methods such as binning are used. I'm stuck trying to format the departure and arrival times. Here is my code so far:

  #Calling Libraries
  import os               # File management
  import pandas as pd     # Data frame manipulation
  import numpy as np      # Data frame operations
  import datetime as dt   # Date operations
  import seaborn as sns   # Data Viz  

  #Reading the file:
  flight_df=pd.read_csv(r'C:\Users\pc\Desktop\Work\flights.csv')

  #Checking the DataFrame:
  flight_df.head()

  flight_df.info()
  <class 'pandas.core.frame.DataFrame'>
   RangeIndex: 2500 entries, 0 to 2499
   Data columns (total 38 columns):
   #   Column               Non-Null Count  Dtype  
   ---  ------               --------------  -----  
   0   O_AIRPORT_IATA_CODE  2500 non-null   object 
   1   O_AIRPORT            2288 non-null   object 
   2   O_CITY               2288 non-null   object 
   3   O_STATE              2288 non-null   object 
   4   O_COUNTRY            2288 non-null   object 
   5   O_LATITUDE           2287 non-null   float64
   6   O_LONGITUDE          2287 non-null   float64
   7   D_AIRPORT_IATA_CODE  2500 non-null   object 
   8   D_AIRPORT            2288 non-null   object 
   9   D_CITY               2288 non-null   object 
   10  D_STATE              2288 non-null   object 
   11  D_COUNTRY            2288 non-null   object 
   12  D_LATITUDE           2288 non-null   float64
   13  D_LONGITUDE          2288 non-null   float64
   14  SCHEDULED_DEPARTURE  2500 non-null   int64  
   15  DEPARTURE_TIME       2467 non-null   float64
   16  DEPARTURE_DELAY      2467 non-null   float64
   17  TAXI_OUT             2467 non-null   float64
   18  WHEELS_OFF           2467 non-null   float64
   19  SCHEDULED_TIME       2500 non-null   int64  
   20  ELAPSED_TIME         2464 non-null   float64
   21  AIR_TIME             2464 non-null   float64
   22  DISTANCE             2500 non-null   int64  
   23  WHEELS_ON            2467 non-null   float64
   24  TAXI_IN              2467 non-null   float64
   25  SCHEDULED_ARRIVAL    2500 non-null   int64  
   26  ARRIVAL_TIME         2467 non-null   float64
   27  ARRIVAL_DELAY        2464 non-null   float64
   28  DIVERTED             2500 non-null   int64  
   29  CANCELLED            2500 non-null   int64  
   30  CANCELLATION_REASON  33 non-null     object 
   31  AIR_SYSTEM_DELAY     386 non-null    float64
   32  SECURITY_DELAY       386 non-null    float64
   33  AIRLINE_DELAY        386 non-null    float64
   34  LATE_AIRCRAFT_DELAY  386 non-null    float64
   35  WEATHER_DELAY        386 non-null    float64
   36  DATE                 2500 non-null   object 
   37  AIRLINE_NAME         2500 non-null   object 
   dtypes: float64(19), int64(6), object(13)
   memory usage: 742.3+ KB

  # Dropping redundant columns.
  # Note: drop(..., inplace=True) returns None, so inplace is omitted here to keep the result in newdf.
  newdf = flight_df.drop(['O_COUNTRY','O_LATITUDE','O_LONGITUDE','D_COUNTRY','D_LATITUDE','D_LONGITUDE','SCHEDULED_DEPARTURE','DIVERTED','CANCELLED','CANCELLATION_REASON','TAXI_OUT','TAXI_IN','WHEELS_OFF','WHEELS_ON','SCHEDULED_ARRIVAL'], axis=1)

I need to change the format of the departure and arrival times so that instead of appearing like this:

12    1746.0
14    1849.0
19    1514.0
20    1555.0
22    2017.0
Name: DEPARTURE_TIME, dtype: float64

they appear like this:

   12    17:46
   14    18:49
   19    15:14
   20    15:55
   22    20:17

I need this so that I can do further binning and analysis.

Thanks!

1 Answer

You can get the desired format by parsing the column to the datetime data type with pd.to_datetime and then formatting it as a string:

import pandas as pd

df = pd.DataFrame({'DEPARTURE_TIME': [1746.0, 1849.0, 1514.0, 1555.0, 2017.0]})

df['DEPARTURE_TIME'] = pd.to_datetime(df['DEPARTURE_TIME'], format="%H%M").dt.strftime("%H:%M")

df['DEPARTURE_TIME']
0    17:46
1    18:49
2    15:14
3    15:55
4    20:17
Name: DEPARTURE_TIME, dtype: object
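
The real column in the question has missing values (2467 non-null out of 2500), and HHMM-style data may also contain early-morning times stored as short numbers. A minimal sketch of a variant that handles both, assuming a value such as 5.0 means 00:05; the sample values and the new column names `DEPARTURE_HHMM` and `DEPARTURE_PERIOD`, as well as the bin edges and labels, are made up for illustration:

```python
import numpy as np
import pandas as pd

# Sample data modeled on the question. NaN stands in for a missing departure
# time and 5.0 for a departure at 00:05 (both are assumptions about the data).
df = pd.DataFrame({"DEPARTURE_TIME": [1746.0, 1849.0, np.nan, 5.0, 2017.0]})

# float -> nullable integer -> zero-padded 4-character string ("0005", "1746").
# Missing values stay missing (<NA>) through every step.
hhmm = df["DEPARTURE_TIME"].astype("Int64").astype("string").str.zfill(4)

# Insert the colon; the original numeric column is left untouched.
df["DEPARTURE_HHMM"] = hhmm.str[:2] + ":" + hhmm.str[2:]

# For binning, a numeric hour is easier to work with than a string.
# Bin edges and labels are illustrative; a value of 2400, if present,
# would fall outside these bins and become NaN.
hour = df["DEPARTURE_TIME"] // 100
df["DEPARTURE_PERIOD"] = pd.cut(
    hour,
    bins=[0, 6, 12, 18, 24],
    right=False,  # intervals are [0, 6), [6, 12), [12, 18), [18, 24)
    labels=["night", "morning", "afternoon", "evening"],
)

print(df)
```

Keeping the numeric column alongside the formatted one is a design choice: the "HH:MM" strings are good for display, while the numeric hour is what `pd.cut` needs.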
FObersteiner
  • It works, but I'm getting one of these pink warning boxes: C:\Users\pc\AppData\Local\Temp/ipykernel_9560/1146953015.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df2['DEPARTURE_TIME'] = pd.to_datetime(df2['DEPARTURE_TIME'], format="%H%M").dt.strftime("%H:%M") – Salma Dodin Apr 06 '22 at 12:29
  • @SalmaDodin that problem appears from time to time, see e.g. [this question](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas). If you want to get rid of it, try to set the new values with `.loc` as the warning message notes. – FObersteiner Apr 06 '22 at 12:32
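
The warning usually means that `df2` was created by filtering or slicing another DataFrame, so pandas cannot tell whether the assignment should also affect the original. How `df2` was built is not shown in the comment, so the filter below is hypothetical. A minimal sketch of one common way to avoid the warning, by taking an explicit copy when the subset is created:

```python
import pandas as pd

# Small stand-in for the flights data; the values are made up for illustration.
flight_df = pd.DataFrame({
    "DEPARTURE_TIME": [1746.0, 1849.0, 1514.0],
    "CANCELLED": [0, 0, 1],
})

# Hypothetical subset: .copy() makes df2 an independent DataFrame,
# so later assignments to it are unambiguous and no warning is raised.
df2 = flight_df[flight_df["CANCELLED"] == 0].copy()

# Convert to a zero-padded "HHMM" string first, then parse and reformat.
as_text = df2["DEPARTURE_TIME"].astype(int).astype(str).str.zfill(4)
df2["DEPARTURE_TIME"] = pd.to_datetime(as_text, format="%H%M").dt.strftime("%H:%M")

print(df2)
```

The original `flight_df` keeps its numeric values; only the copy is changed.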