I'm working on a dataframe that I have been able to clean by running the following code in separate cells in a Jupyter notebook. However, I need to run these same tasks on several dataframes that are organized in exactly the same way. How can I write a function that executes tasks 2 through 4 below?

For reference, the data I'm working with is located here.

[1]: df1 = pd.read_csv('202110-divvy-tripdata.csv')

[2]: df1.drop(columns=['start_station_name','start_station_id','end_station_name','end_station_id','start_lat','start_lng','end_lat','end_lng'],inplace=True)

[3]: df1['ride_length'] = pd.to_datetime(df1.ended_at) - pd.to_datetime(df1.started_at)

[4]: df1['day_of_week'] = pd.to_datetime(df1.started_at).dt.day_name()

BannyM
  • What have you tried, and what went wrong with your attempts? To start, define a function that takes a dataframe as an input, does the things, and returns the new dataframe as an output. What specifically is tripping you up? – G. Anderson Oct 26 '22 at 17:15
  • You can define a function directly in a Jupyter notebook just as you can in a regular Python script. Just define the function in a cell and run the cell once. The function will remain defined as long as you don't restart the kernel or clear the notebook state. – Michael Sohnen Oct 26 '22 at 17:15
  • You could create a list of the CSV filenames, then write a simple for loop that calls the function once per file, passing each filename as the parameter. – Priya Oct 26 '22 at 17:17

1 Answer

You can define a function in a cell in Jupyter, run this cell and then call the function:

def process_df(df):
    df['ride_length'] = pd.to_datetime(df.ended_at) - pd.to_datetime(df.started_at)
    df['day_of_week'] = pd.to_datetime(df.started_at).dt.day_name()

Call the function with each DataFrame:

df1 = pd.read_csv('data1.csv')
df2 = pd.read_csv('data2.csv')

process_df(df1)
process_df(df2)

According to this answer, both DataFrames will be altered in place, because the function assigns the new columns directly on the object it receives. There is therefore no need to return a new object from the function.
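
If you also want the column drop from step 2 and would rather not mutate the input, here is a minimal sketch of an alternative that returns a cleaned copy instead. The names `COLUMNS_TO_DROP`, `clean_trips` and `clean_files` are hypothetical, and `errors='ignore'` is an assumption added so a file missing one of the listed columns does not raise:

```python
import pandas as pd

# Columns removed in step [2] of the question.
COLUMNS_TO_DROP = [
    'start_station_name', 'start_station_id',
    'end_station_name', 'end_station_id',
    'start_lat', 'start_lng', 'end_lat', 'end_lng',
]


def clean_trips(df):
    """Return a cleaned copy of df; the DataFrame passed in is left unchanged."""
    # drop() without inplace=True returns a new DataFrame.
    # errors='ignore' (an assumption) skips listed columns that are absent.
    out = df.drop(columns=COLUMNS_TO_DROP, errors='ignore')

    # Parse the timestamps once and reuse them for both new columns.
    started = pd.to_datetime(out['started_at'])
    ended = pd.to_datetime(out['ended_at'])

    out['ride_length'] = ended - started        # step [3]
    out['day_of_week'] = started.dt.day_name()  # step [4]
    return out


def clean_files(filenames):
    """Read each CSV and return a dict mapping filename to its cleaned DataFrame."""
    return {name: clean_trips(pd.read_csv(name)) for name in filenames}
```

Calling `clean_files(['202110-divvy-tripdata.csv'])` with your own list of filenames would then give you one cleaned DataFrame per file, keyed by filename. Whether to mutate in place or return a copy is mostly a matter of taste; returning a copy keeps the raw data around if you need to rerun a cell.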

dor132