
I have a CSV file with three columns: Age_Groups, Trip_in_min, and Start_Station_Name. (It actually comes from a bigger dataset with 17 columns and 16845 rows.)

Now I need to get the average trip time per age group.

Here is the link to the CSV file on Dropbox, as I did not know how to paste it properly here.

Any help please?

import pandas as pd

file = pd.read_csv(r"file.csv")

# Count the number of trips per age group
trips_summary = file.Age_Groups.value_counts()
print("Number of trips per age group")
print(trips_summary)
print()

# Find the 20 most popular start stations
popular_stations = file.Start_Station_Name.value_counts()
print("The most popular 20 stations")
print(popular_stations[:20])
print()

UPDATE

OK, it worked. I added the line:

df.groupby('Age_Groups', as_index=False)['Trip_in_min'].mean()

Thanks @jjj. However, as I mentioned, my data has more than 16K rows. Once I added the rows back, it started to fail and now gives me the message below (it might not be a real error), with only the age groups printed and no averages. I get the averages only if I have 1890 rows or fewer. Other operations work fine with the full dataset; only this one fails. Here is the message I get for a larger number of rows:

D:\Test 1.py:18: FutureWarning: The default value of numeric_only in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  avg = df.groupby('Age_Groups', as_index=False)['Trip_in_min'].mean()

  Age_Groups
0      18-24
1      25-34
2      35-44
3      45-54
4      55-64
5      65-74
6        75+
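One plausible reading of this warning is that Trip_in_min is not stored as a numeric column once the extra rows are included, so pandas has nothing valid to average and only Age_Groups is printed. The sketch below uses made-up sample data (the stray value "Subscriber" is an assumption, not taken from the real file) to show one way around that: convert the column with `pd.to_numeric(..., errors="coerce")` before grouping.

```python
import pandas as pd

# Hypothetical sample data: one Trip_in_min value is not a number,
# which makes pandas store the whole column as object (strings).
df = pd.DataFrame({
    "Age_Groups": ["18-24", "18-24", "25-34", "25-34", "35-44"],
    "Trip_in_min": ["10", "20", "30", "Subscriber", "50"],
})

# Convert to numbers; values that cannot be parsed become NaN.
df["Trip_in_min"] = pd.to_numeric(df["Trip_in_min"], errors="coerce")

# mean() skips NaN by default, so the unparseable value is left out.
avg = df.groupby("Age_Groups", as_index=False)["Trip_in_min"].mean()
print(avg)
```

Note that coercing silently discards the unparseable values from the average, so it is worth checking how many rows are affected before relying on the result.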

UPDATE 2

Not all columns are numeric. However, when I use the code below:

df.apply(pd.to_numeric, errors='ignore').info()

I get the output below (my target is column 12):

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1897 entries, 1 to 1897
Data columns (total 13 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Riverview Park      11 non-null     object 
 1   Riverview Park.1    11 non-null     object 
 2   Riverview Park.2    11 non-null     object 
 3   Start_Station_Name  1897 non-null   object 
 4   3251                98 non-null     float64
 5   Jersey & 3rd        98 non-null     object 
 6   24443               98 non-null     float64
 7   Subscriber          98 non-null     object 
 8   1928                98 non-null     float64
 9   Unnamed: 9          79 non-null     float64
 10  Age_Groups          1897 non-null   object 
 11  136                 98 non-null     float64
 12  Trip_in_min         1897 non-null   object 
dtypes: float64(5), object(8)
memory usage: 192.8+ KB
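In this output Trip_in_min is still `object`, which suggests that at least one value in it could not be parsed: with `errors='ignore'`, `pd.to_numeric` returns the column unchanged when parsing fails. A minimal sketch for finding the offending rows, using a made-up stand-in for the real file (the values below are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical stand-in for the real data: two entries are not numbers.
df = pd.DataFrame({
    "Age_Groups": ["18-24", "25-34", "35-44", "45-54"],
    "Trip_in_min": ["12", "7.5", "Trip_in_min", "n/a"],
})

# errors="coerce" turns anything unparseable into NaN instead of
# leaving the whole column untouched.
converted = pd.to_numeric(df["Trip_in_min"], errors="coerce")

# Rows that had a value originally but failed to convert.
bad_rows = df[converted.isna() & df["Trip_in_min"].notna()]
print(bad_rows)
```

Inspecting `bad_rows` on the full dataset should show which entries keep the column from being numeric. The column names in the output above (for example `Riverview Park` and `3251`) also look like data values rather than headers, so it may be worth checking how the file is being read.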
Ali
  • Please have a look at [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and edit your question to include a sample of your input data and expected output to make a minimal reproducible example. In this case, based on the description, it sounds like you want a [groupby](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html) with `.mean()` – G. Anderson Jan 26 '23 at 19:47

1 Answer


Hope this helps:

import pandas as pd

df = pd.read_csv("test.csv")
# Average trip time per age group
avg = df.groupby('Age_Groups', as_index=False)['Trip_in_min'].mean()
print(avg)
jansary
  • It worked for a limited time. I will add the details in the main body of the question because of the character limit here, but thanks anyway – Ali Jan 26 '23 at 20:49
  • Try this: df.groupby("Age_Groups").Trip_in_min.mean().sort_values(ascending=False). Also make sure all your data is numeric. – jansary Jan 26 '23 at 21:25
  • OK, not all of it is numeric, so I used df.apply(pd.to_numeric, errors='ignore').info(), but I am not getting the right result. I will post the output in the main post under Update 2 – Ali Jan 26 '23 at 22:32
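For reference, the chained form suggested in the comments returns a Series indexed by age group and sorted from the longest to the shortest average trip. A small sketch on made-up numeric data (the values are assumptions for illustration, not from the real file):

```python
import pandas as pd

# Hypothetical sample data with an already numeric Trip_in_min column.
df = pd.DataFrame({
    "Age_Groups": ["18-24", "18-24", "25-34", "35-44", "35-44"],
    "Trip_in_min": [10.0, 20.0, 40.0, 5.0, 15.0],
})

# Average trip time per age group, largest average first.
ranked = (
    df.groupby("Age_Groups")["Trip_in_min"]
    .mean()
    .sort_values(ascending=False)
)
print(ranked)
```

This only works as intended when Trip_in_min is numeric, which is why the comment also asks to check the data types first.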