I have a CSV file with three columns: Age_Groups, Trip_in_min, and Start_Station_Name. (It actually comes from a bigger dataset with 17 columns and 16,845 rows.)
Now I need to get the average trip time per age group.
Here is the link to the CSV file on Dropbox, as I did not know how to paste it properly here.
Any help, please?
import pandas as pd
file = pd.read_csv(r"file.csv")
# Counting the number of trips per age group
trips_summary = file.Age_Groups.value_counts()
print("Number of trips per age group")
print(trips_summary)
print()
# Finding the most popular 20 stations
popular_stations = file.Start_Station_Name.value_counts()
print("The most popular 20 stations")
print(popular_stations[:20])
print()
UPDATE
Ok, it worked, I added the line
df.groupby('Age_Groups', as_index=False)['Trip_in_min'].mean()
Thanks @jjj. However, as I mentioned, my data has more than 16K rows. Once I added the rows back, it started to fail with the message below (it might not be a real error): only the age groups are printed, without the averages. I only get the averages if I have 1890 rows or fewer. Other operations work fine with the full dataset, just not this one. Here is the message I get for a larger number of rows:
D:\Test 1.py:18: FutureWarning: The default value of numeric_only in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  avg = df.groupby('Age_Groups', as_index=False)['Trip_in_min'].mean()
  Age_Groups
0      18-24
1      25-34
2      35-44
3      45-54
4      55-64
5      65-74
6        75+
UPDATE 2
Not all columns are numeric. However, when I use the code below:
df.apply(pd.to_numeric, errors='ignore').info()
I get the output below (my target is column number 12):
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1897 entries, 1 to 1897
Data columns (total 13 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Riverview Park      11 non-null     object
 1   Riverview Park.1    11 non-null     object
 2   Riverview Park.2    11 non-null     object
 3   Start_Station_Name  1897 non-null   object
 4   3251                98 non-null     float64
 5   Jersey & 3rd        98 non-null     object
 6   24443               98 non-null     float64
 7   Subscriber          98 non-null     object
 8   1928                98 non-null     float64
 9   Unnamed: 9          79 non-null     float64
 10  Age_Groups          1897 non-null   object
 11  136                 98 non-null     float64
 12  Trip_in_min         1897 non-null   object
dtypes: float64(5), object(8)
memory usage: 192.8+ KB
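Since Trip_in_min is listed as object rather than a numeric dtype in the output above, one thing that might be worth checking is whether some cells in that column hold text instead of numbers. The following is only a minimal sketch with made-up sample data, not my real file. It assumes (this is a guess) that at least one value cannot be parsed as a number, here imitated by a stray header string. It uses errors='coerce' instead of errors='ignore' so that unparseable values become NaN and can be listed before taking the mean.

```python
import pandas as pd

# Made-up sample data (not the real file); the stray "Trip_in_min" string
# imitates a cell that cannot be parsed as a number.
df = pd.DataFrame({
    "Age_Groups": ["18-24", "18-24", "25-34", "25-34", "35-44"],
    "Trip_in_min": ["10", "20", "5.5", "Trip_in_min", "30"],
})

# Convert to numbers; anything that cannot be parsed becomes NaN
minutes = pd.to_numeric(df["Trip_in_min"], errors="coerce")

# List the raw values that failed to convert, to see what is in those rows
bad_rows = df.loc[minutes.isna(), "Trip_in_min"]
print(bad_rows)

# Average the numeric version of the column; mean() skips NaN by default
avg = (
    df.assign(Trip_in_min=minutes)
      .groupby("Age_Groups", as_index=False)["Trip_in_min"]
      .mean()
)
print(avg)
```

If bad_rows comes back empty on the real data, then this assumption is wrong and the cause is something else.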