
Below is my code, which calculates the total number of subscribers, customers and other customers in each city from the CSV files, and also calculates the average trip duration in each city. Is there any way to simplify the if/elif statements inside the for loop?

import csv

new_file = {'Washington': './data/Washington-2016-Summary.csv',
            'Chicago': './data/Chicago-2016-Summary.csv',
            'NYC': './data/NYC-2016-Summary.csv'}

for city, filename in new_file.items():

    with open(filename, 'r') as fil_1:
        t_subscribers = 0      # total trip duration (seconds) for subscribers
        t_customers = 0        # total trip duration for customers
        t_other = 0            # total trip duration for rows with an empty user_type
        cnt_subscribers = 0
        cnt_customers = 0
        other_customers = 0
        data_reader = csv.DictReader(fil_1)

        for row in data_reader:
            if row['user_type'] == 'Subscriber':
                cnt_subscribers += 1
                t_subscribers += float(row['duration'])
            elif row['user_type'] == 'Customer':
                cnt_customers += 1
                t_customers += float(row['duration'])
            elif row['user_type'] == '':
                other_customers += 1
                t_other += float(row['duration'])

    # Durations are in seconds; divide by 60 to report minutes
    tripaverage_duration = ((t_subscribers + t_customers + t_other) / 60) / (
        cnt_subscribers + cnt_customers + other_customers)
    tripaverage_subscribers = (t_subscribers / 60) / cnt_subscribers
    tripaverage_customers = (t_customers / 60) / cnt_customers

    print('Average trip duration in', city, '-', tripaverage_duration, 'minutes')
    print('Average trip duration for subscribers in', city, '-', tripaverage_subscribers, 'minutes')
    print('Average trip duration for customers in', city, '-', tripaverage_customers, 'minutes')
    print('\n')
Godfrey
    Possible duplicate of [Multiple conditions with if/elif statements](https://stackoverflow.com/questions/12335382/multiple-conditions-with-if-elif-statements). In particular, the second example in the answer is perfect for your case: you would just do `if row['user_type'] in ('Subscriber','Customer','')`, and then you could even add more options easily if your program later needs it. – Davy M Feb 21 '18 at 18:47
  • Why does this if statement exist at all? You are always doing the same thing, so just remove it. – MegaIng Feb 21 '18 at 18:52
  • @MegaIng What would you suggest instead? – Godfrey Feb 21 '18 at 19:26
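One way to act on these comments, sketched under the assumption that each CSV has a `user_type` column and a `duration` column in seconds: keep per-type counts and totals in dictionaries keyed by `user_type`, so two lines handle every type (including the empty one) and no branch per type is needed. The function name `summarize_trips` and the sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def summarize_trips(csv_file):
    """Hypothetical helper: return {user_type: (count, average_minutes)}.

    Assumes 'user_type' and 'duration' (seconds) columns, as in the question.
    """
    counts = defaultdict(int)      # number of trips per user type
    totals = defaultdict(float)    # total duration in seconds per user type
    for row in csv.DictReader(csv_file):
        user_type = row['user_type']   # '' for rows with no user type
        counts[user_type] += 1
        totals[user_type] += float(row['duration'])
    # Convert each total to minutes and divide by that type's trip count
    return {t: (counts[t], totals[t] / 60 / counts[t]) for t in counts}


# Hypothetical sample data standing in for one of the summary CSV files
sample = io.StringIO(
    "duration,user_type\n"
    "600,Subscriber\n"
    "1200,Subscriber\n"
    "1800,Customer\n"
    "300,\n"
)
summary = summarize_trips(sample)
print(summary)  # {'Subscriber': (2, 15.0), 'Customer': (1, 30.0), '': (1, 5.0)}
```

In the per-city loop you would call `summarize_trips(fil_1)` inside the `with open(...)` block and read the values with `summary.get('Subscriber')` and `summary.get('Customer')`; a new user type then needs no code change.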

2 Answers


I recommend pandas DataFrames for something like this. You can easily subset a DataFrame based on the values in one column, then sum, count or average another column. Here's an example of how you could apply this to your problem:

import pandas as pd

new_file = {'Washington': './data/Washington-2016-Summary.csv',
            'Chicago': './data/Chicago-2016-Summary.csv',
            'NYC': './data/NYC-2016-Summary.csv'}

for city, filename in new_file.items():

    data = pd.read_csv(filename)
    # Average only the 'duration' column (seconds), then convert to minutes
    tripaverage_duration = data['duration'].mean() / 60
    tripaverage_subscribers = data.loc[data['user_type'] == 'Subscriber', 'duration'].mean() / 60
    tripaverage_customers = data.loc[data['user_type'] == 'Customer', 'duration'].mean() / 60

    print('Average trip duration in', city, '-', tripaverage_duration, 'minutes')
    print('Average trip duration for subscribers in', city, '-', tripaverage_subscribers, 'minutes')
    print('Average trip duration for customers in', city, '-', tripaverage_customers, 'minutes')
    print('\n')
Sevy
  • This is the error message I received for the above suggestion: TypeError Traceback (most recent call last) in () ---> 12 tripaverage_duration = data.values.mean()['duration'] /opt/conda/lib/python3.6/site-packages/numpy/core/_methods.py in _mean(a, axis, dtype, out, keepdims) ---> 70 ret = umr_sum(arr, axis, dtype, out, keepdims) TypeError: unsupported operand type(s) for +: 'float' and 'str' – Godfrey Feb 21 '18 at 19:07
  • I think this error is related to the data types in your dataframe: `data.values.mean()` averages every column at once, so if some elements are strings (such as the `user_type` values), it's unable to compute the mean. – Sevy Feb 21 '18 at 19:48
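A sketch of the subsetting this answer describes, assuming the same `user_type` and `duration` (seconds) columns: select the numeric `duration` column before averaging, which avoids the string-plus-float `TypeError` reported above, and use `groupby` to get the count and mean for every user type in one call. The sample data is hypothetical, and filling missing `user_type` values with `''` is an assumption made to mirror the question's "other customers" group.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for one of the summary CSV files
sample = io.StringIO(
    "duration,user_type\n"
    "600,Subscriber\n"
    "1200,Subscriber\n"
    "1800,Customer\n"
    "300,\n"
)
data = pd.read_csv(sample)

# An empty user_type is read as NaN; map it to '' so it forms its own group
data['user_type'] = data['user_type'].fillna('')

# Average over the numeric 'duration' column only, converted to minutes
overall_minutes = data['duration'].mean() / 60

# One row per user type: number of trips and mean duration
per_type = data.groupby('user_type')['duration'].agg(['count', 'mean'])
per_type['mean'] = per_type['mean'] / 60   # seconds -> minutes

print(overall_minutes)
print(per_type)
```

Because `groupby` builds every group in one pass, adding another user type to the data needs no extra filtering lines.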

One option is to use list comprehensions like this:

rows = list(data_reader)  # a DictReader can only be read once, so keep the rows in a list
cnt_subscribers = sum([1 for row in rows if row['user_type'] == 'Subscriber'])
t_subscribers = sum([float(row['duration']) for row in rows if row['user_type'] == 'Subscriber'])
Riley Martine
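Extending the list-comprehension approach to every user type, sketched under the same column assumptions: a `csv.DictReader` is a one-shot iterator, so a second comprehension over the same reader would see no rows. Reading the rows into a list once lets each group be scanned separately. The helper `count_and_total` and the sample data are hypothetical.

```python
import csv
import io

# Hypothetical sample data standing in for one of the summary CSV files
CSV_TEXT = (
    "duration,user_type\n"
    "600,Subscriber\n"
    "1200,Subscriber\n"
    "1800,Customer\n"
    "300,\n"
)

# A DictReader is a one-shot iterator: a second pass over it sees no rows
reader = csv.DictReader(io.StringIO(CSV_TEXT))
first_pass = sum(1 for _ in reader)
second_pass = sum(1 for _ in reader)


def count_and_total(rows, user_type):
    """Hypothetical helper: number of trips and total seconds for one user type."""
    durations = [float(row['duration']) for row in rows if row['user_type'] == user_type]
    return len(durations), sum(durations)


# Materialize the rows once so they can be scanned for every user type
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

results = {}
for user_type in ('Subscriber', 'Customer', ''):
    cnt, total = count_and_total(rows, user_type)
    # Average in minutes; report 0.0 for a user type with no trips
    results[user_type] = (cnt, total / 60 / cnt if cnt else 0.0)

print(first_pass, second_pass)  # 4 0
print(results)  # {'Subscriber': (2, 15.0), 'Customer': (1, 30.0), '': (1, 5.0)}
```

This scans the list once per user type, which is simple to read; for large files a single-pass accumulation is cheaper.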