0

I am beginner in programming, so I don't understand a lot of Python, I have a code that takes information from a CSV file:

23;1;42.8
21;1;....

So what I thought was that the information is easier to manage in vectors, so I made this code:

import csv

with open("city_traffic.csv") as file_csv:
    csv_reader = csv.reader(file_csv, delimiter=';')

    cities=[]
    for line in csv_reader: 
        new_list=[]
        i=int(line[0])
        k=int(line[1])
        j=float(line[2])
        new_list.append(k)
        new_list.append(i)
        new_list.append(j)      

        cities.append(new_list)

    for s in cities:
        print("City: "+str(s[0])+ ". Total Amount of Traffic: "+str(s[2])+ ". Rush Hour: "+str(s[1]))

And the output is like:

City: 1. Total Amount of Traffic: 42.8. Rush Hour: 23
City: 1. Total Amount of Traffic: 89.1. Rush Hour: 21
City: 4. Total Amount of Traffic: 60.5. Rush Hour: 2
City: 4. Total Amount of Traffic: 50.6. Rush Hour: 10
City: 3. Total Amount of Traffico: 44.2. Rush Hour: 10

My question, is there a way that I can categorize or sort the info so I can add the total amount of traffic of each city and then show which hour is the one with the most traffic, for example:

City: 1 Total Amount of Traffic: 131.9 Rush Hour: 21
City: 4 Total Amount of Traffic: 111.1 Rush Hour: 2
City: 3 Total Amount of Traffic: 44.2 Rush Hour: 10

As you can see I don't have any code for this last part, I have been struggling with this issue, but I would appreciate any advices on how to do it, or maybe improving my code. Thank you.

JorgeJG98
  • 5
  • 2

2 Answers2

0

You should check out pandas Dataframes. They're great for dealing with tabular data. As for the exact operation you would want to use grouby and apply functions as described at the end of this answer.

kubatucka
  • 555
  • 5
  • 15
0

You should use pandas for this. It has many useful functions which don't need to use for-loop

At start you can read it and add column names in one line of code (if you don't count import)

import pandas as pd

df = pd.read_csv('city_traffic.csv', sep=';', names=['Rush', 'City', 'Traffic'])

And you can display it

print(df)

Result:

   Rush  City  Traffic
0    23     1     42.8
1    21     1     89.1
2     2     4     60.5
3    10     4     50.6
4    10     3     44.2

It has also functions to display only some columns or rows

print(df[ df['City'] == 1 ])

Result:

   Rush  City  Traffic
0    23     1     42.8
1    21     1     89.1

Or if you need to use for-loop

for index, row in df.iterrows():
    print(f"City: {row['City']}. Total Amount of Traffic: {row['Traffic']}. Rush Hour: {row['Rush']}")

Result:

City: 1.0. Total Amount of Traffic: 42.8. Rush Hour: 23.0
City: 1.0. Total Amount of Traffic: 89.1. Rush Hour: 21.0
City: 4.0. Total Amount of Traffic: 60.5. Rush Hour: 2.0
City: 4.0. Total Amount of Traffic: 50.6. Rush Hour: 10.0
City: 3.0. Total Amount of Traffic: 44.2. Rush Hour: 10.0

Using pandas you can group by City and sum Traffic

groups = df.groupby('City')

print(groups['Traffic'].sum())

Result:

City
1    131.9
3     44.2
4    111.1
Name: Traffic, dtype: float64

In groups for different columns you can run different functions: sum for Traffic and min for Rush

new_df = groups.agg({'Traffic': 'sum', 'Rush': 'min'})
new_df = new_df.reset_index()

print(new_df)

Result:

   City  Traffic  Rush
0     1    131.9    21
1     3     44.2    10
2     4    111.1     2

Minimal working code .

I use io.StringIO in read_csv() only to simulate file in memory but you should use read_csv('city_traffic.csv', ...)

text ='''23;1;42.8
21;1;89.1
2;4;60.5
10;4;50.6
10;3;44.2'''

import pandas as pd
import io

#df = pd.read_csv('city_traffic.csv', sep=';', names=['Rush', 'City', 'Traffic'])
df = pd.read_csv(io.StringIO(text), sep=';', names=['Rush', 'City', 'Traffic'])

print(df)
print('---')

print(df[ df['City'] == 1 ])
print('---')

for index, row in df.iterrows():
    print(f"City: {row['City']}. Total Amount of Traffic: {row['Traffic']}. Rush Hour: {row['Rush']}")
print('---')


groups = df.groupby('City')

print(groups['Traffic'].sum())
print('---')


new_df = groups.agg({'Traffic': 'sum', 'Rush': 'min'})
new_df = new_df.reset_index()
print(new_df)
print('---')

#new_df['City'] = new_df['City'].replace({1:'Berlin', 4:'Paris', 3:'Roma'})
new_df['City'] = ['Berlin', 'Paris', 'Roma']
print(new_df)
print('---')

for index, row in new_df.iterrows():
    print(f"City: {row['City']:6} | Total Amount of Traffic: {row['Traffic']:6.2f} | Rush Hour: {row['Rush']:2}")
print('---')

Result:

   Rush  City  Traffic
0    23     1     42.8
1    21     1     89.1
2     2     4     60.5
3    10     4     50.6
4    10     3     44.2
---
   Rush  City  Traffic
0    23     1     42.8
1    21     1     89.1
---
City: 1.0. Total Amount of Traffic: 42.8. Rush Hour: 23.0
City: 1.0. Total Amount of Traffic: 89.1. Rush Hour: 21.0
City: 4.0. Total Amount of Traffic: 60.5. Rush Hour: 2.0
City: 4.0. Total Amount of Traffic: 50.6. Rush Hour: 10.0
City: 3.0. Total Amount of Traffic: 44.2. Rush Hour: 10.0
---
City
1    131.9
3     44.2
4    111.1
Name: Traffic, dtype: float64
---
   City  Traffic  Rush
0     1    131.9    21
1     3     44.2    10
2     4    111.1     2
---
     City  Traffic  Rush
0  Berlin    131.9    21
1   Paris     44.2    10
2    Roma    111.1     2
---
City: Berlin | Total Amount of Traffic: 131.90 | Rush Hour: 21
City: Paris  | Total Amount of Traffic:  44.20 | Rush Hour: 10
City: Roma   | Total Amount of Traffic: 111.10 | Rush Hour:  2
---
furas
  • 134,197
  • 12
  • 106
  • 148