I have the following data. It has over 100k records, so I'm only showing a portion of it.
import pandas as pd
df1 = pd.DataFrame(data)
print(df1)
   ADDRESS         | ID     | DATE       | VIOLATIONS
0  7738 S WESTERN  | CHI065 | 2014-07-08 | 65
1  1111 N HUMBOLDT | CHI010 | 2014-07-16 | 56
2  5520 S WESTERN  | CHI069 | 2014-07-08 | 10
3  1111 N HUMBOLDT | CHI010 | 2014-07-26 | 101
4  1111 N HUMBOLDT | CHI010 | 2014-07-27 | 92
5  5529 S WESTERN  | CHI068 | 2014-08-03 | 20
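To make the snippets reproducible, here is a minimal sketch that rebuilds only the six sample rows above as a DataFrame. It stands in for `data`, which isn't shown, and it assumes DATE should be parsed with `pd.to_datetime` so that date operations (weekday, month) work.

```python
import pandas as pd

# Only the sample rows from the table above; the real file has 100k+ records.
df1 = pd.DataFrame({
    "ADDRESS": ["7738 S WESTERN", "1111 N HUMBOLDT", "5520 S WESTERN",
                "1111 N HUMBOLDT", "1111 N HUMBOLDT", "5529 S WESTERN"],
    "ID": ["CHI065", "CHI010", "CHI069", "CHI010", "CHI010", "CHI068"],
    "DATE": ["2014-07-08", "2014-07-16", "2014-07-08",
             "2014-07-26", "2014-07-27", "2014-08-03"],
    "VIOLATIONS": [65, 56, 10, 101, 92, 20],
})

# Convert DATE from strings to datetime64 so .dt accessors are available.
df1["DATE"] = pd.to_datetime(df1["DATE"])
print(df1.dtypes)
```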
Q1. What is the average number of violations issued per camera, per day?
Q2. On which day of the week are the most citations issued?
Q3. Has the number of active cameras increased or decreased over the collection period?
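For Q2 and Q3, one possible sketch on the sample rows, under two assumptions: "citations" means the sum of VIOLATIONS, and a camera counts as "active" in a month if it has at least one record that month (monthly granularity is just one choice).

```python
import pandas as pd

# Sample rows from the table above (ADDRESS omitted; it isn't needed here).
df1 = pd.DataFrame({
    "ID": ["CHI065", "CHI010", "CHI069", "CHI010", "CHI010", "CHI068"],
    "DATE": pd.to_datetime(["2014-07-08", "2014-07-16", "2014-07-08",
                            "2014-07-26", "2014-07-27", "2014-08-03"]),
    "VIOLATIONS": [65, 56, 10, 101, 92, 20],
})

# Q2: total violations per weekday name, then the weekday with the largest total.
by_weekday = df1.groupby(df1["DATE"].dt.day_name())["VIOLATIONS"].sum()
busiest_day = by_weekday.idxmax()

# Q3: number of distinct camera IDs that reported at least once in each month.
active = df1.groupby(df1["DATE"].dt.to_period("M"))["ID"].nunique()

print(by_weekday)
print("Busiest day:", busiest_day)
print(active)
```

On the full data, comparing the first and last values of `active` (or plotting it) shows the trend; partial months at either end of the collection period can skew the comparison.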
I'm still stuck on the first one. I can get the average number of violations by date, and the output looks like this:
df1.groupby('DATE').VIOLATIONS.mean()
DATE |
2014-07-01 | 52.168421
2014-07-02 | 43.228261
2014-07-03 | 51.617021
2014-07-04 | 59.596774
2014-07-05 | 55.380952
2014-07-06 | 59.983333
2014-07-07 | 49.237113
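For Q1, a possible sketch, assuming "average per camera, per day" means: first total the violations for each camera on each day, then average those daily totals (per camera, and overall). Summing per (ID, DATE) first guards against a camera having several rows on the same day.

```python
import pandas as pd

# Sample rows from the table above.
df1 = pd.DataFrame({
    "ID": ["CHI065", "CHI010", "CHI069", "CHI010", "CHI010", "CHI068"],
    "DATE": pd.to_datetime(["2014-07-08", "2014-07-16", "2014-07-08",
                            "2014-07-26", "2014-07-27", "2014-08-03"]),
    "VIOLATIONS": [65, 56, 10, 101, 92, 20],
})

# Total violations for each camera on each day (MultiIndex: ID, DATE).
daily = df1.groupby(["ID", "DATE"])["VIOLATIONS"].sum()

# Average daily total for each camera.
per_camera = daily.groupby(level="ID").mean()

# One overall figure: the mean over every camera-day.
overall = daily.mean()

print(per_camera)
print(overall)
```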
But when I add ID to the grouping, it gives me an error:
df1.groupby(['DATE', 'ID']).VIOLATIONS.mean()
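The error message isn't shown, so this is only a guess: if it is a `KeyError` for `'ID'`, one possible cause is that the column label has stray whitespace (for example `"ID "` after reading a `|`-delimited file). The sketch below uses a hypothetical padded label to show how to inspect `df1.columns` and strip the labels before grouping.

```python
import pandas as pd

# Hypothetical: the "ID" label is padded with a trailing space.
df1 = pd.DataFrame({
    "ID ": ["CHI010", "CHI010", "CHI065"],
    "DATE": ["2014-07-16", "2014-07-26", "2014-07-08"],
    "VIOLATIONS": [56, 101, 65],
})

# Check the exact labels first; a padded label would show up here as 'ID '.
print(df1.columns.tolist())

# Remove surrounding whitespace from every column label.
df1.columns = df1.columns.str.strip()

# The original grouping now finds the "ID" column.
result = df1.groupby(["DATE", "ID"])["VIOLATIONS"].mean()
print(result)
```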
Any help would be much appreciated! Thanks!