I need help. I have a CSV file with the following columns:

     Date        Tipology  inputDates  dayOfWeek
0    2018-01-01  200       2018-01-01  Monday
1    2018-01-02  93        2018-01-02  Tuesday
2    2018-01-03  382       2018-01-03  Wednesday
3    2018-01-04  147       2018-01-04  Thursday
4    2018-01-05  107       2018-01-05  Friday
...  ...         ...       ...         ...
360  2018-12-27  155       2018-12-27  Thursday
361  2018-12-28  148       2018-12-28  Friday
362  2018-12-29  129       2018-12-29  Saturday
363  2018-12-30  129       2018-12-30  Sunday
364  2018-12-31  147       2018-12-31  Monday

I would like to sum Tipology by dayOfWeek. This is what I'm doing:

groupweek = df1.groupby(['dayOfWeek','Tipology']).count()
groupweek

and I get:

                     Date  inputDates
dayOfWeek  Tipology
Friday     107          1           1
           113          1           1
           117          1           1
           118          1           1
           119          1           1
...
Monday     104          1           1
           111          1           1
           113          1           1
           118          1           1
...
etc.

In theory, I thought that by adding up all the Tipology values for Friday, Monday, etc., I would obtain the numeric total per day of the week, but this does not happen. So I'm not sure whether the code below gives me what I want:

in:

groupweek = df1.groupby(['dayOfWeek'],as_index=False)['Tipology'].sum()
groupweek

out:

   dayOfWeek  Tipology
0  Friday     8356
1  Monday     9245
2  Saturday   8685
3  Sunday     8489
4  Thursday   8629
5  Tuesday    8959
6  Wednesday  9273

Are the numeric Tipology values grouped and summed by dayOfWeek for the year 2018?

With count() (there should be all 52 Fridays, 52 Mondays, etc.), adding up the counts does not give the same result as sum().

Trenton McKinney
scofx
  • I'm sorry, it's not really an opinion, but a request. I have a problem I'm trying to solve! In the meantime, thanks for the answer! – scofx Nov 18 '20 at 21:20
  • What if I wanted to do just the opposite? That is, to see which day of the week in the year has the most Tipology, so I can present it graphically. How could I do that without count()? – scofx Nov 18 '20 at 21:27
  • `sns.barplot(x='dayOfWeek', y='Tipology', data=df1, estimator=sum)` may be what you want, or `sns.countplot`. – Trenton McKinney Nov 18 '20 at 21:33
  • Perfect, thanks, but I wanted to get there via groupby. I'm using plotly Histogram2dContour! – scofx Nov 18 '20 at 21:39

1 Answer

Your first implementation:

groupweek = df1.groupby(['dayOfWeek','Tipology']).count()

Equivalent SQL:

select count(Date), count(inputDates) from df1 group by dayOfWeek, Tipology

You used two columns in groupby here: 'dayOfWeek' and 'Tipology'. This produces one row for each unique ('dayOfWeek', 'Tipology') combination, so each count is just the number of days that share that exact pair.

Instead, you wanted uniqueness over dayOfWeek only, so removing 'Tipology' from the groupby columns will do the trick. This will give you counts of 52 or 53 for each day.

groupweek = df1.groupby(['dayOfWeek'])['Tipology'].count()

Equivalent SQL:

select count(Tipology) from df1 group by dayOfWeek
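
To see the difference between the two groupings, here is a minimal sketch. It assumes a stand-in for df1 with one row per day of 2018; the Tipology numbers are invented purely for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for df1: one row per day of 2018.
dates = pd.date_range("2018-01-01", "2018-12-31", freq="D")
df1 = pd.DataFrame({
    "Date": dates,
    "Tipology": [100 + (i % 50) for i in range(len(dates))],  # invented values
    "dayOfWeek": dates.day_name(),
})

# Two grouping keys: one row per (dayOfWeek, Tipology) pair.
# Each count is the number of days that share that exact pair.
per_pair = df1.groupby(["dayOfWeek", "Tipology"])["Date"].count()

# One grouping key: the number of rows (days) per weekday.
per_day = df1.groupby("dayOfWeek")["Tipology"].count()
print(per_day)
```

Since 2018 started and ended on a Monday, Monday should show 53 rows and every other day 52. In both groupings the counts add up to 365, the number of rows, never to the sum of Tipology.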

Your second implementation is correct. Tipology is grouped by dayOfWeek and then aggregated with the sum function.

groupweek = df1.groupby(['dayOfWeek'],as_index=False)['Tipology'].sum()

Equivalent SQL:

select sum(Tipology) from df1 group by dayOfWeek
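
A quick way to convince yourself that sum() does what you expect is to check that the per-day totals add back up to the total of the whole column. The sketch below assumes the same kind of stand-in df1 (invented Tipology values, not your data); the `day_order` list is only there to print the days in calendar order instead of alphabetically.

```python
import pandas as pd

# Hypothetical stand-in for df1 with invented Tipology values.
dates = pd.date_range("2018-01-01", "2018-12-31", freq="D")
df1 = pd.DataFrame({
    "Tipology": [100 + (i % 50) for i in range(len(dates))],
    "dayOfWeek": dates.day_name(),
})

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Total Tipology per weekday, reordered from alphabetical to calendar order.
totals = df1.groupby("dayOfWeek")["Tipology"].sum().reindex(day_order)

# Sanity check: the seven per-day totals cover every row exactly once.
assert totals.sum() == df1["Tipology"].sum()

# The weekday with the largest total.
busiest = totals.idxmax()
print(totals)
print("Highest total:", busiest)
```

The resulting `totals` Series is also a convenient input for a bar chart if you want to present the result graphically.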
Ajay
  • Thanks for the answer. As I commented above: what if I wanted to do just the opposite? That is, to see which day of the week in the year has the most Tipology, so I can present it graphically. How could I do that without count()? – scofx Nov 18 '20 at 21:40
  • count only gives the row count. It will just give the number of Mondays or Tuesdays. How are you defining "more Tipology"? Using sum, you get the aggregate Tipology per day. Are you looking for max instead? – Ajay Nov 18 '20 at 21:45
  • In fact, I would need (taking Friday as an example) every Friday of 2018, with the daily Tipology value for each of those Fridays (and the same for every other day of the week). – scofx Nov 18 '20 at 22:14
  • You want filtering then. This should help: df1.Tipology[df1['dayOfWeek'] == 'Friday'] Link: https://stackoverflow.com/questions/17071871/how-to-select-rows-from-a-dataframe-based-on-column-values – Ajay Nov 18 '20 at 22:54
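
The filtering suggested in the last comment could look roughly like the sketch below. It assumes a stand-in df1 with invented Tipology values; the `by_day` dictionary is a hypothetical helper for illustration, not something from the original thread.

```python
import pandas as pd

# Hypothetical stand-in for df1 with invented Tipology values.
dates = pd.date_range("2018-01-01", "2018-12-31", freq="D")
df1 = pd.DataFrame({
    "Date": dates,
    "Tipology": [100 + (i % 50) for i in range(len(dates))],
    "dayOfWeek": dates.day_name(),
})

# Every Friday of 2018 with its daily Tipology value (boolean-mask filtering).
fridays = df1.loc[df1["dayOfWeek"] == "Friday", ["Date", "Tipology"]]
print(fridays.head())

# The same selection for every weekday at once:
# a dict mapping day name -> Series of daily values indexed by date.
by_day = {
    day: group.set_index("Date")["Tipology"]
    for day, group in df1.groupby("dayOfWeek")
}
print(by_day["Friday"].head())
```

Each Series in `by_day` keeps the individual daily values rather than aggregating them, which seems to be what a per-day plot would need.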