I'm in a similar situation to this case. I'm working on a project with a large DataFrame of about half a million rows, covering about 2,000 users (I got this number from value_counts() on a column called NoUsager).
I'd like to split the DataFrame into several arrays/DataFrames (one per user) so I can plot them afterwards. I got the list of users like this:
df.sort_values(by='NoUsager',inplace=True)
df.set_index(keys=['NoUsager'],drop=False,inplace=True)
users = df['NoUsager'].unique().tolist()
I know the next step is a loop that generates the smaller DataFrames, but I have no idea how to write it. I combined the code above with the code from that case, but it didn't work for me.
What should I do?
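One way to get one DataFrame per user without writing the loop and masks by hand is `groupby`: iterating over `df.groupby("NoUsager")` yields `(user, sub-DataFrame)` pairs. This is a minimal sketch on a few rows taken from the sample further down; `per_user` and `per_user_loop` are illustrative names. If you keep the `set_index(..., drop=False)` step from the question, pandas may reject `groupby("NoUsager")` as ambiguous (the name is then both an index level and a column), in which case `df.groupby(level="NoUsager")`, or `df.reset_index(drop=True)` first, should avoid it.

```python
import pandas as pd

# A few rows in the question's format (NoUsager kept as a string)
df = pd.DataFrame({
    "Weekday": ["Lun", "Lun", "Mer", "Jeu", "Lun"],
    "NoUsager": ["000001", "000001", "000002", "000003", "000004"],
    "DureeService": ["00:00:05", "00:00:04", "00:00:04", "00:00:50", "00:00:05"],
})

# groupby yields (key, sub-DataFrame) pairs; a dict comprehension turns them
# into one DataFrame per user, keyed by NoUsager.
per_user = {user: group for user, group in df.groupby("NoUsager")}

# The equivalent explicit loop over the users list from the question
users = df["NoUsager"].unique().tolist()
per_user_loop = {user: df[df["NoUsager"] == user] for user in users}

for user, user_df in per_user.items():
    print(user, len(user_df))
```

`per_user["000001"]` then holds only that user's rows, ready to be plotted on its own. The `groupby` version scans the data once, whereas the explicit loop builds a boolean mask over the whole DataFrame for each of the ~2000 users.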
EDIT
I want both a histogram and a boxplot of the DataFrame. With the answer provided, I already have a boxplot covering all NoUsager values. But with this much data, the boxplot is too small to read, so I'd like to split the DataFrame by NoUsager and plot the users separately.
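Since one boxplot with ~2000 boxes is unreadable, one option is to draw the per-user boxplot in batches, one figure per N users. This is a minimal sketch assuming DureeService holds `HH:MM:SS` values (strings or `datetime.time` objects), which are converted to seconds first; `USERS_PER_FIGURE`, the `DureeSec` column and the commented-out file names are hypothetical choices, not something from the question.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# A few rows shaped like the question's data
df = pd.DataFrame({
    "NoUsager": ["000001"] * 3 + ["000003"] * 3 + ["000004"] * 2,
    "DureeService": ["00:00:05", "00:00:04", "00:00:07",
                     "00:00:50", "00:06:51", "00:00:08",
                     "00:00:05", "00:00:05"],
})
# Boxplots need numbers: convert once to seconds.
# astype(str) makes this work for both strings and datetime.time values.
df["DureeSec"] = pd.to_timedelta(df["DureeService"].astype(str)).dt.total_seconds()

USERS_PER_FIGURE = 2  # hypothetical; with ~2000 users, 20-50 may be more practical

users = sorted(df["NoUsager"].unique())
figures = []
for start in range(0, len(users), USERS_PER_FIGURE):
    batch = users[start:start + USERS_PER_FIGURE]
    subset = df[df["NoUsager"].isin(batch)]
    fig, ax = plt.subplots(figsize=(10, 4))
    # One box per user, but only for the users in this batch
    subset.boxplot(column="DureeSec", by="NoUsager", ax=ax)
    ax.set_ylabel("DureeService (s)")
    # fig.savefig(f"boxplot_users_{start // USERS_PER_FIGURE:03d}.png")
    figures.append(fig)
    plt.close(fig)  # avoid keeping every batch's figure open in memory
```

With 3 users and 2 users per figure this produces two figures; on the full data the same loop would give roughly 2000 / N images that can be browsed one at a time.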
Diagrams that I'd like to have:
- boxplot, column=DureeService, by=NoUsager
- boxplot, column=DureeService, by=Weekday
- histogram of DureeService, one for every Weekday
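The two Weekday diagrams only have seven groups, so they can be drawn from the whole DataFrame at once. A sketch with plain matplotlib, assuming the Weekday column uses the French abbreviations seen in the sample (`Mar`, `Ven` and `Sam` are assumed for the days that don't appear in it) and converting DureeService to seconds inside the block; `WEEK_ORDER` and `DureeSec` are hypothetical names.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Weekday": ["Lun", "Lun", "Mer", "Dim", "Jeu", "Mer", "Mer", "Jeu"],
    "DureeService": ["00:00:05", "00:00:04", "00:00:07", "00:00:02",
                     "00:00:50", "00:06:51", "00:00:08", "00:12:35"],
})
df["DureeSec"] = pd.to_timedelta(df["DureeService"].astype(str)).dt.total_seconds()

# Assumed French weekday abbreviations, Monday first
WEEK_ORDER = ["Lun", "Mar", "Mer", "Jeu", "Ven", "Sam", "Dim"]
days_in_data = set(df["Weekday"])
present = [day for day in WEEK_ORDER if day in days_in_data]
per_day = [df.loc[df["Weekday"] == day, "DureeSec"].to_numpy() for day in present]

# Boxplot of DureeService by Weekday, boxes in calendar order
fig_box, ax = plt.subplots(figsize=(8, 4))
ax.boxplot(per_day)
ax.set_xticks(range(1, len(present) + 1))
ax.set_xticklabels(present)
ax.set_ylabel("DureeService (s)")

# One histogram of DureeService per Weekday, stacked with a shared x axis
fig_hist, axes = plt.subplots(len(present), 1, sharex=True, squeeze=False,
                              figsize=(6, 2 * len(present)))
for ax_day, day, values in zip(axes[:, 0], present, per_day):
    ax_day.hist(values, bins=10)
    ax_day.set_title(day)
fig_hist.tight_layout()
```

Ordering the groups explicitly matters here: grouping on the raw strings would sort them alphabetically (Dim, Jeu, Lun, ...), not in weekday order.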
I hope it's better explained this time.
Data types:
Column        Type
Weekday       string
NoUsager      string
Periods       string
Sens          string
DureeService  datetime.time
Sample of the DataFrame:
Weekday NoUsager Periods Sens DureeService
Lun 000001 Matin + 00:00:05
Lun 000001 Matin + 00:00:04
Mer 000001 Matin + 00:00:07
Dim 000001 Soir - 00:00:02
Lun 000001 Matin + 00:00:07
Jeu 000001 Soir - 00:00:04
Lun 000001 Matin + 00:00:07
Lun 000001 Soir - 00:00:04
Dim 000001 Matin + 00:00:05
Lun 000001 Matin + 00:00:03
Mer 000001 Matin + 00:00:04
Ven 000001 Soir - 00:00:03
Mar 000001 Matin + 00:00:03
Lun 000001 Soir - 00:00:04
Lun 000001 Matin + 00:00:04
Mer 000002 Soir - 00:00:04
Jeu 000003 Matin + 00:00:50
Mer 000003 Soir - 00:06:51
Mer 000003 Soir - 00:00:08
Mer 000003 Soir - 00:00:10
Jeu 000003 Matin + 00:12:35
Lun 000004 Matin + 00:00:05
Dim 000004 Matin + 00:00:05
Lun 000004 Matin + 00:00:05
Lun 000004 Matin + 00:00:05
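To make the sample easy to reproduce, it can be loaded straight from text with `read_csv`. This sketch uses a subset of the rows above (`SAMPLE` is an illustrative name); `dtype=str` keeps NoUsager as `000001` instead of turning it into the integer 1, and leaves Sens as the literal `+`/`-`.

```python
import io

import pandas as pd

SAMPLE = """\
Weekday NoUsager Periods Sens DureeService
Lun 000001 Matin + 00:00:05
Lun 000001 Matin + 00:00:04
Mer 000001 Matin + 00:00:07
Dim 000001 Soir - 00:00:02
Mer 000002 Soir - 00:00:04
Jeu 000003 Matin + 00:00:50
Mer 000003 Soir - 00:06:51
Lun 000004 Matin + 00:00:05
"""

# Whitespace-separated columns; every column is read as a string
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", dtype=str)
print(df.dtypes)
```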
What bothers me is that none of these columns is numeric, so they have to be converted every time.
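One way around repeated conversions is to convert DureeService once, right after loading, and keep a numeric column next to it. A sketch assuming the column holds `datetime.time` objects, as the data types state; `DureeSec` is a hypothetical column name. NoUsager, Weekday, Periods and Sens can stay as strings, since they are only used as group labels (and keeping NoUsager as a string preserves its leading zeros).

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame({
    "NoUsager": ["000001", "000003", "000003"],
    "DureeService": [dt.time(0, 0, 5), dt.time(0, 6, 51), dt.time(0, 12, 35)],
})

# datetime.time is a time of day, not a duration, so pandas can't do arithmetic on it.
# Going through its "HH:MM:SS" string form gives a real timedelta64 column...
df["DureeService"] = pd.to_timedelta(df["DureeService"].astype(str))
# ...and a float column of seconds that plotting functions accept directly.
df["DureeSec"] = df["DureeService"].dt.total_seconds()

print(df["DureeSec"].tolist())  # [5.0, 411.0, 755.0]
```

After this one-time step, `df.boxplot(column="DureeSec", ...)` and `df.hist(column="DureeSec", ...)` work without converting anything again.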
Thanks in advance!