I know this is very simple, but I don't really know how to do it, as I don't usually plot and it's giving me a headache. I have three specific problems:

1º Suppose I have a single column of a dataframe (or, say, an array) holding a categorical (object) variable, and I would like to make a bar chart showing the number of observations for each label in that column. For example, given a dataframe with this column named COLOR, I would like something like this.


    COLOR
0   green
1   red
2   green
3   yellow
4   pink
5   red
6   blue

[Image: First Question]

2º My second issue is that I have a dataset with a large number of rows and a few label columns, and I'm interested in two of them. I want a bar chart (in either of the two styles shown below) showing the number of rows that belong to each class of the first label, split by the class of the second label. For example, if Group A has 160 rows and 40 of them also belong to Series 1, I want it plotted like below.

     Group    Series
0  Group A  Series 2
1  Group B  Series 1
2  Group B  Series 5
3  Group A  Series 4
0  Group A  Series 1
1  Group B  Series 3
2  Group B  Series 3
3  Group A  Series 2

[Image: Second Question]

3º I would also like to know whether there is a function that, given two label columns, tells me, for each class of the first column, what percentage of its rows falls into each class of the second column. It is like the second question, but I want the result numerically instead of visually. The output would be something like:

Group A: 23% Series 1, 15% Series 2, 11% Series 3...
Group B: 27% Series 1, 11% Series 2, 10% Series 3...

I know there must be functions that do this directly, but as I'm not used to the data/plotting side of Python I don't know them, and I'm struggling to find them.

Floralys
  • Please provide more information about your code, your data, and your approach, as the questions are not fully clear to me. From what I understood, the first problem is only a matter of colors (https://www.tutorialkart.com/matplotlib-tutorial/matplotlib-pyplot-bar-plot-different-colors-for-bars/), the second one can easily be solved using Seaborn's "hue" parameter (https://seaborn.pydata.org/generated/seaborn.barplot.html), and the third one is a groupby (https://stackoverflow.com/questions/23377108/pandas-percentage-of-total-with-groupby) – FrancecoMartino Jan 03 '23 at 09:56
  • Hello, thank you for your help. But for the first question I would like a function that, given a categorical label, plots the number (count) of observations for each class of that label. For example, if there are 30 rows that have green in their color label column, then the bar plot must show the green class with a height of 30 – Floralys Jan 03 '23 at 11:53
  • You also have images there as links – Floralys Jan 03 '23 at 12:10
  • Ok, understood. In this case, please check: https://stackoverflow.com/questions/57417970/how-to-set-custom-colors-on-a-count-plot-in-seaborn This should solve your first question. – FrancecoMartino Jan 03 '23 at 12:46

1 Answer

For your first question, you can use the seaborn function countplot. That function works specifically for categorical values like your color labels.

[Image: seaborn's function countplot]
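As a minimal sketch of the counting step, the snippet below rebuilds the COLOR column from the question and draws one bar per label using pandas and matplotlib. It is an illustration under those assumptions, not the only way to do it. With seaborn installed, `sns.countplot(data=df, x="COLOR")` should draw an equivalent chart directly from the column.

```python
import pandas as pd

# Sample data copied from the question's COLOR column.
df = pd.DataFrame(
    {"COLOR": ["green", "red", "green", "yellow", "pink", "red", "blue"]}
)

# value_counts() returns the number of rows for each distinct label.
counts = df["COLOR"].value_counts()

# One bar per label, with the bar height equal to the count (needs matplotlib).
ax = counts.plot(kind="bar")
ax.set_xlabel("COLOR")
ax.set_ylabel("count")

# In an interactive session, display the figure with:
# import matplotlib.pyplot as plt; plt.show()
```

Here the green and red bars have height 2 and the other labels have height 1, matching the row counts in the sample data.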

For your second question, you can use the pandas function crosstab:

    import pandas as pd

    group = ['Group A', 'Group B', 'Group B', 'Group A', 'Group A', 'Group B', 'Group B', 'Group A']
    series = ['Series 2', 'Series 1', 'Series 5', 'Series 4', 'Series 1', 'Series 3', 'Series 3', 'Series 2']
    df = pd.DataFrame(list(zip(group, series)), columns=['Group', 'Series'])

    # normalize='index' makes each Group row sum to 1 (share of each Series within that Group);
    # normalize=True would instead divide by the total number of rows.
    pd.crosstab(df['Group'], df['Series'], normalize='index')

[Image: crosstab can give you the desired percentages]
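To connect crosstab to both the grouped bar chart and the numeric output asked for, here is a hedged sketch that reuses the sample rows from the question. The variable names `counts` and `pct` are illustrative, and the text format only approximates the "Group A: 23% Series 1, ..." layout in the question.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Group": ["Group A", "Group B", "Group B", "Group A",
              "Group A", "Group B", "Group B", "Group A"],
    "Series": ["Series 2", "Series 1", "Series 5", "Series 4",
               "Series 1", "Series 3", "Series 3", "Series 2"],
})

# Raw counts: one row per Group, one column per Series.
counts = pd.crosstab(df["Group"], df["Series"])

# Grouped bars, one cluster per Group (needs matplotlib).
# Pass stacked=True to get stacked bars instead.
ax = counts.plot(kind="bar")
ax.set_ylabel("count")

# Row-wise percentages: each Group row sums to 100.
pct = pd.crosstab(df["Group"], df["Series"], normalize="index") * 100

# Print one line per Group, skipping Series that do not occur in it.
for group, row in pct.iterrows():
    parts = ", ".join(
        f"{value:.0f}% {name}" for name, value in row.items() if value > 0
    )
    print(f"{group}: {parts}")
# Group A: 25% Series 1, 50% Series 2, 25% Series 4
# Group B: 25% Series 1, 50% Series 3, 25% Series 5
```

Using `normalize="index"` divides each row by its own total, which is what a per-group percentage needs; `normalize=True` would divide by the total number of rows in the whole table instead.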