
I am using Python and Plotly to draw a bar plot of the mean rating for certain categories in my data set. The bar chart is nearly how I want it, but I would like to set the color of each individual bar, and I can't find a clear explanation online of how to do this.

Data Set

import pandas as pd
from pandas import Timestamp
df = pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
 'overall_rating': {0: 5, 1: 4, 2: 5, 3: 5, 4: 4},
 'user_name': {0: 'member1365952',
  1: 'member465943',
  2: 'member665924',
  3: 'member865886',
  4: 'member1065873'},
 'date': {0: Timestamp('2022-02-03 00:00:00'),
  1: Timestamp('2022-02-03 00:00:00'),
  2: Timestamp('2022-02-02 00:00:00'),
  3: Timestamp('2022-02-01 00:00:00'),
  4: Timestamp('2022-02-01 00:00:00')},
 'comments': {0: 'Great campus. Library is always helpful. Sport course has been brill despite Civid challenges.',
  1: 'Average facilities and student Union. Great careers support.',
  2: 'Brilliant university, very social place with great unions.',
  3: 'Overall it was very good and the tables and chairs for discussion sessions worked very well',
  4: 'Uni is nice and most of the staff are amazing. Facilities (particularly the library) could be better'},
 'campus_facilities_rating': {0: 5, 1: 3, 2: 5, 3: 4, 4: 4},
 'clubs_societies_rating': {0: 5, 1: 3, 2: 4, 3: 4, 4: 4},
 'students_union_rating': {0: 4, 1: 3, 2: 5, 3: 5, 4: 5},
 'careers_service_rating': {0: 5, 1: 5, 2: 5, 3: 5, 4: 3},
 'wifi_rating': {0: 5, 1: 5, 2: 5, 3: 5, 4: 3}})

Code Used

import plotly.express as px

# Plot to find mean rating for different categories
fig = px.bar(df, y=[df.campus_facilities_rating.mean(), df.clubs_societies_rating.mean(),
                    df.students_union_rating.mean(), df.careers_service_rating.mean(), df.wifi_rating.mean()],
                x=['Campus Facilities', 'Clubs & Societies', 'Students Union', 'Careers & Services', 'Wifi'],
                labels={
                    "y": "Mean Rating (1-5)",
                    "x": "Category"},
                title="Mean Rating For Different Student Categories")

fig.show()

UPDATED ATTEMPT

# Plot to find mean rating for different categories
fig = px.bar(df, y=[df.campus_facilities_rating.mean(), df.clubs_societies_rating.mean(),
                    df.students_union_rating.mean(), df.careers_service_rating.mean(), df.wifi_rating.mean()],
                x=['Campus Facilities', 'Clubs & Societies', 'Students Union', 'Careers & Services', 'Wifi'],
                labels={
                    "y": "Mean Rating (1-5)",
                    "x": "Category"},
                title="Mean Rating For Different Student Categories At The University of Lincoln",
                color_discrete_map = {
                    'Campus Facilities' : 'red',
                    'Clubs & Societies' : 'blue',
                    'Students Union' : 'pink',
                    'Careers & Services' : 'grey',
                    'Wifi' : 'orange'})

fig.update_layout(barmode = 'group')

fig.show()

Output just gives all bars as blue.

  • Can you specify how you intend to map colors to bars? Do you have a list that contains the color for each bar in this plot? Do you have cutoff bar values for different colors? Should the bar value be represented by a color? Should groups be represented by a color? [Here are several of these possibilities already covered.](https://stackoverflow.com/a/61902566/8881141) – Mr. T Feb 05 '22 at 18:48
  • @Mr.T I basically want to choose which colour I assign to each x value, e.g. set Campus Facilities to red, Students Union to blue, etc. How would I do this? – patricebailey1998 Feb 05 '22 at 19:06
  • Please don't post images of code/data/error messages. Post the text directly here on SO. Nobody wants to type text from an image. – Mr. T Feb 05 '22 at 20:10

1 Answer


In general, you can use color_discrete_map in px.bar() to specify the color of each bar, provided you've also assigned a category column to the color argument, such as color="medal". The map then looks like this:

color_discrete_map={'gold':'yellow', 'silver':'grey', 'bronze':'brown'}

Complete code for general approach with data sample:

import plotly.express as px

long_df = px.data.medals_long()

fig = px.bar(long_df, x="nation", y="count", color="medal", title="color_discrete_map={'gold':'yellow', 'silver':'grey', 'bronze':'brown'}",
            color_discrete_map={'gold':'yellow', 'silver':'grey', 'bronze':'brown'})

fig.update_layout(barmode = 'group')

fig.show()

Edit after OP provided data sample

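In the case of your particular dataset and structure, you can't directly apply color='category' since the different categories are spread across several columns (campus_facilities_rating, clubs_societies_rating, students_union_rating, careers_service_rating and wifi_rating) rather than stored as values in a single column.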

There is a way to reach your goal using go.Figure() and fig.add_traces(), but since you seem most interested in px.bar(), we'll stick to plotly.express. In short, go.Figure() would require no particular data wrangling to get what you want, but setting up the figure would be a bit messier. With plotly.express and px.bar(), the exact opposite is true. Once we've made some changes to your dataset, all you need to build the figure is the following snippet:

fig = px.bar(dfg, x = 'category', y = 'value',
             color = 'category',
             category_orders = {'category':['Campus Facilities','Clubs & Societies','Students Union','Careers & Services','Wifi']},
             color_discrete_map = {'Campus Facilities' : 'red',
                                    'Clubs & Societies' : 'blue',
                                    'Students Union' : 'pink',
                                    'Careers & Services' : 'grey',
                                    'Wifi' : 'orange'})

Complete code with all data wrangling steps:

from pandas import Timestamp
import plotly.express as px
import pandas as pd
df = pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
              
 'overall_rating': {0: 5, 1: 4, 2: 5, 3: 5, 4: 4},
 'user_name': {0: 'member1365952',
  1: 'member465943',
  2: 'member665924',
  3: 'member865886',
  4: 'member1065873'},
 'date': {0: Timestamp('2022-02-03 00:00:00'),
  1: Timestamp('2022-02-03 00:00:00'),
  2: Timestamp('2022-02-02 00:00:00'),
  3: Timestamp('2022-02-01 00:00:00'),
  4: Timestamp('2022-02-01 00:00:00')},
 'comments': {0: 'Great campus. Library is always helpful. Sport course has been brill despite Civid challenges.',
  1: 'Average facilities and student Union. Great careers support.',
  2: 'Brilliant university, very social place with great unions.',
  3: 'Overall it was very good and the tables and chairs for discussion sessions worked very well',
  4: 'Uni is nice and most of the staff are amazing. Facilities (particularly the library) could be better'},
 'campus_facilities_rating': {0: 5, 1: 3, 2: 5, 3: 4, 4: 4},
 'clubs_societies_rating': {0: 5, 1: 3, 2: 4, 3: 4, 4: 4},
 'students_union_rating': {0: 4, 1: 3, 2: 5, 3: 5, 4: 5},
 'careers_service_rating': {0: 5, 1: 5, 2: 5, 3: 5, 4: 3},
 'wifi_rating': {0: 5, 1: 5, 2: 5, 3: 5, 4: 3}})

df.columns = ['id', 'overall_rating', 'user_name', 'date', 'comments', 'Campus Facilities',
              'Clubs & Societies','Students Union','Careers & Services','Wifi']

dfm = pd.melt(df, id_vars=['id', 'overall_rating', 'user_name', 'date', 'comments'],
              value_vars=list(df.columns[5:]),
              var_name ='category')

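# Average only the 'value' column; calling .mean() on the text columns fails in recent pandas versions
dfg = dfm.groupby('category')['value'].mean().reset_index()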

fig = px.bar(dfg, x = 'category', y = 'value', color = 'category',
             category_orders = {'category':['Campus Facilities','Clubs & Societies','Students Union','Careers & Services','Wifi']},
             color_discrete_map = {
                    'Campus Facilities' : 'red',
                    'Clubs & Societies' : 'blue',
                    'Students Union' : 'pink',
                    'Careers & Services' : 'grey',
                    'Wifi' : 'orange'})

fig.update_yaxes(title = 'Mean rating (1-5)')
fig.show()

Appendix: Why dfm and dfg?

px.bar(color = 'category') assigns a color to each unique value in the dataframe column named 'category'. But the categories we're interested in are spread across several columns of your dataframe. So what

dfm = pd.melt(df, id_vars=['id', 'overall_rating', 'user_name', 'date', 'comments'],
              value_vars=list(df.columns[5:]),
              var_name ='category')

does is take the five rating columns (renamed to 'Campus Facilities', 'Clubs & Societies', 'Students Union', 'Careers & Services' and 'Wifi') and stack them into one column named category, with the matching ratings in a column named value.

But that is still the raw data. What you're interested in is the mean of each group in that column, and that is what

dfm.groupby('category')['value'].mean().reset_index()

gives us: one row per category, with its mean rating in the value column.

Take a look at pd.melt() and df.groupby() for further details.
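
If it helps to see the two steps on something smaller, here is a minimal sketch with a made-up two-row frame. The names wide, long and means, and the ratings themselves, are hypothetical and not taken from your data. Printing the intermediate result shows how melt turns columns into rows before groupby averages them.

```python
import pandas as pd

# Hypothetical two-row, two-category frame, just to make the reshaping visible.
wide = pd.DataFrame({
    'id': [1, 2],
    'Wifi': [5, 3],
    'Students Union': [4, 5],
})

# melt: one row per (id, category) pair; the ratings land in a column named 'value'.
long = pd.melt(wide, id_vars=['id'], value_vars=['Wifi', 'Students Union'],
               var_name='category')
print(long)

# groupby + mean: one row per category holding its mean rating.
means = long.groupby('category', as_index=False)['value'].mean()
print(means)
```

The means frame has the same shape as dfg above: a category column to pass to x and color, and a value column to pass to y.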

vestland
  • Can you link to `color_discrete_map` in the docs? When looking for something like a ListedColorMap or a keyword that takes a list, I always ended up [here](https://plotly.com/python/discrete-color/). – Mr. T Feb 05 '22 at 19:41
  • @Mr.T Sure! You'll find a little info on `color_discrete_map` on that exact page just a little bit down under [Directly Mapping Colors to Data Values](https://plotly.com/python/discrete-color/#directly-mapping-colors-to-data-values) – vestland Feb 05 '22 at 19:48
  • @Mr.T The Plotly docs have got a bunch of little goodies about color spread around all over the place. I tried to gather the most important of them in the post [Plotly: How to define colors in a figure using plotly.graph_objects and plotly.express?](https://stackoverflow.com/questions/63460213/plotly-how-to-define-colors-in-a-figure-using-plotly-graph-objects-and-plotly-e/63460218#63460218) – vestland Feb 05 '22 at 19:49
  • Darn it. I was the entire time [on this page](https://plotly.com/python/colorscales/), and now that I was looking for the link, I accidentally found the right one but didn't read it. Thanks. – Mr. T Feb 05 '22 at 19:52
  • @Mr.T You're welcome! That happens all the time with me too =) – vestland Feb 05 '22 at 19:55
  • @vestland Hello, I tried using your example, but all bars are still returned as blue, with no errors. I put the code I attempted in the question (UPDATED ATTEMPT); can you take a look, please? There is also an example of the dataframe data I have. – patricebailey1998 Feb 05 '22 at 20:10
  • @patricebailey1998 Please don't share data samples as images, but [like this](https://stackoverflow.com/questions/63163251/pandas-how-to-easily-share-a-sample-dataframe-using-df-to-dict/63163254#63163254) instead – vestland Feb 05 '22 at 20:13
  • @vestland I have updated the dataset could you take a look now please – patricebailey1998 Feb 05 '22 at 20:21
  • @patricebailey1998 Yeah, I see what's happening. I'll look into it. – vestland Feb 05 '22 at 20:29
  • @patricebailey1998 I've included a custom solution to your dataset. Don't hesitate to let me know if it wasn't quite what you had in mind! – vestland Feb 05 '22 at 21:09
  • @vestland Thanks for the quick reply, could you please just quickly explain what the dfm and dfg parts of code are doing please, so I am not just blindly copying. Thank you – patricebailey1998 Feb 05 '22 at 22:56
  • @patricebailey1998 I've added a little info about that in an `Appendix`. I hope that makes the whole thing a little clearer. – vestland Feb 05 '22 at 23:25
  • @patricebailey1998 You're welcome! Please consider marking my suggestion as the accepted answer. Upvotes are also always welcome if the answer was useful to you. – vestland Feb 06 '22 at 18:52