I come across this issue constantly, and my current solution is to create additional DataFrames. I feel like there must be an easier way.

Here is an example of data where I have multiple countries with multiple attributes: [image: sample of the data]

If I wanted to plot Population vs. Depression (%) I would write:

ax = df.plot.scatter(x='Population', y='Depression (%)')

[image: resulting scatter plot]

This isn't super helpful, as there are clearly lines linked to specific Countries (df['Country']). Is there a simple way to draw a scatter plot in which each Country is its own series (different color, shape, etc.)?

Right now I use groupby to separate out individual Countries and plot them on the same axes (ax = ax).
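
For reference, a minimal sketch of that groupby approach. The DataFrame here is a small made-up stand-in (the real data was only posted as an image), and the sketch calls matplotlib's ax.scatter directly rather than df.plot.scatter(ax=ax) so that each group picks up the next color in the default cycle:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data; values are invented for illustration only.
df = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "C"],
    "Population": [10, 12, 30, 33, 50, 55],
    "Depression (%)": [3.1, 3.2, 4.0, 4.1, 5.0, 5.2],
})

fig, ax = plt.subplots()
# groupby yields (name, sub-frame) pairs, so nothing has to be stored separately
for country, group in df.groupby("Country"):
    ax.scatter(group["Population"], group["Depression (%)"], label=country)

ax.set_xlabel("Population")
ax.set_ylabel("Depression (%)")
ax.legend(title="Country")
```

Iterating over the groupby object avoids keeping n separate DataFrames around; each sub-frame only exists for one pass of the loop.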

Any thoughts or input would be greatly appreciated! Thank you!

  • Of course: you `groupby` country and plot the groups in sequence. – Prune Feb 08 '21 at 23:05
  • Thanks Prune! I've used groupby in this case to create new separate DataFrames. Having n DataFrames isn't super efficient. Is there something I'm missing? – plindner332 Feb 08 '21 at 23:10
  • I have no idea. Please supply the expected [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) (MRE). Show where the intermediate results differ from what you expected. We should be able to copy and paste a contiguous block of your code, execute that file, and reproduce your problem along with tracing output for the problem points. This lets us test our suggestions against your test data and desired output. Off-site links and images of text are not acceptable, in keeping with this site's purpose. – Prune Feb 08 '21 at 23:12
  • Please [include a minimal data frame](https://stackoverflow.com/questions/52413246/how-to-provide-a-reproducible-copy-of-your-dataframe-with-to-clipboard) as part of your MRE. – Prune Feb 08 '21 at 23:13

1 Answer

Try c='Country'. If you want nicer colors, you can also pass a colormap, for example colormap='viridis'. The pandas documentation for DataFrame.plot.scatter shows this in its second example:

ax2 = df.plot.scatter(x='length',
                      y='width',
                      c='species',
                      colormap='viridis')

Because your Country column holds strings, this approach doesn't work directly; the values need to be converted to numbers first. One way is to pass the category codes: c=df['Country'].astype('category').cat.codes
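
Put together, the call might look like the sketch below. The DataFrame is a made-up stand-in using the column names from the question; only the category-codes expression comes from the answer itself:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Hypothetical stand-in data; values are invented for illustration only.
df = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan", "Brazil", "Brazil", "Canada", "Canada"],
    "Population": [10, 12, 30, 33, 50, 55],
    "Depression (%)": [3.1, 3.2, 4.0, 4.1, 5.0, 5.2],
})

# Map each distinct country string to an integer code (0, 1, 2, ...)
codes = df["Country"].astype("category").cat.codes

# Numeric codes are valid input for c, so the colormap can be applied
ax = df.plot.scatter(x="Population", y="Depression (%)",
                     c=codes, colormap="viridis")
```

The color bar will show the integer codes rather than the country names, so the mapping from code to country still has to be communicated separately.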

Simon
  • Hi Simon, thanks for the response! This doesn't work, because the value passed to c= needs to be a color. One possible solution is to make a new column called Colors and assign it a different color for each Country. This is a lot of work and I'm being lazy! :D – plindner332 Feb 08 '21 at 23:09
  • I just tried it and it works. The documentation says it will use a colormap if a column is provided, as seen in example 2 in the documentation :) – Simon Feb 08 '21 at 23:09
  • Maybe I'm missing something. Here is the code I'm writing: ax = df.plot.scatter(x='Population', y='Depression (%)', c='Country') and here is the error I'm getting: ValueError: 'c' argument must be a color, a sequence of colors, or a sequence of numbers, not ['Afghanistan' 'Afghanistan' 'Afghanistan' ... 'Global Baseline' 'Global Baseline' 'Global Baseline'] – plindner332 Feb 08 '21 at 23:13
  • Ahh, I see the disconnect! Yes, this works when the column you're specifying as c is numeric. I'm trying to use it with a category, in this case Country. The best I can figure out is to create separate DataFrames, but I'm sure there's a better solution out there. – plindner332 Feb 08 '21 at 23:17
  • Yes! One workaround would be to use label encoding (converting each category to a number), but I think there must be some way to use c on categories – Simon Feb 08 '21 at 23:22
  • Okay, I found a way and edited my original post :D – Simon Feb 08 '21 at 23:30
  • Awesome! This works perfect for what I need, thank you!!! I'm going to work a bit on relabeling the color bar to identify the categories. Thanks again!!! – plindner332 Feb 08 '21 at 23:48
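
One possible way to identify the categories mentioned in the last comment is to replace the color bar with a legend that carries the country names. This is only a sketch on made-up data: it plots with matplotlib's ax.scatter instead of df.plot.scatter so that the returned collection's legend_elements() method can be used.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data; values are invented for illustration only.
df = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan", "Brazil", "Brazil", "Canada", "Canada"],
    "Population": [10, 12, 30, 33, 50, 55],
    "Depression (%)": [3.1, 3.2, 4.0, 4.1, 5.0, 5.2],
})

countries = df["Country"].astype("category")

fig, ax = plt.subplots()
sc = ax.scatter(df["Population"], df["Depression (%)"],
                c=countries.cat.codes, cmap="viridis")

# One legend handle per distinct code, in ascending code order, which is
# the same order as countries.cat.categories
handles, _ = sc.legend_elements()
ax.legend(handles, list(countries.cat.categories), title="Country")
ax.set_xlabel("Population")
ax.set_ylabel("Depression (%)")
```

With many countries the legend gets long, and legend_elements() may stop producing one handle per code, so this suits a modest number of categories.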