How to create a scatter plot using Pandas, with specific data from a column, and not all of the data in a column

Question

I am currently using

df.plot.scatter(x='Ice_cream_sales', y='Temperature')

However, I want to use only the ice cream sales that equal $5 and the temperatures that are exactly 90 degrees.

How would I go about using just the specific values I'm interested in, instead of the entire column of data?

What is the issue, exactly? Have you tried anything, done any research? How about filtering the values, for example? Stack Overflow is not a free code writing service. See: [tour], [ask], [help/on-topic], https://meta.stackoverflow.com/questions/261592/how-much-research-effort-is-expected-of-stack-overflow-users. — AMC, Mar 09 '20 at 23:47

score 2 · Accepted Answer · answered Mar 09 '20 at 22:42

The easiest way to do this is to create a dataframe of the subset of values you are interested in.

Say you have a dataframe df with columns 'Ice_cream_sales' and 'Temperature':

import pandas as pd
import matplotlib.pyplot as plt

# Compare the Temperature column to 90. This gives you a boolean Series
# with one True/False value per row of your dataframe.
temp_90 = df['Temperature'] == 90

# Apply your boolean against your dataframe to grab the correct rows:
df2 = df[temp_90]

# Now plot your scatter plot
plt.scatter(x=df2['Ice_cream_sales'], y=df2['Temperature'])
plt.show()

I'm not sure why you would want a scatter plot where sales = $5 and temperature = 90. Every matching row would land on the same coordinates, so the plot would show a single point.

Instead you can subset using an inequality:

high_temp = df['Temperature'] >= 90
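
As a rough end-to-end sketch of the inequality filter above: the sample values below are made up for illustration, and only the column names come from the question. The plot uses df.plot.scatter as in the original question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Ice_cream_sales": [2, 3, 5, 5, 7, 9],
    "Temperature": [70, 78, 90, 85, 92, 97],
})

# Boolean mask: True for rows at or above 90 degrees.
high_temp = df["Temperature"] >= 90

# Keep only the matching rows, then plot them.
hot_days = df[high_temp]
ax = hot_days.plot.scatter(x="Temperature", y="Ice_cream_sales")
# In a script, call matplotlib.pyplot.show() afterwards to display the figure.
```

Only one variable is filtered here, so the remaining points still show how sales vary across the hot days.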

Also be careful not to filter on both of your variables at once; otherwise you risk misrepresenting whatever relationship you are trying to show with your scatter plot.

I appreciate the help and the improvements on the original code. One question, I want to use date_time in a scatter plot, but my date_time column is currently displaying as such: 2020-01-31 16:19:12Z. I tried 'import numpy as np' followed by 'df['date_time'].astype('M')' however, this didn't appear to permanently change the data type, and it also gave an error when attempting to plot. In that, plot.scatter requires numerical values. It did not recognize the data in the date_time column as a numerical value. — Tweep, Mar 11 '20 at 00:55
Hi Tweep, check out this thread on converting strings to datetime objects: https://stackoverflow.com/questions/38256750/make-a-scatter-plot-in-matplotlib-with-dates-on-x-axis-and-values-on-y You can read this thread here for assistance extracting 'Month' from your datetime dataframe column: https://stackoverflow.com/questions/25146121/extracting-just-month-and-year-separately-from-pandas-datetime-column — Henru, Mar 11 '20 at 19:03
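
For the date_time question in the comments, here is a minimal sketch under some assumptions: the sample rows are invented, the strings follow the 2020-01-31 16:19:12Z format quoted above, and pd.to_datetime stands in for the astype call. Methods like astype return a new object rather than changing the column in place, so the result has to be assigned back to the dataframe to stick. The plot goes through matplotlib's Axes.scatter, which accepts datetime values on an axis.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample data using the timestamp format from the comment.
df = pd.DataFrame({
    "date_time": [
        "2020-01-31 16:19:12Z",
        "2020-02-15 09:05:00Z",
        "2020-03-01 12:30:45Z",
    ],
    "Temperature": [88, 90, 93],
})

# Parse the strings and assign the result back so the change persists.
# utc=True keeps the trailing "Z" (UTC) information as a timezone.
df["date_time"] = pd.to_datetime(df["date_time"], utc=True)

# Optional: pull out the month as a plain integer column.
df["month"] = df["date_time"].dt.month

# Scatter plot with dates on the x-axis.
fig, ax = plt.subplots()
ax.scatter(df["date_time"], df["Temperature"])
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
# In a script, call plt.show() afterwards to display the figure.
```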
