
I have to generate a series of scatter plots (roughly 100 in total).

I have created an example to illustrate the problem.

First, import pandas.

import pandas as pd

Create a pandas dataframe.

# Create dataframe
data = {'name': ['Jason', 'Jason', 'Tina', 'Tina', 'Tina', 'Jason', 'Tina'],
        'report_value': [4, 24, 31, 2, 3, 5, 10],
        'coverage_id': ['m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7']}
df = pd.DataFrame(data)
print(df)

Output:

  coverage_id   name  report_value
0          m1  Jason             4
1          m2  Jason            24
2          m3   Tina            31
3          m4   Tina             2
4          m5   Tina             3
5          m6  Jason             5
6          m7   Tina            10

The goal is to generate two scatter plots without using a for-loop. The name of the person, Jason or Tina, should be displayed in each plot's title. The report_value should be on the y-axis in both plots and the coverage_id (which is a string) on the x-axis.

I thought I should start with:

df.groupby('name')

This gives me the dataframe grouped by name. Then I need to apply the plotting operation to every group, but I don't know how to proceed and get Python to make the two plots for me.
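To see what df.groupby('name') actually holds before any plotting, here is a minimal sketch that rebuilds the example data and inspects the grouped object with ngroups, groups and get_group. The variable name tina is only for illustration.

```python
import pandas as pd

# Same example data as in the question
data = {'name': ['Jason', 'Jason', 'Tina', 'Tina', 'Tina', 'Jason', 'Tina'],
        'report_value': [4, 24, 31, 2, 3, 5, 10],
        'coverage_id': ['m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7']}
df = pd.DataFrame(data)

groups = df.groupby('name')

# One group per distinct name
print(groups.ngroups)  # 2

# Mapping from each name to the row labels that belong to it
print(groups.groups)

# The sub-frame for a single name, i.e. the data one plot would use
tina = groups.get_group('Tina')
print(tina)
```

Each sub-frame returned by get_group has the same columns as df, so whatever plotting code works on one of them can be applied to every group.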

Thanks a lot for any help.

Marnix

1 Answer


I think you can use this solution, but first it is necessary to convert the string column to numeric codes, then plot, and finally set the x tick labels:

import numpy as np
import matplotlib.pyplot as plt

# Encode the string ids as integer positions; u holds the unique labels
u, i = np.unique(df.coverage_id, return_inverse=True)
df['coverage_id'] = i

groups = df.groupby('name')

# Plot
fig, ax = plt.subplots()
ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
for name, group in groups:
    ax.plot(group.coverage_id,
            group.report_value,
            marker='o',
            linestyle='',
            ms=12,
            label=name)

# One tick per unique label (len(u), not len(i), which counts rows)
ax.set(xticks=range(len(u)), xticklabels=u)
ax.legend()

plt.show()
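The encoding step relies on np.unique with return_inverse=True. A small sketch on a hypothetical column where one id repeats shows what the two return values are: u is the sorted list of unique labels and i gives, for every row, the position of its label in u. That is why the tick positions should run over range(len(u)); len(i) is the number of rows and only matches when every id is unique, as it happens to be in the example data.

```python
import numpy as np

# Hypothetical column in which 'm2' appears twice
ids = np.array(['m2', 'm1', 'm2', 'm3'])

# u: sorted unique labels, i: index into u for each original value
u, i = np.unique(ids, return_inverse=True)
print(u)  # ['m1' 'm2' 'm3']
print(i)  # [1 0 1 2]

# u[i] reconstructs the original column
print(u[i])  # ['m2' 'm1' 'm2' 'm3']

# Three ticks are needed, not four
print(len(u), len(i))  # 3 4
```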

Another solution, using seaborn.pairplot:

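import numpy as np
import seaborn as sns

# Start again from the original df, where coverage_id is still a string
u, i = np.unique(df.coverage_id, return_inverse=True)
df['coverage_id'] = i

# height= replaced the older size= argument in seaborn 0.9
g = sns.pairplot(x_vars=["coverage_id"], y_vars=["report_value"], data=df, hue="name", height=5)
# Fix the tick positions before labelling them
g.set(xticks=range(len(u)), xticklabels=u, xlim=(0, None))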
jezrael
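Both solutions above draw all names into a single axes with a legend. If the requirement is literally separate plots with the name in each title and no explicit Python loop, one alternative (not part of the answer above) is to reshape with DataFrame.pivot and let DataFrame.plot(subplots=True) draw one panel per name. This is a minimal sketch, assuming pandas 1.2 or newer (for a list-valued title) and that each (coverage_id, name) pair occurs at most once, which pivot requires. The name wide is hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {'name': ['Jason', 'Jason', 'Tina', 'Tina', 'Tina', 'Jason', 'Tina'],
        'report_value': [4, 24, 31, 2, 3, 5, 10],
        'coverage_id': ['m1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7']}
df = pd.DataFrame(data)

# One column per name, one row per coverage_id; missing pairs become NaN
wide = df.pivot(index='coverage_id', columns='name', values='report_value')

# subplots=True gives one axes per column (i.e. per name);
# style='o' draws markers only, and NaN rows are simply skipped;
# title as a list puts each name above its own subplot;
# xticks at every row position so each coverage_id gets a label
axes = wide.plot(subplots=True, style='o', legend=False,
                 title=list(wide.columns),
                 xticks=range(len(wide.index)),
                 figsize=(6, 6))
```

Call plt.show() afterwards to display the figure. With around 100 names, the same call produces one subplot per name, and the layout argument (for example layout=(20, 5)) arranges them in a grid, although the figure grows large quickly. Because the x-axis is shared by default, every panel spans all coverage_id values, including ones a given person does not have.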