
I have the following DataFrame:

| name | number | value | 
|------|--------|-------| 
| a    | 1      | 13    | 
| a    | 2      | 18    | 
| a    | 3      | 54    | 
| b    | 1      | 1     | 
| c    | 1      | 135   | 
| c    | 2      | 153   | 
| c    | 3      | 512   | 
| d    | 1      | 36    | 
| d    | 2      | 74    | 
| d    | 3      | 209   | 
| e    | 1      | 108   | 
| e    | 2      | 150   | 
| e    | 3      | 339   | 
| f    | 1      | 27    | 
| f    | 2      | 41    | 
| f    | 3      | 177   | 
| g    | 1      | 102   | 
| g    | 2      | 102   | 
| g    | 3      | 360   | 
| h    | 1      | 1     | 
| i    | 1      | 1     | 
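To make the question reproducible, the table above can be typed straight into a DataFrame. This is only a transcription of the rows shown. The variable name `df` is an assumption chosen to match the answers below.

```python
import pandas as pd

# Transcription of the table in the question: one row per (name, number) pair
df = pd.DataFrame({
    "name": ["a", "a", "a", "b", "c", "c", "c", "d", "d", "d",
             "e", "e", "e", "f", "f", "f", "g", "g", "g", "h", "i"],
    "number": [1, 2, 3, 1, 1, 2, 3, 1, 2, 3,
               1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 1],
    "value": [13, 18, 54, 1, 135, 153, 512, 36, 74, 209,
              108, 150, 339, 27, 41, 177, 102, 102, 360, 1, 1],
})

print(df.shape)  # (21, 3)
```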

I want to do two things:

  1. Remove every row whose value in the name column appears only once, so that the rows for 'b', 'h' and 'i' are dropped from the table.

  2. Then make a line graph with number on the x axis and one line per name tracing that name's values, each line in a different colour corresponding to its name. I've made a rough illustration to show what I mean.

[Image: rough illustration of the desired graph]
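Before filtering, it can help to confirm which names the first requirement will drop. A small sketch, assuming the table is loaded as a DataFrame; it is rebuilt here with only the name column, since the counts are all that matter:

```python
import pandas as pd

# Only the name column matters for counting occurrences
names = ["a"] * 3 + ["b"] + ["c"] * 3 + ["d"] * 3 + ["e"] * 3 + ["f"] * 3 + ["g"] * 3 + ["h", "i"]
df = pd.DataFrame({"name": names})

# value_counts gives the number of rows for each name
counts = df["name"].value_counts()
singles = sorted(counts[counts == 1].index)
print(singles)  # ['b', 'h', 'i']
```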

2 Answers


You are asking for quite a lot of formatting, but here is a simple example:

import io
import pandas as pd
import matplotlib.pyplot as plt

string = """name,number,value
a,1,13
a,2,15
a,3,18
b,1,1
c,1,17
c,2,21
"""

df = pd.read_csv(io.StringIO(string))

# Keep only rows whose name is duplicated (keep=False marks every occurrence)
df = df[df.duplicated('name', keep=False)]

# https://stackoverflow.com/questions/41494942/pandas-dataframe-groupby-plot
# Use number as the x axis and draw one line per name
df.set_index('number', inplace=True)
df.groupby('name')['value'].plot(legend=True)

plt.show()

[Image: resulting plot]

Anton vBR
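The same filter can also be expressed with `GroupBy.filter`, which keeps whole groups that satisfy a predicate. This is an alternative, not the answer's method. A minimal sketch on a few rows of the question's data, checking that it agrees with the `duplicated(..., keep=False)` mask:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "a", "a", "b", "c", "c", "h"],
    "number": [1, 2, 3, 1, 1, 2, 1],
    "value": [13, 18, 54, 1, 135, 153, 1],
})

# Mask approach: keep=False marks every member of a duplicated name
by_mask = df[df.duplicated("name", keep=False)]

# Alternative: keep only groups that have more than one row
by_filter = df.groupby("name").filter(lambda g: len(g) > 1)

print(by_mask.equals(by_filter))  # True
print(sorted(by_filter["name"].unique()))  # ['a', 'c']
```

Both keep the original row order and index, so either result can be passed on to the plotting step unchanged.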

Pivot the DataFrame and plot it:

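import matplotlib.pyplot as plt

# Make sure the numeric columns are integers (they may have been read as strings)
df[['number', 'value']] = df[['number', 'value']].astype(int)
# Count rows per name and keep only names that occur more than once
name_cnt = df.groupby('name').size()
required_nm = name_cnt[name_cnt != 1].index
required_rows = df.loc[df['name'].isin(required_nm)]  # rows whose 'name' repeats
# One column per name, indexed by number; each column becomes one line
required_rows.pivot(columns='name', index='number', values='value').plot()
plt.show()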
shanmuga
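To see what `pivot` hands to `.plot()`, here is a sketch on a subset of the question's rows: each name becomes a column and each number becomes a row, so pandas draws one line per column. The axis labels, the markers and the non-interactive `Agg` backend are assumptions added for the example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display (assumption)
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "a", "a", "b", "c", "c", "c"],
    "number": [1, 2, 3, 1, 1, 2, 3],
    "value": [13, 18, 54, 1, 135, 153, 512],
})

# Drop names that occur only once ('b' here)
counts = df.groupby("name").size()
kept = df[df["name"].isin(counts[counts > 1].index)]

# Wide format: index = number, one column per remaining name
wide = kept.pivot(index="number", columns="name", values="value")
print(wide)

# One line per column, coloured automatically; call plt.show() in an interactive session
ax = wide.plot(marker="o")
ax.set_xlabel("number")
ax.set_ylabel("value")
```

If a name were missing a number, its cell in the wide table would be NaN and that line would simply have a gap there.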