-3

This seems like a very simple thing, but I can't get it to work. I have a pandas DataFrame like this http://prntscr.com/ko8lyd and I want to plot one column on the x-axis and another column on the y-axis. Here is what I tried:

import matplotlib.pyplot as plt
x = ATR_7
y = Vysledek
plt.scatter(x,y)
plt.show()

This is the error I am getting:

<ipython-input-116-5ead5868ec87> in <module>()
      1 import matplotlib.pyplot as plt
----> 2 x = ATR_7
      3 y = Vysledek
      4 plt.scatter(x,y)
      5 plt.show()

Where am I going wrong?

famargar

5 Answers

1

You just need:

df.plot.scatter('ATR_7','Vysledek')

Here df is the name of your dataframe. There's no need to call matplotlib directly; pandas does the plotting for you.
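As a minimal sketch of what that call gives you back: pandas draws the plot with matplotlib behind the scenes and returns the matplotlib Axes, so labels and titles can still be adjusted afterwards. The values in df below are made up for illustration, since the real data is only visible in the screenshot.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a plain script; not needed in a notebook
import pandas as pd

# Hypothetical stand-in for the dataframe in the question
df = pd.DataFrame({"ATR_7": [0.5, 0.7, 0.9], "Vysledek": [1.0, -1.0, 2.0]})

# Columns are referred to by their string labels
ax = df.plot.scatter(x="ATR_7", y="Vysledek")

# The returned object is a regular matplotlib Axes
ax.set_title("ATR_7 vs Vysledek")
```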

smartse
0

You are trying to use undefined variables. ATR_7 is the name of a column inside your dataframe; it does not exist as a Python variable on its own.

Try something like:

df.plot.scatter(x='ATR_7', y='Vysledek')

assuming your dataframe is named df.
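To make the difference concrete, here is a small sketch contrasting the bare name with a lookup by column label. The dataframe contents are invented for the example.

```python
import pandas as pd

# Hypothetical stand-in for the dataframe in the question
df = pd.DataFrame({"ATR_7": [0.5, 0.7, 0.9], "Vysledek": [1.0, -1.0, 2.0]})

try:
    x = ATR_7  # bare name: Python looks for a variable called ATR_7
except NameError as err:
    error_message = str(err)  # no such variable exists, only a column label

# Looking the column up by its string label works
x = df["ATR_7"]
y = df["Vysledek"]
```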

igrinis
0

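If you want to use matplotlib directly, pull the x and y values out of the dataframe and pass them to plt.scatter. Converting each column to a list works, although plt.scatter also accepts the pandas Series as they are: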

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')
%matplotlib inline

x = list(df['ATR_7']) # set x axis by creating a list
y = list(df['Vysledek']) # set y axis by creating a list
plt.scatter(x,y)
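The list conversion should not be strictly necessary. As a sketch with made-up data, the following draws the same points once from lists and once from the Series themselves, then checks that both calls plotted identical coordinates.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a plain script; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dataframe in the question
df = pd.DataFrame({"ATR_7": [0.5, 0.7, 0.9], "Vysledek": [1.0, -1.0, 2.0]})

fig, (ax_list, ax_series) = plt.subplots(1, 2)

# Variant 1: columns converted to plain lists
pts_list = ax_list.scatter(list(df["ATR_7"]), list(df["Vysledek"]))

# Variant 2: the pandas Series passed directly
pts_series = ax_series.scatter(df["ATR_7"], df["Vysledek"])

# Both variants end up with the same (x, y) coordinates
same_points = np.array_equal(
    np.asarray(pts_list.get_offsets()),
    np.asarray(pts_series.get_offsets()),
)
```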
It_is_Chris
  • Well, this looks promising, but it doesn't work for me. I must be really dumb, but I am still getting a "name is not defined" error. I don't know how to put the code here, but it is quite clear in this screenshot: http://prntscr.com/ko96jl – Nirvikalpa Samadhi Aug 29 '18 at 13:34
  • @NirvikalpaSamadhi it looks like the cell where you concat your dataframes was run at 103 and the plot was run at 11. Just re-run your cells in order and it should work. I am guessing the kernel was reset and the results variable is no longer defined – It_is_Chris Aug 29 '18 at 13:51
0

It seems there were two issues in your code. First, the column names were not in quotes, so Python treated them as variable names rather than strings (column names are strings). Second, the easiest way to plot dataframe columns is to use the pandas plotting functions. You were calling matplotlib's scatter, which takes arrays of values as input, not just column names.

First, let's load the modules and create the data:

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

d = {'ATR_7' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']), 
     'Vysledek' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

Then, you can either use pandas plotting as in

x = 'ATR_7'
y = 'Vysledek'
df.plot.scatter(x,y)

Or plain-old matplotlib plotting as in

x = df['ATR_7']
y = df['Vysledek']    
plt.scatter(x,y)
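One detail of the sample data above: the two Series have different indexes, so row 'd' gets NaN in ATR_7. A point with a missing coordinate cannot be drawn, so it can be clearer to drop incomplete rows explicitly before plotting. The sketch below rebuilds the same sample dataframe; the name clean is just an illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a plain script; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Same sample data as above: 'ATR_7' has no value for index 'd'
d = {'ATR_7': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'Vysledek': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# Count the gap that the index mismatch introduced
missing = int(df['ATR_7'].isna().sum())

# Keep only rows where both plotted columns are present
clean = df.dropna(subset=['ATR_7', 'Vysledek'])

fig, ax = plt.subplots()
ax.scatter(clean['ATR_7'], clean['Vysledek'])
ax.set_xlabel('ATR_7')
ax.set_ylabel('Vysledek')
```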
famargar
0

Scatter does not know which data to use. You need to provide it with the data.

x = "ATR_7"
y = "Vysledek"
plt.scatter(x,y, data=df)

This assumes that df is your dataframe and that it has columns named "ATR_7" and "Vysledek".
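A self-contained sketch of the same idea, with invented values standing in for the real dataframe: when data is given, string arguments are looked up as keys in it, so the plotted coordinates come from the named columns. The axis labels are set by hand here.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a plain script; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dataframe in the question
df = pd.DataFrame({"ATR_7": [0.5, 0.7, 0.9], "Vysledek": [1.0, -1.0, 2.0]})

fig, ax = plt.subplots()

# With data=df, the strings are resolved to df["ATR_7"] and df["Vysledek"]
points = ax.scatter("ATR_7", "Vysledek", data=df)
ax.set_xlabel("ATR_7")
ax.set_ylabel("Vysledek")

# Coordinates that were actually plotted, as an (n, 2) array
plotted = np.asarray(points.get_offsets())
```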

ImportanceOfBeingErnest