I am trying to create a scatter plot of earnings vs. education for my statistical models class, but Python reports "invalid character in identifier". When I check the txt file, the names "earnings" and "education" are both present. Could you help me, please?

mod = smf.ols(formula=’education~earnings, data=mydata)
res = mod.fit()
res.summary()
beta=res.params 
matplotlib.pyplot.scatter(mydata["education"],mydata["earnings"],color="black") 
matplotlib.pyplot.plot(mydata["education"], res.fittedvalues, "r") 
matplotlib.pyplot.ylabel("earnings")
matplotlib.pyplot.xlabel("education") 
matplotlib.pyplot.title("Scatterplot earnings versus education") 
matplotlib.pyplot.show()
  • Hey Nesrine. That error is complaining about an invalid character in a variable name. If you post the traceback (the full error) it should be possible to see where that is. Also see this answer: https://stackoverflow.com/a/14844830/754456 – mfitzp Oct 09 '21 at 11:48
  • Hello @mfitzp, the error is on education, but when I check the txt file it's in there. Here's the full error message: File "/var/folders/vj/pxgh1vcx2zsf_cs2hld5dxvm0000gn/T/ipykernel_9043/3398938949.py", line 3 mod = smf.ols(formula=’education~earnings, data=mydata) ^ SyntaxError: invalid character in identifier – Nesrine Tiar Oct 09 '21 at 12:30

1 Answer

I think the issue is the quotation mark after the = on this line:

mod = smf.ols(formula=’education~earnings, data=mydata)

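The character after the = is a curly (typographic) quote ’, not a straight quote ' or ". Python does not treat it as the start of a string, so it tries to read it as part of a variable name, where it is not a valid character. The closing quote is also missing. The formula should be passed as a string, with an opening and a closing straight single or double quote: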

mod = smf.ols(formula='education~earnings', data=mydata)

Perhaps something got mixed up when copy-pasting it?
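If it is hard to see which character is the problem, a small check like the one below can help. It is only a sketch: the helper name find_non_ascii is made up for illustration, and the curly quote is written as the escape \u2019 so that it stays visible. It lists every non-ASCII character in a piece of source code along with its position and Unicode name.

```python
import unicodedata


def find_non_ascii(source):
    """Return (line_no, column, char, unicode_name) for each non-ASCII character.

    Hypothetical helper for illustration; line and column numbers start at 1.
    """
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


# The failing line from the question; \u2019 is the curly quote after "formula=".
bad_line = "mod = smf.ols(formula=\u2019education~earnings, data=mydata)"

for hit in find_non_ascii(bad_line):
    print(hit)
```

For the line from the question, this reports a single RIGHT SINGLE QUOTATION MARK directly after formula=. Running the same check on the corrected line returns nothing, since straight quotes are plain ASCII.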

mfitzp