Find a relationship in data using R

Question

I have a data frame:

   df <- structure(list(salary = c(32368L, 53174L, 52722L, 53423L, 50602L, 
  49033L, 24395L, 24395L, 43124L, 23975L, 53174L, 58515L, 56294L, 
  49033L, 44884L, 53429L, 46574L, 58968L, 53174L, 53627L, 49033L, 
  54981L, 62530L, 27525L, 24395L, 56884L, 52111L, 44183L, 24967L, 
  35423L, 41188L, 27525L, 35018L, 44183L, 35423L), experience = c(3L, 
  10L, 10L, 1L, 5L, 10L, 5L, 6L, 8L, 4L, 4L, 8L, 10L, 10L, 1L, 
  5L, 8L, 10L, 5L, 10L, 5L, 7L, 10L, 3L, 5L, 10L, 5L, 5L, 6L, 4L, 
  2L, 3L, 1L, 2L, 1L)), .Names = c("salary", "experience"), class = "data.frame", row.names = c("1", 
  "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
  "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
  "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35"
  ))

that looks like this:

> head(df)
  salary experience
1  32368          3
2  53174         10
3  52722         10
4  53423          1
5  50602          5
6  49033         10

I need to find a statistical law that describes the relationship between salary and experience. I thought it might be a quadratic relationship, but when I drew a scatterplot I didn't see any relationship between these variables. I think I could split the data and look for a relationship within each part, but I don't know how to do that.

Welcome to Stack Overflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. — zx8754, Apr 05 '16 at 09:50

Vincent Bonhomme · Accepted Answer · 2016-04-05T10:42:34.820

Have you tried anything yet? Something like a simple `lm`?

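# Scatterplot with salary on the x-axis and experience on the y-axis
plot(experience ~ salary, data = df)
# Fit a simple linear model of experience on salary
mod <- lm(experience ~ salary, data = df)
# Add the fitted line to the plot and print the coefficient table
abline(mod)
summary(mod)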

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept) -1.904e-01  1.801e+00  -0.106  0.91646   
salary       1.346e-04  3.931e-05   3.424  0.00167 **
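As a complement to the coefficient table, here is a minimal sketch of two quick checks on the same data: a Pearson correlation test, and the same regression with the variables swapped. The swap is an assumption about what the asker wants (salary explained by experience); the answer itself models experience on salary. The names `ct`, `mod_rev` and `r2` are illustrative only.

```r
# Data copied from the question
df <- data.frame(
  salary = c(32368, 53174, 52722, 53423, 50602, 49033, 24395, 24395,
             43124, 23975, 53174, 58515, 56294, 49033, 44884, 53429,
             46574, 58968, 53174, 53627, 49033, 54981, 62530, 27525,
             24395, 56884, 52111, 44183, 24967, 35423, 41188, 27525,
             35018, 44183, 35423),
  experience = c(3, 10, 10, 1, 5, 10, 5, 6, 8, 4, 4, 8, 10, 10, 1,
                 5, 8, 10, 5, 10, 5, 7, 10, 3, 5, 10, 5, 5, 6, 4,
                 2, 3, 1, 2, 1)
)

# Pearson correlation between the two variables, with its significance test
ct <- cor.test(df$salary, df$experience)

# Assumption: salary is the response of interest, so swap the formula
mod_rev <- lm(salary ~ experience, data = df)
r2 <- summary(mod_rev)$r.squared

print(ct$estimate)  # correlation coefficient
print(ct$p.value)   # p-value of the correlation test
print(r2)           # share of salary variance explained by experience
```

In a simple regression with one predictor, R-squared equals the squared Pearson correlation, so it is the same whichever variable is treated as the response.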

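You can try other models, for example a quadratic one. Since `abline` only draws straight lines, use `predict` and `lines` to add the fitted curve to the existing plot: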

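# Quadratic model; I() protects the ^ operator inside a formula
mod2 <- lm(experience ~ salary + I(salary^2), data = df)
# Evenly spaced salary values over the observed range
new_salary <- seq(min(df$salary), max(df$salary), length.out = 50)
# Predict from the quadratic fit and draw the curve on the existing plot
pred_experience <- predict(mod2, newdata = data.frame(salary = new_salary))
lines(new_salary, pred_experience)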
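To judge whether the quadratic term is worth keeping, one option is to compare the two nested models directly instead of relying on the plot. This is a sketch, not part of the original answer. It uses the base R functions `anova` and `AIC`, and the names `cmp` and `aic` are illustrative.

```r
# Data copied from the question
df <- data.frame(
  salary = c(32368, 53174, 52722, 53423, 50602, 49033, 24395, 24395,
             43124, 23975, 53174, 58515, 56294, 49033, 44884, 53429,
             46574, 58968, 53174, 53627, 49033, 54981, 62530, 27525,
             24395, 56884, 52111, 44183, 24967, 35423, 41188, 27525,
             35018, 44183, 35423),
  experience = c(3, 10, 10, 1, 5, 10, 5, 6, 8, 4, 4, 8, 10, 10, 1,
                 5, 8, 10, 5, 10, 5, 7, 10, 3, 5, 10, 5, 5, 6, 4,
                 2, 3, 1, 2, 1)
)

# Linear and quadratic fits, as in the answer
mod  <- lm(experience ~ salary, data = df)
mod2 <- lm(experience ~ salary + I(salary^2), data = df)

# F-test for nested models: does adding salary^2 reduce the residual
# sum of squares by more than chance would suggest?
cmp <- anova(mod, mod2)
print(cmp)

# AIC as a second criterion; lower values indicate a better trade-off
# between fit and number of parameters
aic <- AIC(mod, mod2)
print(aic)
```

A large p-value in the second row of the `anova` table would suggest that the quadratic term adds little over the straight line.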

edited Apr 05 '16 at 10:42

answered Apr 05 '16 at 09:54

Vincent Bonhomme


Is it a linear relationship? – Apr 05 '16 at 10:00
You can try `lm(experience ~ I(salary^2), df)` and `lm(experience ~ salary + I(salary^2), df)`, but there is no evident visual gain. – Vincent Bonhomme Apr 05 '16 at 10:12
I don't know why, but I see the same plot. It's strange. – Apr 05 '16 at 10:34
`abline` does not work well when the model has more than two coefficients. You may need to use `predict` and `lines` instead, e.g.: `plot(experience ~ salary, df); mod2 <- lm(experience ~ salary + I(salary^2), df); new_salary <- seq(min(df$salary), max(df$salary), length.out = 50); pred_experience <- predict(mod2, newdata = data.frame(salary = new_salary)); lines(new_salary, pred_experience)` – Vincent Bonhomme Apr 05 '16 at 10:36
