
I'm trying to fit a linear regression for the variable China in R and then plot it, but I don't know exactly what to type to get a linear regression. The dataset I'm using is the CSV file from this website: https://data.worldbank.org/indicator/NY.GDP.PCAP.KD

Hope you guys can help :)

I don't have anything in R so far. I'm thinking of trying some kind of example of

y = B0 + B1*x? Or something along the lines of x <- c(0, 1, 2, 3, 4) and then y <- c(0, 1, 2, 3, 4), and then some kind of straight line, as in abline, to check that what I'm typing is correct? I want the x-axis of this graph to be the year and the y-axis to be GDP.

jay.sf

2 Answers

2
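library(readxl)  # provides read_excel()
data <- read_excel("yourExcelFile.xls")
# x and y must exist before plotting; the column names below are placeholders
x <- data$year
y <- data$gdp
plot(x, y)  # draws a scatter plot
# add the regression line; the response (y) goes on the left of the formula
abline(lm(y ~ x))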
0

This is rather easy. We turn gdp.CHN into a vector by simply unlisting it. Next we regress it on time, i.e. on seq_along of the values (which gives 1, 2, 3, ...), and pass the fitted model to abline to add the regression line to the plot.

gdp_chn <- unlist(gdp.CHN)

matplot(t(gdp.CHN), type='l')
abline(lm(gdp_chn ~ seq_along(gdp_chn)), col='red')
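
Since the question asks for the year on the x-axis and for a model of the form y = Mx + B, here is a minimal sketch that regresses on the actual year rather than on the time index. It uses only the last five values from the Data section to stay short; the names `year`, `fit`, `M` and `B` are illustrative and not part of the original answer.

```r
# Last five values of gdp.CHN from the Data section (a subset, to keep this short)
gdp_chn <- c(X2018 = 9619.20998022803, X2019 = 10155.5114163468,
             X2020 = 10358.1705411086, X2021 = 11223.1533663847,
             X2022 = 11560.3301997039)

# Recover the numeric year from names such as "X2018"
year <- as.integer(sub("^X", "", names(gdp_chn)))

# Fit gdp = B + M * year by ordinary least squares
fit <- lm(gdp_chn ~ year)
B <- coef(fit)[["(Intercept)"]]  # intercept
M <- coef(fit)[["year"]]         # slope

plot(year, gdp_chn, xlab = "Year", ylab = "GDP per capita")
abline(fit, col = "red")  # abline() takes intercept and slope from the lm object
```

With the full data set, the same steps should apply after `gdp_chn <- unlist(gdp.CHN)`.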


However, the relationship with time is obviously not linear. We could use a polynomial instead (the terms appear to be significant up to the third degree),

cf <- lm(gdp_chn ~ poly(seq_along(gdp_chn), 3, raw=TRUE))$coefficients

and add it as a curve.

curve(cf[1] + cf[2]*x + cf[3]*x^2 + cf[4]*x^3, from=0, to=length(gdp_chn), add=TRUE, col='blue')
legend('topleft', lty=1, col=c('red', 'blue'), legend=c('linear', 'polynomial'))
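
One way to check the remark about significance, and to draw the curve without writing out the polynomial by hand, is sketched below. It is not the answer's own code: it uses every tenth value from the Data section, so the p-values will differ from those of the full series, and the names `idx`, `lin`, `cub` and `grid_idx` are illustrative.

```r
# Every tenth year (1960, 1970, ..., 2020) from the Data section
gdp <- c(238.217064539596, 283.585371215779, 430.855432409241,
         905.032504707292, 2193.89698162019, 5647.06902367691,
         10358.1705411086)
idx <- seq_along(gdp)  # time index 1, 2, 3, ... as in the answer

lin <- lm(gdp ~ idx)                       # straight line
cub <- lm(gdp ~ poly(idx, 3, raw = TRUE))  # cubic polynomial

# t-tests for the individual polynomial terms
print(summary(cub)$coefficients)
# F-test: does the cubic fit significantly better than the line?
print(anova(lin, cub))

plot(idx, gdp, xlab = "Time index", ylab = "GDP per capita")
abline(lin, col = "red")
# predict() evaluates the fitted polynomial on a fine grid
grid_idx <- seq(1, length(gdp), length.out = 100)
lines(grid_idx, predict(cub, newdata = data.frame(idx = grid_idx)), col = "blue")
legend("topleft", lty = 1, col = c("red", "blue"),
       legend = c("linear", "polynomial"))
```

Using `predict()` avoids indexing `cf[1]` to `cf[4]` manually, which matters if the degree of the polynomial is changed later.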



Data:

gdp.CHN <- structure(list(X1960 = 238.217064539596, X1961 = 175.023690648895, 
    X1962 = 163.907052423586, X1963 = 176.400465038858, X1964 = 203.687844894515, 
    X1965 = 232.607219042772, X1966 = 250.304915833897, X1967 = 229.876286172258, 
    X1968 = 214.770077253607, X1969 = 244.363977154436, X1970 = 283.585371215779, 
    X1971 = 295.380186485794, X1972 = 299.190903908794, X1973 = 315.129680031066, 
    X1974 = 315.816680593065, X1975 = 337.344136734593, X1976 = 326.949477684525, 
    X1977 = 346.939174229985, X1978 = 381.099349491894, X1979 = 404.596653779982, 
    X1980 = 430.855432409241, X1981 = 447.119809916529, X1982 = 480.311347634892, 
    X1983 = 524.409390725924, X1984 = 596.201140363961, X1985 = 667.128576945448, 
    X1986 = 716.10537758984, X1987 = 786.864922900416, X1988 = 861.193531864752, 
    X1989 = 883.764201558456, X1990 = 905.032504707292, X1991 = 975.462966767608, 
    X1992 = 1100.64617367513, X1993 = 1239.12943830846, X1994 = 1384.93023003783, 
    X1995 = 1520.02954923879, X1996 = 1653.43387572699, X1997 = 1787.76707089802, 
    X1998 = 1909.62242871681, X1999 = 2038.20658108471, X2000 = 2193.89698162019, 
    X2001 = 2359.572509032, X2002 = 2557.89174634267, X2003 = 2797.17680600598, 
    X2004 = 3061.83333342101, X2005 = 3390.716337509, X2006 = 3800.7659955099, 
    X2007 = 4319.03162430702, X2008 = 4711.64369663046, X2009 = 5128.90439706266, 
    X2010 = 5647.06902367691, X2011 = 6152.6971958122, X2012 = 6591.66284020379, 
    X2013 = 7056.42346206424, X2014 = 7532.78569686615, X2015 = 8016.44601585644, 
    X2016 = 8516.52918957832, X2017 = 9053.22920002159, X2018 = 9619.20998022803, 
    X2019 = 10155.5114163468, X2020 = 10358.1705411086, X2021 = 11223.1533663847, 
    X2022 = 11560.3301997039), row.names = "CHN", class = "data.frame")
jay.sf
  • Hi Jay! Thank you so much for your help again!! I’m seeing that you’ve typed out all the different values for the years in your data set and I’m just wondering if there’s something you can type into r to make it do that for you automatically or did you have to type in the different values for the years manually? – Cassie Tran Jul 09 '23 at 09:53
  • @CassieTran I actually just used `dput(gdp.CHN)` from the values of your [other question](https://stackoverflow.com/a/76646461/6574038) to produce this, no typing :) – jay.sf Jul 09 '23 at 09:57
  • Thanks again! But I'm also wondering whether what you've just typed out can be considered a linear regression, because I thought that for a linear regression the format has to be y = Mx + B? Sorry, I'm doing this for a class assignment and the professor is a stickler about this having to be a linear regression! – Cassie Tran Jul 09 '23 at 09:59
  • @CassieTran Good question. If you give the `abline` function an `lm` output, it recognizes that it is a regression and will add both intercept B and slope M. It actually only works with linear regression, i.e. one independent variable (IV), though, that's why we use `curve` for more IVs. – jay.sf Jul 09 '23 at 10:05
  • Hi, so I've just copied and pasted what you wrote above into R and it's giving me the error "plot.new has not been called yet". I'm also linking a picture of my R code so you can have a look! :) https://imgur.com/a/zug83i1 – Cassie Tran Jul 09 '23 at 10:10
  • Also, if you can include the codes for Cuba, Benin and the People's Republic of Congo to render the codes to look something like the curved line that goes through for China it'd be super helpful! I'm talking about the part from cf <- lm(gdp_chn ~ poly(seq_along(gdp_chn), 3, raw=TRUE))$coe and onwards! Thanks! – Cassie Tran Jul 09 '23 at 10:14
  • @CassieTran Are you sure you used the right data? I edited it once. For the other countries, proceed as for CHN. To get `gdp.CHN`, proceed as you described in your [comment](https://stackoverflow.com/questions/76644769/how-to-plot-points-across-time-when-the-years-are-different-columns/76646461#comment135132860_76646461). – jay.sf Jul 09 '23 at 10:17
  • Yes I'm positive that I used the right data. I just copied and pasted everything that you typed in again but it's still giving me the same error code for some reason :/ and I'm not trying to get gdp.CHN like I've previously said in my comment, I want to get the codes that you've typed into r to get the curved line like you did for the China plot, but for Cuba, Benin and the People's Republic of Congo instead this time around ! I've also included pictures of what my r codes look like currently! https://imgur.com/a/dalwpvW – Cassie Tran Jul 09 '23 at 10:22
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/254409/discussion-between-cassie-tran-and-jay-sf). – Cassie Tran Jul 09 '23 at 10:26
  • @CassieTran Strange, it actually works fine for me, just copy-paste and it works. Maybe restart your R. Also, I see from your image that you're using R Markdown; maybe try it once in a normal script. – jay.sf Jul 09 '23 at 10:26
  • what do you mean try using a normal script? Should I change my template to .rdata then instead of using rmd? – Cassie Tran Jul 09 '23 at 10:27
  • @CassieTran File > New File > R Script opens what I consider a normal script. – jay.sf Jul 09 '23 at 10:30
  • I just tried that and it looks like that fixed the problem ! thanks so much! :) also if you can just provide me with the codes for the curved lines for the other 3 countries I would very much appreciate it! – Cassie Tran Jul 09 '23 at 10:33
  • Also how do I knit this new r script file into html? Because the reason why I was using r markdown is because it gave me the option to knit on save but now I only see the source on save option for this new r script file? – Cassie Tran Jul 09 '23 at 10:36
  • @CassieTran Glad you could solve it. You will figure out how to get the polynomial models for the other countries. It is important to understand the code you are using and not just copy-paste it. It will be good practice (it's not very hard). As for why your Rmd didn't work, I have no clue; you could post another question. Please make it reproducible, so others can just copy-paste (incl. data) and reproduce the problem. Also see https://stackoverflow.com/help/minimal-reproducible-example. – jay.sf Jul 09 '23 at 10:40
  • @CassieTran BTW please consider to [accept answers](https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work/5235#5235) you get on Stack Overflow. Cheers! – jay.sf Jul 09 '23 at 10:41
  • Hi @jay.sf, I've just copied and pasted your code "curve(cf[1] + cf[2]*x + cf[3]*x^2 + cf[4]*x^3, from=0, to=length(gdp_chn), add=TRUE, col='blue')" and it's giving me the error "object 'cf' not found". Also, I've just accepted your answer :) – Cassie Tran Jul 09 '23 at 11:14
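
The two errors in the comments usually point to run order. "object 'cf' not found" means the `cf <- ...` line has not been run in the current session, and "plot.new has not been called yet" means no plot was open when `abline` or `curve(..., add=TRUE)` was called. A minimal sketch that keeps the steps together is shown below. The helper name `add_cubic_fit` is hypothetical, the data are every tenth value from the answer's Data section, and the curve starts at index 1 rather than 0.

```r
# Hypothetical helper: fits the cubic and draws it in one call, so the
# coefficient vector always exists by the time curve() uses it.
# A plot must already be open, because curve() is called with add = TRUE.
add_cubic_fit <- function(gdp, col = "blue") {
  idx <- seq_along(gdp)
  cf <- coef(lm(gdp ~ poly(idx, 3, raw = TRUE)))
  curve(cf[1] + cf[2]*x + cf[3]*x^2 + cf[4]*x^3,
        from = 1, to = length(gdp), add = TRUE, col = col)
  invisible(cf)  # return the coefficients without printing them
}

# Every tenth year (1960, 1970, ..., 2020) of the China series
gdp_chn <- c(238.217064539596, 283.585371215779, 430.855432409241,
             905.032504707292, 2193.89698162019, 5647.06902367691,
             10358.1705411086)

# 1. open the plot, 2. add the fitted curve
plot(seq_along(gdp_chn), gdp_chn, type = "l",
     xlab = "Time index", ylab = "GDP per capita")
cf_chn <- add_cubic_fit(gdp_chn)
```

The same call should work for another country once its row is unlisted into a numeric vector in the same way as `gdp.CHN`. Those series are not included here.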