I have a dataset with the following format:

dataset1 = data.frame(
caliber = c("5000", "2500", "1250", "625", "312.5", "156", "80", "40", "20", "0"),
var1 = c(NA, NA, NA, 30458, 13740,11261, 9729, 5039, 3343, 367),
var2 = c(463000, 271903, 154611,87204, 47228, 28082, 14842, 8474, 5121, 1308),
var3 = c(308385, 184863, 89719, 48986, 27968, 18557, 9191, 5248, 3210, 703), 
var4 = c(290159, 149061, 64045, 36864, 19092, 12515, 6805, 3933, 2339, 574), 
var5 = c(270801, 163657, 51642, 48197, 23582, 14544, 7877, 4389, 2663, 482), 
var6 = c(NA, NA, NA, 37316, 21305, 11823, 5692, 3070, 1781, 363))

The relationship between caliber and each of the other variables is best described by a second-degree polynomial: var ~ poly(caliber, 2, raw = TRUE)


My question is how to use a new set of measurements to estimate the corresponding caliber values. As shown below, I already have the values of each variable, but the caliber is unknown.

dataset2 = data.frame(
caliber = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
var1 = c(1120, 1296, 1132, 1280, 1096, 1124, 1004, 8384, 1072, 1104, 1568, 1044, 1108, 1012),
var2 = c(5044, 4924, 5088, 4804, 4824, 4844, 4964, 4788, 4804, 4964, 4824, 4788, 4844, 4944),
var3 = c(2836, 2744, 2744, 2668, 2688, 2940, 2756, 2720, 2668, 2892, 2636, 2700, 2836, 2668),
var4 = c(8872, 61580, 3036, 4468, 12132, 3000, 7920, 6868, 6896, 9392, 4728, 6896, 21076, 3228),
var5 = c(2312, 4236, 1928, 4448, 2388, 2108, 3644, 3060, 2168, 1912, 1812, 3528, 4100, 2176),
var6 = c(1156, 1228, 1224, 1364, 1128, 1176, 1184, 1640, 1188, 1300, 1332, 1176, 1176, 1152))

I am aware of a few previous threads on this topic, but none of them helped. The major issues were:

formula <- lm(var2~poly(caliber,2,raw=T), dataset1)
approx(x = formula$fitted, y = formula$caliber, xout = 0)$y

NA value for formula$caliber

mod<-lm(var2~poly(caliber, 2, raw=T), data=dataset1); summary(mod)
newdata=data.frame("var2"=dataset2[1:24,c("var2")])
pred<-predict(mod,newdata, type = 'response')

Error in poly(caliber, 2, coefs = list(alpha = c(998.35, 3691.21383929929 :object 'caliber' not found

unable to pass predict to another dataset

datasets with different rows

interpolation between X and Y gave wrong values

UseR10085
Henrique
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Do not post pictures of data because then we cannot copy the data in to R. You have found related questions which is good, but what code did you try exactly based on these previous questions. It's easier to help you if you show what code you've tried and you describe exactly how that code didn't work. – MrFlick Dec 23 '20 at 08:31
  • adjusted as peer requested – Henrique Dec 23 '20 at 08:54
  • You have calibrated the model using `caliber` as the independent variable and `var2` as the dependent variable, but your `newdata` does not have `caliber`. That's why you are getting the error. – UseR10085 Dec 23 '20 at 09:49
  • Yes, I know that. The problem is how to predict the caliber of dataset2 using the polynomial regression model fitted on dataset1. – Henrique Dec 23 '20 at 09:54
  • Under such a situation `caliber` should be the dependent variable and `var2` should be the independent variable. – UseR10085 Dec 23 '20 at 09:57

1 Answer

Based on the discussion in the comments, here is a solution as I understand the problem: fit the model with caliber as the response and var2 as the predictor, then predict on dataset2.

dataset1 = data.frame(
  caliber = c(5000, 2500, 1250, 625, 312.5, 156, 80, 40, 20, 0),
  var1 = c(NA, NA, NA, 30458, 13740,11261, 9729, 5039, 3343, 367),
  var2 = c(463000, 271903, 154611,87204, 47228, 28082, 14842, 8474, 5121, 1308),
  var3 = c(308385, 184863, 89719, 48986, 27968, 18557, 9191, 5248, 3210, 703), 
  var4 = c(290159, 149061, 64045, 36864, 19092, 12515, 6805, 3933, 2339, 574), 
  var5 = c(270801, 163657, 51642, 48197, 23582, 14544, 7877, 4389, 2663, 482), 
  var6 = c(NA, NA, NA, 37316, 21305, 11823, 5692, 3070, 1781, 363))

# Inverse calibration: caliber is the response, var2 is the predictor
formula <- lm(caliber ~ poly(var2, degree = 2, raw = TRUE), data = dataset1)

dataset2 = data.frame(
  caliber = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  var1 = c(1120, 1296, 1132, 1280, 1096, 1124, 1004, 8384, 1072, 1104, 1568, 1044, 1108, 1012),
  var2 = c(5044, 4924, 5088, 4804, 4824, 4844, 4964, 4788, 4804, 4964, 4824, 4788, 4844, 4944),
  var3 = c(2836, 2744, 2744, 2668, 2688, 2940, 2756, 2720, 2668, 2892, 2636, 2700, 2836, 2668),
  var4 = c(8872, 61580, 3036, 4468, 12132, 3000, 7920, 6868, 6896, 9392, 4728, 6896, 21076, 3228),
  var5 = c(2312, 4236, 1928, 4448, 2388, 2108, 3644, 3060, 2168, 1912, 1812, 3528, 4100, 2176),
  var6 = c(1156, 1228, 1224, 1364, 1128, 1176, 1184, 1640, 1188, 1300, 1332, 1176, 1176, 1152))

predict(formula, dataset2, type = 'response')

The output of predict() gives the estimated caliber for each row of dataset2, based on its var2 value.
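To keep the estimates next to the measurements, the predictions can be written back into the caliber column. This is only a minimal sketch under the same setup: it uses just caliber and var2, and the object name `mod` is an illustrative choice, not something from the question.

```r
# Training data (only caliber and var2 are needed for this sketch)
dataset1 <- data.frame(
  caliber = c(5000, 2500, 1250, 625, 312.5, 156, 80, 40, 20, 0),
  var2 = c(463000, 271903, 154611, 87204, 47228, 28082, 14842, 8474, 5121, 1308)
)

# New measurements with unknown caliber (the single NA is recycled to all rows)
dataset2 <- data.frame(
  caliber = NA_real_,
  var2 = c(5044, 4924, 5088, 4804, 4824, 4844, 4964,
           4788, 4804, 4964, 4824, 4788, 4844, 4944)
)

# caliber is the response, var2 the predictor
mod <- lm(caliber ~ poly(var2, degree = 2, raw = TRUE), data = dataset1)

# predict() looks up var2 in newdata; the NA caliber column is not used
dataset2$caliber <- predict(mod, newdata = dataset2)
head(dataset2)
```

Note that the var2 values in dataset2 lie at the low end of the calibration range, so it is worth checking the fit quality there before relying on the estimates.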

I have also corrected your dataset1. Values written inside double quotes are stored as character strings, so I removed the quotes from the caliber variable to make it numeric.
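The difference is easy to check with class(). The sketch below uses a shortened vector and hypothetical variable names (`caliber_chr`, `caliber_num`) purely for illustration; as.numeric() is one way to convert an existing character column instead of retyping it.

```r
# Quoted values are stored as character strings, not numbers
caliber_chr <- c("5000", "2500", "312.5", "0")
class(caliber_chr)

# Convert to numeric before fitting a model
caliber_num <- as.numeric(caliber_chr)
class(caliber_num)
caliber_num
```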

UseR10085
  • It worked, thanks. Caliber was numeric, but I still don't know what I was doing wrong. – Henrique Dec 23 '20 at 10:21
  • When you want to predict something, you have to provide the x, i.e. the independent variable. But your `dataset2` contains only NAs for `caliber`, and your `newdata` does not contain `caliber` at all. That's what the error `object 'caliber' not found` is telling you. – UseR10085 Dec 23 '20 at 10:51
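
The point in the last comment can be reproduced with a small sketch. It assumes the forward model from the question (var2 as a function of caliber) and uses illustrative names (`fwd`, `res`, `fitted_var2`); try() is only there so the failing call does not stop the script.

```r
dataset1 <- data.frame(
  caliber = c(5000, 2500, 1250, 625, 312.5, 156, 80, 40, 20, 0),
  var2 = c(463000, 271903, 154611, 87204, 47228, 28082, 14842, 8474, 5121, 1308)
)

# Forward model: caliber is the predictor
fwd <- lm(var2 ~ poly(caliber, 2, raw = TRUE), data = dataset1)

# newdata without a caliber column: predict() cannot evaluate poly(caliber, ...)
res <- try(predict(fwd, newdata = data.frame(var2 = c(5044, 4924))), silent = TRUE)
inherits(res, "try-error")

# newdata with a caliber column works and returns fitted var2 values
fitted_var2 <- predict(fwd, newdata = data.frame(caliber = c(20, 40)))
fitted_var2
```

In other words, a fitted model can only predict its response from its own predictors, which is why the answer swaps the roles of caliber and var2.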