0

I am trying to import the dataset into R to apply linear regression model, but am skeptical of my code as am new to R. The dataset is as follows with 5000+ rows of data:

power consumption cputi dbsu

as the column names and the followings integers as their values in the above column:

132 25 654

The sql code to call R function which I wrote is

CREATE COLUMN TABLE "PREDICTIVE ANALYSIS" LIKE "ANAGAPPAN.POWER_CONSUMPTION" WITH NO DATA;

SELECT POWER_APP, POWER_DB,CPUTI,DBTI,DBSU

FROM "ANAGAPPAN.POWER_CONSUMPTION";
DROP PROCEDURE USE_LM;

CREATE PROCEDURE USE_LM( IN train "ANAGAPPAN.POWER_CONSUMPTION", OUT result "PREDICTIVE ANALYSIS")

LANGUAGE

RLANG AS

BEGIN

library(lm)

model_app <- lm( POWER_APP ~ CPUTI + DBTI + DBSU + KBYTES_TRANSFERRED, data = train )

colnames(datOut) <- c("POWER_APP", "CPUTI", "DBTI", "DBSU", "DBSU")

PREDICTIVE ANALYSIS <- as.data.frame( lm(model_App))

END;

The result I obtain is it says the procedure is created but am unable to call the linear model on the data, how would I initiate the linear model?

Phil
  • 4,344
  • 2
  • 23
  • 33
Azhagu
  • 9
  • 1
  • 6
  • It's unclear but it sounds like you are able to enter your data in to R but you're having problems with the modelling? If this is the case you can drop the sql tag, and please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) we can work with – Phil Jun 04 '15 at 09:48
  • @Phil thanks for the proof reading too:) , am new to this forum was quite an informative edit. yes am able to import it now but am unable to call the sql command for linear regression on the data. – Azhagu Jun 04 '15 at 09:52
  • You're welcome. I don't know why you need to 'call the sql command for linear regression on the data'. Maybe this will mean something to others, but I would input the data in to R, then perform the linear regression in the R environment? If you provide data we can work with (check the link I posted) we can help you. – Phil Jun 04 '15 at 09:55
  • 2
    Isn't `lm()` a base R function? Why is `library(lm)` being called? – Olli J Jun 04 '15 at 09:59
  • QPhil : the database is connected with the Rserver as the dataset is huge(20mil+rows), so would not be able to run the linear regression in it's environment, hence am calling the rfunction from sql.did you get what i meant? hope i din't beat around the bush!. – Azhagu Jun 04 '15 at 10:01
  • @ olli : yes makes sense now, i've corrected the changes still the result isn't coming by, Thanks for the info :) – Azhagu Jun 04 '15 at 10:03
  • @thanks a lot for the offer phil, however i would not be able to provide the data due to the NDA norms :/. – Azhagu Jun 04 '15 at 10:10
  • 1
    @Phil - He needs to call it from SQL (SAP SQLScript actually), becuase he is using the Predictive Analysis Library engine built in the HANA in-memory database. TLDR: He is running these commands from inside the database. – Mike Gardner Jun 04 '15 at 14:49
  • If I assume part between `BEGIN` and `END;` should be a valid R syntax, `PREDICTIVE ANALYTICS` is not a valid R variable name. It would be key to know as what variable name the data comes from database into R. Above code suggests `train`, but is this definite? – Roman Luštrik Jun 05 '15 at 13:13
  • @Roman, for the R syntax to run linear regression model i wrote model_app <- lm( POWER_APP ~ CPUTI + DBTI + DBSU + KBYTES_TRANSFERRED, data = train ) isn't it right?. predictive analysis is name given to store the result of the lm(). train is definite. – Azhagu Jun 05 '15 at 13:46
  • I think we'll need more information at this point. Where does it stop? What do you consider under "unable to call the linear model"? Create the model or access the results? If latter, of which model, first, second, both? For the second model, coercing it to data.frame might not work. – Roman Luštrik Jun 05 '15 at 20:15
  • @Roman, it stops by after creating the table. both creating and accessing the result. since am new to R i dint understand the models you've said, could you be able to explain in breif?, would be really helpfull!. – Azhagu Jun 08 '15 at 06:26
  • Consider fixing R syntax errors first (see my previous comment). `as.data.frame` for `lm` object will probably not work. – Roman Luštrik Jun 08 '15 at 07:11

1 Answers1

0

Although I'm not familiar with SAP products, I will have a stab at the R code I assume is between BEGIN and END;.

library(lm)

is incorrect, as mentioned by @Olli. To access R's linear model capabilities, you have to call - nothing. It's loaded by default through stats package (this may not be true if R is called in --vanilla mode.

model_app <- lm( POWER_APP ~ CPUTI + DBTI + DBSU + KBYTES_TRANSFERRED, data = train )

appears to be OK, at least from a syntax's point of view.

For

colnames(datOut) <- c("POWER_APP", "CPUTI", "DBTI", "DBSU", "DBSU")

I can't see where you define datOut. If this variable is not created by the database, it does not exist and R should complain along the lines of

Error in colnames(notExist) <- "x" : object 'notExist' not found

I will assume you want to predict (means) based on a model. Line

PREDICTIVE ANALYSIS <- as.data.frame( lm(model_App))

will not work because R's variables should not have spaces, as.data.frame will not work on a lm object and model_App doesn't exist (notice the case). I think you should do something along the lines of

# based on http://help.sap.com/hana/sap_hana_r_integration_guide_en.pdf
# you have to specify variable result which will be exported to the database
result <- as.data.frame(predict(model_app))

You can try it out.

x <- 1:10
y <- rnorm(10)

mdl <- lm(y ~ x)

as.data.frame(predict(mdl))

   predict(mdl)
1    0.47866685
2    0.34418219
3    0.20969753
4    0.07521287
5   -0.05927180
6   -0.19375646
7   -0.32824112
8   -0.46272579
9   -0.59721045
10  -0.73169511
Roman Luštrik
  • 69,533
  • 24
  • 154
  • 197
  • it's the biggest tip got from you for the R, Thanks a lot, when i made the changes and ran it, i recieved an error msg.ll post the changed code and error msg. – Azhagu Jun 08 '15 at 07:46
  • http://paste.fedoraproject.org/229841/75061914/ is the link for my changed code, the error generated which i dint get it is - Could not execute 'CALL USE_LM("POWER_CONSUMPTION", "PREDICTIVE ANALYSIS") WITH OVERVIEW' in 317 ms 456 µs . SAP DBTech JDBC: [2048]: column store error: search table error: [34084] Receive error: get result error.;Error: object 'result' not found stack trace: No traceback available – Azhagu Jun 08 '15 at 08:07
  • I've amended my answer. I now understand that object `result` is what is passed on to the database (from R). I found it here: http://help.sap.com/hana/sap_hana_r_integration_guide_en.pdf – Roman Luštrik Jun 08 '15 at 12:38
  • exactly.i'm using the hana_r_integration as well. Thanks roman :-) – Azhagu Jun 08 '15 at 12:49
  • i got one last error, the rest has been cleared. it says i cannot call USE_LM. it's an invalid name to call.am trying to find the answer for it. – Azhagu Jun 08 '15 at 13:28
  • i sorted out the calling function, the good news is i think R is able to identify the dataset from HANA database, but it's unable to reciprocate back. here is the error msg i got "Convert data error: Convert Data from R to NewDB failed.;Column 'POWER_APP' not computed from R " , is there any idea you can give me? – Azhagu Jun 08 '15 at 17:10
  • @Azhagu can you update your question with the most recent version of the code you're using? Have you tried the examples in the pdf I linked to in the answer? – Roman Luštrik Jun 08 '15 at 17:41
  • as you suggested i have updated it.Yes, with the pdf it worked. But not in this issue that i'va stated. – Azhagu Jun 09 '15 at 06:35