4

I am new here, hope to help and be helped.

However, I am working on the new Microsoft Sql Server Management Studio (2016), using its new features that imply the integration with R. First of all, my goal is to create a stored procedure that perform a K-Means clustering with x and y column.

The problem is that I am stuck in the middle, because I am not able to decline the online documentation to my case.

Here the script

    CREATE TABLE [dbo].[ModelTable]
    (
    column_name1 varchar(8000)
    )
    ;

    CREATE TABLE [dbo].[ResultTable]
    (
    column_name1 varchar(8000),
    column_name2 varchar(8000),
    column_name3 varchar(8000),
    column_name4 varchar(8000)
    )
    ;

    CREATE PROCEDURE [dbo].[kmean]  

    AS  
    BEGIN  
    DECLARE @inquery nvarchar(max) = N'  
              select  name,x,y FROM [dbtable] 

   '  
   -- then I decide to insert the model in a table: this is similar to  the documentation, but I am not sure it fits well.  

   INSERT INTO [dbo].[ModelTable] 
   EXEC sp_execute_external_script @language = N'R',  
                                   @script = N'  

   ## Here I create model: this is one of the biggest problem, because I tried to create a data frame with the data, but I do not know if here,
   ## in the R code, the data are read in this way. Generally in "pure"  R, I write data.frame(sourcedata$x,sourcedata$y), but here, where   is source of data?
   ## In the documentation it is used ImputDataSet, so maybe I could do: 

    trained_model <- kmeans(data.frame(ImputDataSet$x,ImputDataSet$y),8)  

    -- If everything is ok (doubtfully) I should have the model. And here, the part that I really cannot handle.
    -- I'd like to have a table [ResultTable] with name, variable x, variable y,  and trainedmodel$cluster.

    ',  
                              @input_data_1 = @inquery,  
                              @output_data_1_name = N'trained_model'  
     ;  


    END  
    GO  

    EXEC kmean 

Well there many problems and so on and, due the fact that this is a quite brand new feature in MSSMS, there are not a great quantity of help etc. in the Internet. Thanks in advance

epo3
  • 2,991
  • 2
  • 33
  • 60
s__
  • 9,270
  • 3
  • 27
  • 45
  • Maybe go the other way, [read the data](http://stackoverflow.com/questions/3932864/reading-data-from-microsoft-sql-server-into-r) into R, then use R for [kmeans](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html) ? – zx8754 Aug 30 '16 at 14:53
  • 1
    @zx8754 - I believe OP is running R as seen with the syntax but sourcing it through a SQL Server stored procedure, possibly using the [SQL Server R Services](https://msdn.microsoft.com/en-us/library/mt604885.aspx). – Parfait Aug 30 '16 at 17:01

1 Answers1

4

We can try the following:

CREATE TABLE #tempData (x float not null, y float not null); 
INSERT INTO #tempData   VALUES (0, 0), (0.1, 0.1), (1, 1), (1.1, 1.1);

CREATE TABLE #output (x float, y float, Cluster int); 

INSERT INTO #output
EXECUTE  sp_execute_external_script
                @language = N'R'
              , @script = N'                
                        trained_model <- kmeans(df[, c("x", "y")], 2)
                        df$cluster <- trained_model$cluster
                        '
              , @input_data_1 = N'SELECT * from #tempData'
              , @output_data_1_name = N'df'
              , @input_data_1_name = N'df';

SELECT *
FROM #output

with output:

x   y   Cluster
0   0   1
0.1 0.1 1
1   1   2
1.1 1.1 2

Note that I specified my input- and output data to be df. The defaults are InputDataSet and OutputDataSet.

If you have longer R-scripts: I would recommend writing and testing them in your R enviroment, then keep them in a package and simply load and call these instead.

J.R.
  • 3,838
  • 1
  • 21
  • 25
  • Thanks a lot, till now it works: I have only to decline it to a query. Yes, I have tried in R, good hit on keeping on packages! – s__ Aug 31 '16 at 08:18
  • You are welcome! Wrapping large scripts in R-packages has been my take on a best practice for R as stored procedure. It makes writing, debugging and maintenance easier. I wonder why this is not explicitly illustrated in an example. The `sp_execute...` command is horrible. – J.R. Aug 31 '16 at 17:38