8

I wanted to execute R code from SSIS package. How can I add a data control step that executes R-code? SSIS supports only vb.net and asp.net.

SSIS has many data transformations available but R is very friendly when it comes to data manipulations.

I want to run a R-code from SSIS scripts or some other way.Basically, I'm trying to integrate R in ETL process.

I wanted to extract data(E) from from a CSV file.

Transform (T) it in R and load (L) it in Microsoft database. Is it possible to get this workflow done in SSIS package by executing R-script using SSIS data control items? Thanks!

Ruchi
  • 1,238
  • 11
  • 32
Amit Ugle
  • 139
  • 1
  • 1
  • 10
  • 1
    Welcome to stackoverflow (SO)! It's more likely that we will be able to help you if you make a reproducible example to go along with your question. Something we can work from and use to show you how it might be possible to solve your problem. You can have a look at [this SO post on how to make a great reproducible example in R](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Furthermore, it would also be helpful if you outline what you have already tried. – Eric Fail Oct 26 '15 at 18:54
  • As of now, SSIS will accept only VB script and C# language. If you want to load csv to database then you can use data flow task directly and the connection managers should be dynamic not static – koushik veldanda Oct 27 '15 at 09:42

2 Answers2

9

Here are a couple of ways you could integrate R into your ETL process.

  1. Crude, fast and dirty - Execute Process Task in the Control Flow. This would be similar to calling RScript from the command line. You would likely make your transformation, save it to a file on disk, and get that filename from your Execute Process Task so you can feed it into a Data Flow task. Upside is you're keeping your R clean and separate from your C#/VB.

  2. Integrated via Rdotnet - You could use the RDotNet library (I believe, haven't tried to integrate it). You would need to register the DLLs in the GAC, and then you can either work with .NET objects in your SSIS scripts or call R scripts directly.

  3. Integrated in SQL Server 2016 - Microsoft has added R support via extended stored procedures. You call the R script via stored proc and use a sql query for input data and can store the output. See more detail here. This would mean utilizing an Execute SQL task in SSIS.

sorrell
  • 1,801
  • 1
  • 16
  • 27
  • I have followed Crude, fast and dirty method. Now I have an SSIS package running R script and loading output to database table in visual studio. I need to schedule and deploy it on SQL Server. My main concern would be where to place R Script. For the package that runs on my computer, pulls R Script from my local files. But I am wondering where to store the file in order to run the package on SQL Server apart from my system. Any ideas please. – krpa Jun 12 '18 at 18:22
  • One option, store script text in a db table, add a step to query that and store result to file. https://www.google.com/amp/s/blog.sqlauthority.com/2013/09/12/sql-server-automatically-store-results-of-query-to-file-with-sqlcmd/amp/ – sorrell Jun 13 '18 at 09:13
  • As I have mentioned about schedule and deploy on SQL server, that means I am scheduling the package to run daily automatically without my intervention. The option you suggested requires entering the command manually to run the file. I have researched about `sqlrutils` package which requires my R code to be re-written to fit in stored procedure. But the package comes default with Machine Learning services for SQL server. Since I am working on remote system and sqlrrutils is specific to the system installed, I am unable to use sqlrutils in R studio on my system. So looking for other options. – krpa Jun 13 '18 at 12:22
  • Did you ever sort this out @krpa? How did you do it? – Dov Rosenberg Aug 26 '19 at 15:16
  • Yes I did. I have created a stored procedure to run external script and added my R code to it on sql server and also created ssis package to invoke the execution of stored procedure using ‘Execute sql task’. To do all of this, sql server needs to have R server and the supporting packages installed on it. For automation, deployed the ssis package on sql server and scheduled a job to run at specific time. – krpa Aug 27 '19 at 17:10
0

I hope it helps you or someone else, since you want data processing you might bring your dataset into a CSV file (throught a data flow task), execute the file using: "Rscript " (it might be executed as a command with the execute process task), inside the file you have to upload the dataset into a dataframe ( calling it with readLines() function), then do all the math/Calculation you request, write the data or calculation results into a CSV file an reading again it from SSIS.

It is not an elegant solution, but it works :), At least till microsoft integrates R as a control/data flow process.

CYA

PS. here you go how to execute files from the command line: Run R script from command line

Community
  • 1
  • 1
Jim R.
  • 1
  • 1