0

What is the best way to deploy a machine learning script written in R (ie. linear regression, random forest, k-means, etc.) to PowerBI?

Here is what I am currently thinking: My response column is column_x. My training set will be all rows in my dataset where I have a value for column_x. My test set will be all rows where I do not have a value for column_x (hence needing a predicted value from the ML script). I will put the output of the ML script in a new column in PowerBI (using edit queries option) for predicted response called pred_column_x. Not really sure how to go about this.

Modification/filling in of the sample script below or an alternative solution to how this could be mapped out would be super helpful!

Thank you!

data.df <- dataset in PowerBI

train.df <- dataset where column_x is not null??
test.df <- dataset where column_x is null??

#linear regression fit
fit <- lm(pred_column_x ~ column_a + column_b, data=train.df)
#need this fit/predicted value to go in pred_column_x
summary(fit)

piper180
  • 329
  • 2
  • 12
  • 2
    Are you using SQL Server? It might be easier to run the R in SQL Server and pull the results into Power BI with your query: https://learn.microsoft.com/en-us/sql/advanced-analytics/tutorials/quickstart-r-create-script?view=sql-server-ver15. – cgrafe Jan 02 '20 at 16:04
  • 1
    @ava You'll find everything you need to get going here: [Multiple Linear Regression in Power BI](https://stackoverflow.com/questions/48796873/multiple-linear-regression-in-power-bi/48990338#48990338) – vestland Feb 04 '20 at 19:07
  • 1
    Does this answer your question? [Multiple Linear Regression in Power BI](https://stackoverflow.com/questions/48796873/multiple-linear-regression-in-power-bi) – vestland Feb 04 '20 at 19:08

1 Answers1

1

Recently, I was working on a contractual job, wherein the source data could only be accessed through PowerBI (PBI). The objective was to build a predictive model. I'm a novice in PBI. Since, I use R for all my analysis, I wrote a script and applied it to the data in PBI. Here are some points you must know;

  1. PBI always refers the data contained within it as, dataset. So, change all occurrences of dataset in your R script to dataset.
  2. In PBI interface, click on edit queries, then choose the table that you want to work with as shown enter image description here
  3. Then click on Transform tab and click on Run R script. A message on Insert step will pop up, see screenshot. enter image description here
  4. Click on Insert button. This will open the Run R script editor window. Notice, the message, # 'dataset' holds the input data for this script (this is what i referred to in point 1).
  5. Add your R code in this box and click OK button. The code will execute and give result.

enter image description here

Points to note

  1. If the dataset size is huge, its advisable to take a sample of it. Then execute the code on this sample on your local R environment. This will ensure the script is working without errors.
  2. The predictive analytics result need to be saved in a dataframe format otherwise PBI will not show the results. So this part of your R code, #linear regression fit fit <- lm(pred_column_x ~ column_a + column_b, data=train.df), needs to be saved as a dataframe, like, #linear regression fit fit <- data.frame(lm(pred_column_x ~ column_a + column_b, data=train.df)). Apparently the dataframe of R is interpreted as a table in PBI.
  3. The analysis result will look something like this,enter image description here You'll have to click on the Table to view the result.

Disclaimer

If anyone got a better approach then please suggest in comments.

mnm
  • 1,962
  • 4
  • 19
  • 46
  • This is super helpful! Thank you. Another question, what if I would like the values to be in a new column in my existing dataset, instead of in its own dataframe? I'd basically like to predict a column value for each row. I wasn't able to use the dataset$value operator due to the list having fewer values than the data ("replacement has 500 rows, data has 800) Thanks! – piper180 Feb 19 '20 at 18:21
  • @ava ask yourself the same question that you've asked me! If you can add the predictions as a new column in the existing dataset, in the local R environment, then it should work in PBI too... – mnm Feb 20 '20 at 13:35