3

Say we have two variables Y and X and we want to make a simple regression model with Transform > Run R script in Power Query. Here we have sample data:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSAeJYnWglYyDLCMwyBbKMwSwLIMtEKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Y = _t, X = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Y", Int64.Type}, {"X", Int64.Type}})
in
    #"Changed Type"

Then I add the Run R script step to Power Query with the following code:

output <- lm(Y ~ X, data=dataset)

enter image description here

But I get an empty table:

enter image description here

How to modify R script to get the model summary?

vestland
  • 55,229
  • 37
  • 187
  • 305
Przemyslaw Remin
  • 6,276
  • 25
  • 113
  • 191

1 Answers1

4

To my knowledge, in order to retrieve any output from your R script, it has to be in the form of a data.frame. However, if you try to run something like df<-data.frame(output), you'll get the error

Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = > > stringsAsFactors) : cannot coerce class ‘"summary.lm"’ to a data.frame

But you can retrieve parts of that very same summary and organize it in a dataframe. Building on your snippet, here's how you can do it for the model coefficients. I've taken the libery to rename your output to model.

Code:

model <- lm(Y ~ X, dataset)
df<- data.frame(coef(model))
names(df)[names(df)=="coef.model."] <- "coefficients"
df['variables'] <- row.names(df)

Output 1:

enter image description here

Next, just click Table to get:

Output 2:

enter image description here

Since I don't have your data, I've just used Enter Data and this simple data sample:

enter image description here

Edit: Other model estimates

If you'd like to retrieve estimated series from the model such as residuals or fitted values, just add the following lines to the snippet above:

df_estimates <- data.frame(fitted(model), residuals(model))
colnames(df_estimates) <- c('fitted', 'residuals')

The preliminary output will now be:

enter image description here

And this is a pretty hande feature in Power BI. Just click on table next to the dataframe you'd like to keep working with. In the latter case, you'll get this:

enter image description here

Edit 2 - Include original dataset in your output:

#Y <- c(1,2,3,4,4)
#X <- c(1,2,3,4,5)

#dataset <- data.frame(X, Y)

model <- lm(Y ~ X, dataset)
df<- data.frame(coef(model))
names(df)[names(df)=="coef.model."] <- "coefficients"
df['variables'] <- row.names(df)
df_estimates <- data.frame(dataset$X, dataset$Y, fitted(model), residuals(model))
colnames(df_estimates) <- c('X', 'Y', 'fitted', 'residuals')
df_estimates
vestland
  • 55,229
  • 37
  • 187
  • 305
  • Yes, works. I was just checking for other components of summary:-) – Przemyslaw Remin Feb 05 '20 at 09:47
  • @Przemyslaw Remin Which components? Would you like some assistance on that? – vestland Feb 05 '20 at 09:48
  • Fitted values and residuals. – Przemyslaw Remin Feb 05 '20 at 09:51
  • @Przemyslaw Remin I'll take a look. – vestland Feb 05 '20 at 09:52
  • @Przemyslaw Remin I added a little approach for fitted values and residauls in the answer. – vestland Feb 05 '20 at 10:06
  • @Przemyslaw Remin How did that last edit work out for you? – vestland Feb 05 '20 at 10:54
  • Well, sure it did work. I cannot vote it up again:-) Thank you. – Przemyslaw Remin Feb 05 '20 at 10:59
  • @Przemyslaw Remin The power-bi tag needs more action on the application of R and Python, so I'm happy to help! If you're interested in more, I'd suggest a closer look at these: [Multiple Linear Regression in Power BI](https://stackoverflow.com/questions/48796873/multiple-linear-regression-in-power-bi/48990338#48990338) and [Operations on multiple tables / datasets with Edit Queries and R in Power BI](https://stackoverflow.com/questions/44897796/operations-on-multiple-tables-datasets-with-edit-queries-and-r-in-power-bi/45961614#45961614) – vestland Feb 05 '20 at 11:08
  • 1
    Agree! I am grateful for you help. You can count on my questions. – Przemyslaw Remin Feb 05 '20 at 11:34
  • How to return `df_estimates <- data.frame(fitted(model), residuals(model))` together with original dataset. So that we have in this case Y, X, fitted, residuals. I would also like to have a summary stats with min q1 median q3 max which summary function returns. – Przemyslaw Remin Feb 05 '20 at 18:42
  • @PrzemyslawRemin I'll take a look tomorrow. Feel free to remind med. – vestland Feb 05 '20 at 18:55
  • 1
    Hi, I would like to kindly remind you about my seek for including original dataset to fitted values and residuals, and about retrieving residual summary stats. – Przemyslaw Remin Feb 06 '20 at 11:27
  • @PrzemyslawRemin I included the original dataset in `df_estimates`. We'll have to take a closer look at retrieving residual summary stats at a later time. – vestland Feb 06 '20 at 14:15
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/207354/discussion-between-przemyslaw-remin-and-vestland). – Przemyslaw Remin Feb 06 '20 at 14:16