3

I have some legacy code for calculating correlations and now want to use a R solution as it is faster.

Having the following code:

DROP TABLE IF EXISTS #DummyData
CREATE TABLE #DummyData
(
     [VariableA] VARCHAR(24)
    ,[VariableB] VARCHAR(24)
    ,[Value] SMALLINT
)

INSERT INTO #DummyData([VariableA], [VariableB], [Value])
VALUES   ('A1','B1', 4)
        ,('A1','B2', 3)
        ,('A1','B3', 1)
        ,('A2','B1', 2)
        ,('A2','B2', 1)
        ,('A2','B3', 3)
        ,('A3','B1', 4)
        ,('A3','B2', 5)
        ,('A3','B3', 2);

EXECUTE sp_execute_external_script    
      @language = N'R'   
    , @script = N'
         library(reshape)
         pivotData <- cast(DataIn, VariableA ~ VariableB,fun.aggregate = max)
         curData <- cor(pivotData)
         DataOut <- data.frame(curData)
    '   
    , @input_data_1 = N'SELECT [VariableA], [VariableB], [Value] FROM #DummyData'
    , @input_data_1_name  = N'DataIn'   
    , @output_data_1_name =  N'DataOut';

we have this output:

enter image description here

Is there a way to turn in into this one using some R library function?

enter image description here

gotqn
  • 42,737
  • 46
  • 157
  • 243
  • I can't see how this can be a duplicate question? For reshape to work, you have two additional columns with containing the values. After correlation I have only the results (no such details) or even column headings. – gotqn Jul 25 '18 at 05:15
  • This is what i need - https://stackoverflow.com/questions/28035001/transform-correlation-matrix-into-dataframe-with-records-for-each-row-column-pai – gotqn Jul 25 '18 at 05:19

2 Answers2

2

You could run:

library(tidyverse)

DataOut %>% 
rownames_to_column("variable_b1") %>% 
gather(key = variable_b2, val = Rho, -variable_b1)

Results:

  variable_b1 variable_b2        Rho
1          B1          B1  1.0000000
2          B2          B1  0.8660254
3          B3          B1 -0.8660254
4          B1          B2  0.8660254
5          B2          B2  1.0000000
6          B3          B2 -0.5000000
7          B1          B3 -0.8660254
8          B2          B3 -0.5000000
9          B3          B3  1.0000000
Nicolás Velasquez
  • 5,623
  • 11
  • 22
2

If you get the results as a matrix, you can do this:

## Set the row and column names
dimnames(mat) <- list(rownames(mat, do.NULL = FALSE, prefix = "B"),
                      colnames(mat, do.NULL = FALSE, prefix = "B"))

## melt and set the variable names as desired
setNames(reshape2::melt(mat), c("VariableB_1", "VariableB_2", "Rho"))

#   VariableB_1 VariableB_2   Rho
# 1          B1          B1  1.00
# 2          B2          B1  0.86
# 3          B3          B1 -0.86
# 4          B1          B2  0.86
# 5          B2          B2  1.00
# 6          B3          B2 -0.50
# 7          B1          B3 -0.86
# 8          B2          B3 -0.50
# 9          B3          B3  1.00
M--
  • 25,431
  • 8
  • 61
  • 93