Currently I'm using the reshape library to pivot data in R, but it seems to struggle when I have many columns (4000+). Is there a multithreaded alternative to this function (similar to the RevoScaleR package by Microsoft), or a better way to do this?
Here is an example of the code that I have right now:
DROP TABLE IF EXISTS #DummyData
CREATE TABLE #DummyData
(
[VariableA] VARCHAR(24)
,[VariableB] VARCHAR(24)
,[Value] SMALLINT
)
INSERT INTO #DummyData([VariableA], [VariableB], [Value])
VALUES ('A1','B1', 4)
,('A1','B2', 3)
,('A1','B3', 1)
,('A2','B1', 2)
,('A2','B2', 1)
,('A2','B3', 3)
,('A3','B1', 4)
,('A3','B2', 5)
,('A3','B3', 2);
EXECUTE sp_execute_external_script
@language = N'R'
, @script = N'
library(reshape)
pivotData <- cast(DataIn, VariableA ~ VariableB, fun.aggregate = max)
DataOut <- pivotData
'
, @input_data_1 = N'SELECT [VariableA], [VariableB], [Value] FROM #DummyData'
, @input_data_1_name = N'DataIn'
, @output_data_1_name = N'DataOut';
This query returns the following result: