0

I have a very basic R question but I am having a hard time trying to get the right answer. I am trying to translate SAS code to R.

Below is a snap shot of a subset of some data I am working with. What I wish to do is split up the column market_test into three separate columns based on the Timeperiod_num column.

enter image description here

This is easily done in SAS by using proc transpose and some of the other variables in the data:

proc transpose
data = LA_5_C out = LA_5_E prefix=market_test_;
by business_reg_number entity_name;
var market_test;
id Timeperiod;
quit; 

which produces the desired output with 3 separate columns for the market test variable. enter image description here

I have tried using the function dcast() in R to do this as I have used it to transpose data previously. However, in this case, when I try to execute the code:

A_5_E <- (dcast(as.data.table(LA_5_C), business_reg_number + entity_name ~ TimePeriod, value.var = "market_test"))

I get the warning "Aggregate function missing, defaulting to 'length'" and whilst the format produces three separate columns the output is not correct, see image below

enter image description here

I assume this is because the combination of variables I supplied does not uniquely identify one row in the original data set, which suggests the use of an aggregating function but I really don't understand which one I can supply to give me my desired output.

I have also tried to manually split up the market_test column into 3 based on year but this just returns a dataset with the same length as the original when I want it to be of reduced length(number of observations) since the 3 years of market tests will be displayed in the same row for each unique entity.

enter image description here

I'm really stuck and have been trying multiple different things but I'm not getting anywhere. Any help would be much appreciated. Thanks.

################# EDIT ################

So here is some sample data that is reproducible:

bus_reg_no <- c('G01', 'G01', 'G01', 'G02', 'G02', 'G02', 'G03', 'G03' , 'G03')
entity_name <- c('A','A','A','B','B','B','C','C','C')
TimePeriod <- c('2020', '2019', '2018', '2020', '2019', '2018', '2020', '2019', '2018')
market_test <- c(NA, 0.9330331232, 0.9969181046, 0.9429920482, 0.9689617356, 0.9764825438, NA, 0.2302289569, 0.7762837775)

dat <- cbind(bus_reg_no, entity_name, TimePeriod, market_test)

I want to split the column market_test up into 3 separate columns for each year in the data for each entity. So my desired output will have a reduced length, and for each row for each entitity there will be a separate column showing the market test figure for the years 2020, 2019, 2018 separately. So surely a code to execute what I want will use at least the variables entity_name, TimePeriod and market_test

daisy
  • 61
  • 1
  • 6
  • 2
    Please include data using the output of `dput(your_data)` and include the code you have tried so as to make a [MRE] – Peter Dec 05 '22 at 11:08
  • Anyway, it looks like a typo in your `dcast`: Try: `dcast(as.data.table(LA_5_C), business_reg_number + entity_name ~ Timeperiod_num, value.var = "market_test")` – harre Dec 05 '22 at 11:36
  • I still get the warning "Aggregate function missing, defaulting to 'length'" – daisy Dec 05 '22 at 12:36

0 Answers0