how to split up a numeric column in r into 3 separate columns conditional on the value of another variable, with aim of reducing the final dimension

Question

I have a very basic R question but I am having a hard time trying to get the right answer. I am trying to translate SAS code to R.

Below is a snap shot of a subset of some data I am working with. What I wish to do is split up the column market_test into three separate columns based on the Timeperiod_num column.

This is easily done in SAS by using proc transpose and some of the other variables in the data:

proc transpose
data = LA_5_C out = LA_5_E prefix=market_test_;
by business_reg_number entity_name;
var market_test;
id Timeperiod;
quit;

which produces the desired output with 3 separate columns for the market test variable.

I have tried using the function dcast() in R to do this as I have used it to transpose data previously. However, in this case, when I try to execute the code:

A_5_E <- (dcast(as.data.table(LA_5_C), business_reg_number + entity_name ~ TimePeriod, value.var = "market_test"))

I get the warning "Aggregate function missing, defaulting to 'length'" and whilst the format produces three separate columns the output is not correct, see image below

I assume this is because the combination of variables I supplied does not uniquely identify one row in the original data set, which suggests the use of an aggregating function but I really don't understand which one I can supply to give me my desired output.

I have also tried to manually split up the market_test column into 3 based on year but this just returns a dataset with the same length as the original when I want it to be of reduced length(number of observations) since the 3 years of market tests will be displayed in the same row for each unique entity.

I'm really stuck and have been trying multiple different things but I'm not getting anywhere. Any help would be much appreciated. Thanks.

################# EDIT ################

So here is some sample data that is reproducible:

bus_reg_no <- c('G01', 'G01', 'G01', 'G02', 'G02', 'G02', 'G03', 'G03' , 'G03')
entity_name <- c('A','A','A','B','B','B','C','C','C')
TimePeriod <- c('2020', '2019', '2018', '2020', '2019', '2018', '2020', '2019', '2018')
market_test <- c(NA, 0.9330331232, 0.9969181046, 0.9429920482, 0.9689617356, 0.9764825438, NA, 0.2302289569, 0.7762837775)

dat <- cbind(bus_reg_no, entity_name, TimePeriod, market_test)

I want to split the column market_test up into 3 separate columns for each year in the data for each entity. So my desired output will have a reduced length, and for each row for each entitity there will be a separate column showing the market test figure for the years 2020, 2019, 2018 separately. So surely a code to execute what I want will use at least the variables entity_name, TimePeriod and market_test

Please include data using the output of `dput(your_data)` and include the code you have tried so as to make a [MRE] — Peter, Dec 05 '22 at 11:08
Anyway, it looks like a typo in your `dcast`: Try: `dcast(as.data.table(LA_5_C), business_reg_number + entity_name ~ Timeperiod_num, value.var = "market_test")` — harre, Dec 05 '22 at 11:36
I still get the warning "Aggregate function missing, defaulting to 'length'" — daisy, Dec 05 '22 at 12:36

how to split up a numeric column in r into 3 separate columns conditional on the value of another variable, with aim of reducing the final dimension

0 Answers0