I've tried many things trying to figure out to go from long to wide, but I cannot get one row per observation. It gives many NA values since my data is unbalanced (I cannot just shift all values one row up, etc.)
This is a part of my data:
structure(list(employees = c(384, 432, 624, 334, 356, 338, 348,
1122, 1110, 1492), profit_margin = c(-0.14684, -0.85298, -0.58792,
-0.38872, -1.30312, -0.86866, -0.6363, -1.925, 0.567, 3.984),
RD_expenses = c(8946.414554, 9977.75638, 43326.90616, 48870.14658,
35022.10866, 39584.25952, 32259.2173, 6303.95, 6812.46, 14993.39
), RD_intensity = c(7.10910850621956, 8.98811378416267, 15.6492601635234,
17.6773777378817, 13.1744528168514, 14.3544852219875, 11.2624231565094,
0.500071500320608, 0.559723756230354, 1.36999818636439),
sales = c(125844.3945, 111010.5704, 276862.329, 276455.8596,
265833.4972, 275762.3064, 286432.2966, 1260609.732, 1217111.106,
1094409.478), treated = c("1", "1", "1", "1", "1", "1", "1",
"1", "1", "1"), year = c(2013L, 2014L, 2015L, 2016L, 2017L,
2018L, 2019L, 2015L, 2016L, 2017L), id = c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L), company = c("ALLERGAN PUBLIC LIMITED COMPANY",
"ALLERGAN PUBLIC LIMITED COMPANY", "ALLERGAN PUBLIC LIMITED COMPANY",
"ALLERGAN PUBLIC LIMITED COMPANY", "ALLERGAN PUBLIC LIMITED COMPANY",
"ALLERGAN PUBLIC LIMITED COMPANY", "ALLERGAN PUBLIC LIMITED COMPANY",
"ALPINE ELECTRONICS, INC.", "ALPINE ELECTRONICS, INC.", "ALPINE ELECTRONICS, INC."
)), row.names = c(NA, -10L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x000001c71d471ef0>)
I've tried this:
test %>% group_by(id, company) %>% dplyr::mutate(row = row_number()) %>% tidyr::pivot_wider(names_from = year, values_from = c("employees", "profit_margin", "RD_expenses", "RD_intensity", "sales", "treated"))
But this gives many NA values and not one row per observation, like this:
1 ALLERGAN PUBLIC LIMITED COMPANY 1 384 NA NA NA NA
1 ALLERGAN PUBLIC LIMITED COMPANY 2 NA 432 NA NA NA
1 ALLERGAN PUBLIC LIMITED COMPANY 3 NA NA 624 NA NA
1 ALLERGAN PUBLIC LIMITED COMPANY 4 NA NA NA 334 NA
1 ALLERGAN PUBLIC LIMITED COMPANY 5 NA NA NA NA 356
1 ALLERGAN PUBLIC LIMITED COMPANY 6 NA NA NA NA NA
1 ALLERGAN PUBLIC LIMITED COMPANY 7 NA NA NA NA NA
2 ALPINE ELECTRONICS, INC. 1 NA NA 1122 NA NA
2 ALPINE ELECTRONICS, INC. 2 NA NA NA 1110 NA
2 ALPINE ELECTRONICS, INC. 3 NA NA NA NA 1492
Also, I do not have exactly 7 observations per company, so that makes it a bit harder.
I have also tried this:
test %>%
group_by(id) %>%
dplyr::mutate(Visit = 1:n()) %>%
gather("employees", "profit_margin", "RD_expenses", "RD_intensity", "sales", "treated", "year", key = variable, value = number) %>%
unite(combi, variable, Visit) %>%
spread(combi, number)
But that gives even more strange results, with columns up till _31, where the maximum of observations of 1 company (or id) is 7.
Any ideas? I need it in order to use matching!
Thank you