1

I've tried many things trying to figure out to go from long to wide, but I cannot get one row per observation. It gives many NA values since my data is unbalanced (I cannot just shift all values one row up, etc.)

This is a part of my data:

structure(list(employees = c(384, 432, 624, 334, 356, 338, 348, 
1122, 1110, 1492), profit_margin = c(-0.14684, -0.85298, -0.58792, 
-0.38872, -1.30312, -0.86866, -0.6363, -1.925, 0.567, 3.984), 
    RD_expenses = c(8946.414554, 9977.75638, 43326.90616, 48870.14658, 
    35022.10866, 39584.25952, 32259.2173, 6303.95, 6812.46, 14993.39
    ), RD_intensity = c(7.10910850621956, 8.98811378416267, 15.6492601635234, 
    17.6773777378817, 13.1744528168514, 14.3544852219875, 11.2624231565094, 
    0.500071500320608, 0.559723756230354, 1.36999818636439), 
    sales = c(125844.3945, 111010.5704, 276862.329, 276455.8596, 
    265833.4972, 275762.3064, 286432.2966, 1260609.732, 1217111.106, 
    1094409.478), treated = c("1", "1", "1", "1", "1", "1", "1", 
    "1", "1", "1"), year = c(2013L, 2014L, 2015L, 2016L, 2017L, 
    2018L, 2019L, 2015L, 2016L, 2017L), id = c(1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 2L, 2L, 2L), company = c("ALLERGAN PUBLIC LIMITED COMPANY", 
    "ALLERGAN PUBLIC LIMITED COMPANY", "ALLERGAN PUBLIC LIMITED COMPANY", 
    "ALLERGAN PUBLIC LIMITED COMPANY", "ALLERGAN PUBLIC LIMITED COMPANY", 
    "ALLERGAN PUBLIC LIMITED COMPANY", "ALLERGAN PUBLIC LIMITED COMPANY", 
    "ALPINE ELECTRONICS, INC.", "ALPINE ELECTRONICS, INC.", "ALPINE ELECTRONICS, INC."
    )), row.names = c(NA, -10L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x000001c71d471ef0>)

I've tried this:

test %>%   group_by(id, company) %>%   dplyr::mutate(row = row_number()) %>%   tidyr::pivot_wider(names_from = year, values_from = c("employees", "profit_margin", "RD_expenses", "RD_intensity", "sales", "treated")) 

But this gives many NA values and not one row per observation, like this:

1   ALLERGAN PUBLIC LIMITED COMPANY 1   384 NA  NA  NA  NA  
1   ALLERGAN PUBLIC LIMITED COMPANY 2   NA  432 NA  NA  NA  
1   ALLERGAN PUBLIC LIMITED COMPANY 3   NA  NA  624 NA  NA  
1   ALLERGAN PUBLIC LIMITED COMPANY 4   NA  NA  NA  334 NA  
1   ALLERGAN PUBLIC LIMITED COMPANY 5   NA  NA  NA  NA  356 
1   ALLERGAN PUBLIC LIMITED COMPANY 6   NA  NA  NA  NA  NA  
1   ALLERGAN PUBLIC LIMITED COMPANY 7   NA  NA  NA  NA  NA  
2   ALPINE ELECTRONICS, INC.    1   NA  NA  1122    NA  NA  
2   ALPINE ELECTRONICS, INC.    2   NA  NA  NA  1110    NA  
2   ALPINE ELECTRONICS, INC.    3   NA  NA  NA  NA  1492

Also, I do not have exactly 7 observations per company, so that makes it a bit harder.

I have also tried this:

test %>% 
  group_by(id) %>% 
  dplyr::mutate(Visit = 1:n()) %>%
gather("employees", "profit_margin", "RD_expenses", "RD_intensity", "sales", "treated", "year", key = variable, value = number) %>%
unite(combi, variable, Visit) %>%
 spread(combi, number)

But that gives even more strange results, with columns up till _31, where the maximum of observations of 1 company (or id) is 7.

Any ideas? I need it in order to use matching!

Thank you

2 Answers2

1

You can just use the reshape() function in base R.

reshape(d, direction = "wide", timevar = "year", idvar = c("id", "company"))

There will be NAs for any years that the firm doesn't have data. Include any time-fixed variables (e.g., country or strategy, if measured) in idvar.

Noah
  • 3,437
  • 1
  • 11
  • 27
0

I think you can skip creation of row column all together.

tidyr::pivot_wider(df, names_from = year, 
                  values_from = c(employees, profit_margin, RD_expenses, RD_intensity, sales, treated)) 
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213