I have data on taxpayers which have multiple branches which produces multiple rows which are almost duplicates except for one relevant column (enterprise activity). I want to change this so that each taxpayer has only one row, which involves creating columns 'enterprise_activity_1', 'enterprise_activity_2' and so on.
I realise this is similar to reshaping but I can't think of a way to use tidyr::spread to achieve this.
For simplicity, say we just have a dataframe like:
df <- tibble::tibble(
TAXPAYER_ID = c(100, 151, 250, 250, 267, 296, 296, 304),
ENTERPRISE_ACTIVITY = rep(c("AGRICULTURE", "MANUFACTURING"), 4)
)
What I would like to achieve is this:
TAXPAYER_ID ENTERPRISE_ACTIVITY_1 ENTERPRISE_ACTIVITY_2
100 AGRICULTURE NA
151 MANUFACTURING NA
250 AGRICULTURE MANUFACTURING
267 AGRICULTURE NA
296 MANUFACTURING AGRICULTURE
304 MANUFACTURING NA
My actual data has varying numbers of branches per taxpayer so the number of columns should be the maximum number of branches that one taxpayer has.