Rearranging cross sectional time panel data in R

Question

My data comes in the following format:

id profit2017 profit2016 profit2015 profit2014
1           2          3          6          7 
2           4          1          8          3

I would like to rearrange to the following format

id Year Profit
1  2017      2
1  2016      3
1  2015      6
1  2014      7
2  2017      4
2  2016      1
2  2015      8
2  2014      3

I do not know how to even start. Any recommendations on libraries or useful material that I can check?

Marco Sandri · Accepted Answer · 2017-07-01T12:03:32.970

This is a classical problem of conversion from wide to long format.
A nice tutorial is available here.
Below a solution based on the gather function of the tidyr package.

dts_wide <- read.table(header=T, text ='
id profit2017 profit2016 profit2015 profit2014
1           2          3          6          7 
2           4          1          8          3 
')

library(tidyr)
names(dts_wide)[-1] <- as.character(2017:2014)

dts_long <- gather(data=dts_wide, key=Year, value=Profit, 2:5, factor_key=TRUE)
dts_long$Year <- as.numeric(as.character(dts_long$Year))
dts_long

#   id Year Profit
# 1  1 2017      2
# 2  2 2017      4
# 3  1 2016      3
# 4  2 2016      1
# 5  1 2015      6
# 6  2 2015      8
# 7  1 2014      7
# 8  2 2014      3

EDIT
If there are two sets of columns (profit and revenue) a possible (not elegant) solution is:

dts_wide <- read.table(header=T, text ='
id profit2017 profit2016 profit2015 profit2014 revenue2017 revenue2016 revenue2015 revenue2014
1           2          3          6          7         21          31          61          71 
2           4          1          8          3         22          32          62          72
')

library(tidyr)
library(stringr)

dts_long <- dts_wide %>% gather(key=Year, value=Profit, 2:9, factor_key=TRUE) 
dts_long$key_tmp <- str_sub(dts_long$Year,1,-5)
dts_long$Year <- as.numeric(str_sub(dts_long$Year,-4,-1))
( dts_long <- dts_long %>% spread(key_tmp, Profit) )

#   id Year profit revenue
# 1  1 2014      7      71
# 2  1 2015      6      61
# 3  1 2016      3      31
# 4  1 2017      2      21
# 5  2 2014      3      72
# 6  2 2015      8      62
# 7  2 2016      1      32
# 8  2 2017      4      22

Thanks very much for the answer Marco, really well formulated and of great help! I am just wondering it instead of only one variable "profit" we would have an additional one "revenue". How would code dts_long <- gather change? — Economist_Ayahuasca, Jul 01 '17 at 10:52
@AndresAzqueta I added to my answer a possible solution to your more general problem. It is not very elegant, but it works. — Marco Sandri, Jul 01 '17 at 12:05

KoenV · Answer 2 · 2017-07-01T10:38:00.253

2

You could also use melt from the reshape2 package:

df <- read.table(header = TRUE, 
text = "id profit2017 profit2016 profit2015 profit2014
1           2          3          6          7 
2           4          1          8          3" )

library(reshape2)

# drop the pattern "profit" from the column names
names(df) <- sub(pattern = "profit", replacement = "", names(df))
# go to long format with "id" as id.var, the rest are measure.vars
melt(df, id.vars = "id")

edited Jul 01 '17 at 10:38

answered Jul 01 '17 at 10:26

KoenV

4,113
2
23
38

really well formulated thanks! – Economist_Ayahuasca Jul 01 '17 at 10:52

Rearranging cross sectional time panel data in R

2 Answers2