0

My data comes in the following format:

id profit2017 profit2016 profit2015 profit2014
1           2          3          6          7 
2           4          1          8          3 

I would like to rearrange to the following format

id Year Profit
1  2017      2
1  2016      3
1  2015      6
1  2014      7
2  2017      4
2  2016      1
2  2015      8
2  2014      3

I do not know how to even start. Any recommendations on libraries or useful material that I can check?

Economist_Ayahuasca
  • 1,648
  • 24
  • 33

2 Answers2

2

This is a classical problem of conversion from wide to long format.
A nice tutorial is available here.
Below a solution based on the gather function of the tidyr package.

dts_wide <- read.table(header=T, text ='
id profit2017 profit2016 profit2015 profit2014
1           2          3          6          7 
2           4          1          8          3 
')

library(tidyr)
names(dts_wide)[-1] <- as.character(2017:2014)

dts_long <- gather(data=dts_wide, key=Year, value=Profit, 2:5, factor_key=TRUE)
dts_long$Year <- as.numeric(as.character(dts_long$Year))
dts_long

#   id Year Profit
# 1  1 2017      2
# 2  2 2017      4
# 3  1 2016      3
# 4  2 2016      1
# 5  1 2015      6
# 6  2 2015      8
# 7  1 2014      7
# 8  2 2014      3

EDIT
If there are two sets of columns (profit and revenue) a possible (not elegant) solution is:

dts_wide <- read.table(header=T, text ='
id profit2017 profit2016 profit2015 profit2014 revenue2017 revenue2016 revenue2015 revenue2014
1           2          3          6          7         21          31          61          71 
2           4          1          8          3         22          32          62          72
')

library(tidyr)
library(stringr)

dts_long <- dts_wide %>% gather(key=Year, value=Profit, 2:9, factor_key=TRUE) 
dts_long$key_tmp <- str_sub(dts_long$Year,1,-5)
dts_long$Year <- as.numeric(str_sub(dts_long$Year,-4,-1))
( dts_long <- dts_long %>% spread(key_tmp, Profit) )

#   id Year profit revenue
# 1  1 2014      7      71
# 2  1 2015      6      61
# 3  1 2016      3      31
# 4  1 2017      2      21
# 5  2 2014      3      72
# 6  2 2015      8      62
# 7  2 2016      1      32
# 8  2 2017      4      22
Marco Sandri
  • 23,289
  • 7
  • 54
  • 58
2

You could also use melt from the reshape2 package:

df <- read.table(header = TRUE, 
text = "id profit2017 profit2016 profit2015 profit2014
1           2          3          6          7 
2           4          1          8          3" )

library(reshape2)

# drop the pattern "profit" from the column names
names(df) <- sub(pattern = "profit", replacement = "", names(df))
# go to long format with "id" as id.var, the rest are measure.vars
melt(df, id.vars = "id")
KoenV
  • 4,113
  • 2
  • 23
  • 38