How to change the format of a dataframe

Question

I have a dataframe of the following format:

> example_df
  P_ID Income2000 Income2001 Income2002
1    1         21         22         23
2    2         15         15         15
3    3         10         13         15

where I have multiple persons with Income over three years given by the colums Income2000, Income2001 and Income2002.

Now I was wondering whether there is a fast and effective way of transforming the above dataframe into a dataframe that looks like:

> example_df2
  Year P_ID Income
1 2000    1     21
2 2001    1     22
3 2002    1     23
4 2000    2     15
5 2001    2     15
6 2002    2     15
7 2000    3     10
8 2001    3     13
9 2002    3     15

@akrun, I did not downvoted your answer (I upvoted it to compensate) and as you, I don't understand why. I agree the duplicated link does not fully recapitulate the desired output of the OP but it is provided the main part of the solution. — dc37, Mar 24 '20 at 23:17

score 0 · Answer 1 · answered Mar 24 '20 at 23:04

We can use pivot_longer to reshape from 'wide' to 'long' and then remove the substring form the column names

library(dplyr)
library(tidyr)
example_df %>% 
    pivot_longer(cols = -P_ID, names_to = 'Year',values_to = 'Income') %>%
    mutate(Year = readr::parse_number(Year))
# A tibble: 9 x 3
#   P_ID  Year Income
#  <int> <dbl>  <int>
#1     1  2000     21
#2     1  2001     22
#3     1  2002     23
#4     2  2000     15
#5     2  2001     15
#6     2  2002     15
#7     3  2000     10
#8     3  2001     13
#9     3  2002     15

data

example_df <- structure(list(P_ID = 1:3, Income2000 = c(21L, 15L, 10L), Income2001 = c(22L, 
15L, 13L), Income2002 = c(23L, 15L, 15L)), class = "data.frame", row.names = c("1", 
"2", "3"))

Hello akrun, I dont know why this post got downvoted. I works perfectly well! Thank you very much :) — Rkid, Mar 24 '20 at 23:16

Ian Campbell · Answer 2 · 2020-03-24T23:27:51.040

Here is a base R approach.

reshape(example_df, 
        direction = "long",
        idvar = "P_ID",
        varying=list(names(example_df)[2:4]),
        timevar="Year",
        times=c(2000,2001,2002),
        v.names = "Income")

       P_ID Year Income2000
1.2000    1 2000         21
2.2000    2 2000         15
3.2000    3 2000         10
1.2001    1 2001         22
2.2001    2 2001         15
3.2001    3 2001         13
1.2002    1 2002         23
2.2002    2 2002         15
3.2002    3 2002         15

How to change the format of a dataframe

2 Answers2

data