Reshape dataframe. Creating new rows for each value in multiple variables with NA's

Question

I want to reshape my data frame by adding a row for each value in three different variables.

Current data structure:

Land  Date          P1  P2    P3    

bb    1990-10-26    S   F     G
bb    1994-10-11    S   <NA> <NA>
be    1999-09-29    S   C    <NA>
be    2004-10-13    S   C    <NA>
be    2009-11-06    C   L    <NA>

Desired output:

 P  land  Date

 S  bb   1990-10-26
 F  bb   1990-10-26
 G  bb   1990-10-26
 S  bb   1994-10-11
 S  be   1999-09-29
 C  be   1999-09-29
 S  be   2004-10-13
 C  be   2004-10-13
 C  be   2009-11-06
 L  be   2009-11-06

Therefore, each different value, except the NA's, of the variables P1, P2, P3 should be translated to a new row. I hope you can help me with my problem.

Related/Possible duplicate https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format — Ronak Shah, May 06 '21 at 12:30

score 2 · Answer 1 · answered May 06 '21 at 09:44

a data.table approach. data.table::melt() has a na.rm-argument

library(data.table)
DT <- fread("Land  Date          P1  P2    P3    
bb    1990-10-26    S   F     G
bb    1994-10-11    S   NA NA
  be    1999-09-29    S   C    NA
  be    2004-10-13    S   C    NA
  be    2009-11-06    C   L    NA")

melt(DT, id.vars = c("Land", "Date"), na.rm = TRUE)
#    Land       Date variable value
# 1:   bb 1990-10-26       P1     S
# 2:   bb 1994-10-11       P1     S
# 3:   be 1999-09-29       P1     S
# 4:   be 2004-10-13       P1     S
# 5:   be 2009-11-06       P1     C
# 6:   bb 1990-10-26       P2     F
# 7:   be 1999-09-29       P2     C
# 8:   be 2004-10-13       P2     C
# 9:   be 2009-11-06       P2     L
#10:   bb 1990-10-26       P3     G

score 1 · Answer 2 · answered May 06 '21 at 09:57

Here is a base R option using reshape

`row.names<-`(na.omit(
  reshape(
    setNames(df, gsub("(\\d+)", ".P\\1", names(df))),
    direction = "long",
    idvar = c("Land", "Date"),
    timevar = "Col",
    varying = -(1:2)
  )
), NULL)

which gives

   Land       Date Col P
1    bb 1990-10-26  P1 S
2    bb 1994-10-11  P1 S
3    be 1999-09-29  P1 S
4    be 2004-10-13  P1 S
5    be 2009-11-06  P1 C
6    bb 1990-10-26  P2 F
7    be 1999-09-29  P2 C
8    be 2004-10-13  P2 C
9    be 2009-11-06  P2 L
10   bb 1990-10-26  P3 G

Data

> dput(df)
structure(list(Land = c("bb", "bb", "be", "be", "be"), Date = c("1990-10-26", 
"1994-10-11", "1999-09-29", "2004-10-13", "2009-11-06"), P1 = c("S",
"S", "S", "S", "C"), P2 = c("F", NA, "C", "C", "L"), P3 = c("G",
NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, -5L
))

score 1 · Answer 3 · answered May 06 '21 at 10:02

1

tidyr way

df %>% pivot_longer(starts_with('P'), values_drop_na = T, names_to = NULL, values_to = 'P')

# A tibble: 10 x 3
   Land  Date       P    
   <chr> <chr>      <chr>
 1 bb    1990-10-26 S    
 2 bb    1990-10-26 F    
 3 bb    1990-10-26 G    
 4 bb    1994-10-11 S    
 5 be    1999-09-29 S    
 6 be    1999-09-29 C    
 7 be    2004-10-13 S    
 8 be    2004-10-13 C    
 9 be    2009-11-06 C    
10 be    2009-11-06 L

answered May 06 '21 at 10:02

AnilGoyal

25,297
4
27
45

1

I was thinking if anyone will give a `dplyr` solution and then yours comes up, cheers! Upvoted! – ThomasIsCoding May 06 '21 at 10:07
the arguments to `pivot_longer` and `pivot_wider` are really helpful and may eliminate the need of a few lines of code, if used. Thanks for vote. :) – AnilGoyal May 06 '21 at 10:13

Reshape dataframe. Creating new rows for each value in multiple variables with NA's

3 Answers3