2

Simple pivot_longer question. Given:

library(tidyverse)
df <- data.frame(id = c(1),
                 test_date = as.Date("01/01/1998", "%d/%m/%Y"),
                 test_result = c("yes"),
                 exit_date = as.Date("01/02/1998", "%d/%m/%Y"),
                 dob_date =  as.Date("01/01/1997", "%d/%m/%Y"),
                 exit_result_1 = c(10),
                 exit_result_2 = c(4),
                 exit_result_3 = c(2))
df
#   id  test_date test_result  exit_date   dob_date exit_result_1 exit_result_2 exit_result_3
# 1  1 1998-01-01         yes 1998-02-01 1997-01-01            10             4             2

I want:

#   id test_result exit_result_1 exit_result_2 exit_result_3 date_name      dates
# 1  1         yes            NA            NA            NA test_date 1998-01-01
# 2  1        <NA>            10             4             2 exit_date 1998-02-01
# 3  1        <NA>            NA            NA            NA  dob_date 1997-01-01

Something along the lines of

pivot_longer(df, cols = matches('date'), 
             names_to = "date_name", values_to = "dates")

but I want to return NAs. Any advice? Thanks

Darren Tsai
  • 32,117
  • 5
  • 21
  • 51
user63230
  • 4,095
  • 21
  • 43
  • related `names_pattern` examples [here](https://stackoverflow.com/questions/72071594/r-pivot-longer-with-stub-names-and-by-last-underscore), [here](https://cran.r-project.org/web/packages/tidyr/vignettes/pivot.html) and [here](https://dcl-wrangle.stanford.edu/pivot-advanced.html) – user63230 Jan 16 '23 at 14:39

2 Answers2

3

What you're asking for is outside the intent of pivoting/reshaping operations. The result

pivot_longer(df, cols = matches('date'), names_to = "date_name", values_to = "dates")
# # A tibble: 3 × 7
#      id test_result exit_result_1 exit_result_2 exit_result_3 date_name dates     
#   <dbl> <chr>               <dbl>         <dbl>         <dbl> <chr>     <date>    
# 1     1 yes                    10             4             2 test_date 1998-01-01
# 2     1 yes                    10             4             2 exit_date 1998-02-01
# 3     1 yes                    10             4             2 dob_date  1997-01-01

is correct in that the non-date values are carried down for every observation in the original frame.

What you need is to change the variables based on date_name, I think.

pivot_longer(df, cols = matches('date'),
             names_to = "date_name", values_to = "dates") %>%
  mutate(
    test_result = if_else(date_name == "test_date", test_result, test_result[NA]),
    across(starts_with("exit"),
           ~ if_else(date_name == "exit_date", ., .[NA]))
  )
# # A tibble: 3 × 7
#      id test_result exit_result_1 exit_result_2 exit_result_3 date_name dates     
#   <dbl> <chr>               <dbl>         <dbl>         <dbl> <chr>     <date>    
# 1     1 yes                    NA            NA            NA test_date 1998-01-01
# 2     1 NA                     10             4             2 exit_date 1998-02-01
# 3     1 NA                     NA            NA            NA dob_date  1997-01-01

I use var[NA] instead of NA_character_ to get the correct class of NA values, since otherwise if_else complains

! `false` must be a character vector, not a logical vector.
r2evans
  • 141,215
  • 6
  • 77
  • 149
3

You can set ".value" in names_to to indicate the corresponding component of the column name that defines the name of the output column. In this case one of names_sep or names_pattern must be supplied to specify how the column names should be split.

df %>%
  pivot_longer(
    -id,
    names_to = c("date_name", ".value"),
    names_pattern = "([^_]+)_(.+)"
  )

# # A tibble: 3 × 7
#      id date_name date       result result_1 result_2 result_3
#   <dbl> <chr>     <date>     <chr>     <dbl>    <dbl>    <dbl>
# 1     1 test      1998-01-01 yes          NA       NA       NA
# 2     1 exit      1998-02-01 NA           10        4        2
# 3     1 dob       1997-01-01 NA           NA       NA       NA

For comparison, the original dataset looks like

df
#   id  test_date test_result  exit_date   dob_date exit_result_1 exit_result_2 exit_result_3
# 1  1 1998-01-01         yes 1998-02-01 1997-01-01            10             4             2
Darren Tsai
  • 32,117
  • 5
  • 21
  • 51
  • 2
    Nice work, I hadn't considered the pattern use here. – r2evans Jan 11 '23 at 14:51
  • Thanks! This is what I wanted although I don't fully see how its achieved. So the regex `([^_]+)_(.+)` is just splitting up all the variables (minus the `id` column) by the first word and the remaining string (separated by the first `_`)? `names_to = c("date_name", ".value")` then takes what we want to pivot as being just the first part of the split of the regex? – user63230 Jan 11 '23 at 16:18
  • 1
    @user63230 Yes I think you realize it correctly. Note that with `names_pattern` you need to use two pairs of parentheses to define groups, one for `date_name` and one for `".value"`. – Darren Tsai Jan 11 '23 at 17:44