0

I have a wide dataset that looks like this:

dataset <- data.frame(id = c(1, 2, 3, 4, 5),
                      basketball.time1 = c(2, 5, 4, 3, 3),
                      basketball.time2 = c(3, 4, 5, 3, 2),
                      basketball.time3 = c(1, 8, 4, 3, 1),
                      volleyball.time1 = c(2, 3, 4, 0, 1),
                      volleyball.time2 = c(3, 4, 3, 1, 3),
                      volleyball.time3 = c(1, 8, 12, 2, 3))

What I want is the dataset in long format, with id, time, basketball, and volleyball as separate variables. I want to create the time column with three factors (time1, time2, and time3) using the string that is separated by "." at the end of the basketball and volleyball columns.

Thanks so much!

EDIT: fixed typo

Drew
  • 563
  • 2
  • 8
  • Please share how your expected output would look like. – deschen Dec 30 '21 at 16:30
  • You have a misspelling in the last column "vollyeball.time3" that's throwing off people's results – camille Dec 30 '21 at 16:39
  • I think there's a typo in the question `'volleyball' != 'vollyeball'` – Dan Adams Dec 30 '21 at 16:41
  • Thanks yes there is a typo, sorry – Drew Dec 30 '21 at 16:49
  • 1
    FYI each of the posts I marked up top includes a one-line answer with `tidyr::pivot_longer` using the `".value"` special keyword – camille Dec 30 '21 at 17:00
  • Thanks, @camille. Please let us know the reason for the serial downvote to all answers (in case it is yours). Is it because we missed the .values answer or because we answered instead of voting for closure? Thanks in advance. – GuedesBF Dec 30 '21 at 19:39
  • @GuedesBF that wasn't me, although whether or not to downvote answers to questions that have been asked repeatedly is [an ongoing debate](https://meta.stackoverflow.com/q/253735/5325862) – camille Dec 30 '21 at 20:01
  • Ok, thank you, @camille. The .value advice is excellent. – GuedesBF Dec 30 '21 at 20:19

3 Answers3

1

A possible solution:

library(tidyverse)

dataset <- data.frame(id = c(1, 2, 3, 4, 5),
                      basketball.time1 = c(2, 5, 4, 3, 3),
                      basketball.time2 = c(3, 4, 5, 3, 2),
                      basketball.time3 = c(1, 8, 4, 3, 1),
                      volleyball.time1 = c(2, 3, 4, 0, 1),
                      volleyball.time2 = c(3, 4, 3, 1, 3),
                      vollyeball.time3 = c(1, 8, 12, 2, 3))

dataset %>% 
  pivot_longer(cols = -id) %>% 
  separate(name,into = c("name", "time")) %>% 
  pivot_wider(id_cols = c(id, name, time))

#> # A tibble: 15 × 5
#>       id time  basketball volleyball vollyeball
#>    <dbl> <chr>      <dbl>      <dbl>      <dbl>
#>  1     1 time1          2          2         NA
#>  2     1 time2          3          3         NA
#>  3     1 time3          1         NA          1
#>  4     2 time1          5          3         NA
#>  5     2 time2          4          4         NA
#>  6     2 time3          8         NA          8
#>  7     3 time1          4          4         NA
#>  8     3 time2          5          3         NA
#>  9     3 time3          4         NA         12
#> 10     4 time1          3          0         NA
#> 11     4 time2          3          1         NA
#> 12     4 time3          3         NA          2
#> 13     5 time1          3          1         NA
#> 14     5 time2          2          3         NA
#> 15     5 time3          1         NA          3
PaulS
  • 21,159
  • 2
  • 9
  • 26
1
  1. pivot_longer
  2. separate in sport and time column
  3. pivot_wider sport column
library(dplyr)
library(tidyr)

dataset %>% 
  pivot_longer(
    -id
  ) %>% 
  separate(name, c("sport", "time")) %>% 
  pivot_wider(
    names_from = sport
  )

      id time  basketball volleyball vollyeball
   <dbl> <chr>      <dbl>      <dbl>      <dbl>
 1     1 time1          2          2         NA
 2     1 time2          3          3         NA
 3     1 time3          1         NA          1
 4     2 time1          5          3         NA
 5     2 time2          4          4         NA
 6     2 time3          8         NA          8
 7     3 time1          4          4         NA
 8     3 time2          5          3         NA
 9     3 time3          4         NA         12
10     4 time1          3          0         NA
11     4 time2          3          1         NA
12     4 time3          3         NA          2
13     5 time1          3          1         NA
14     5 time2          2          3         NA
15     5 time3          1         NA          3
TarJae
  • 72,363
  • 6
  • 19
  • 66
1

We can usepivor_longer %>% pivot_wider. separateis not needed if we set the appropriate parameters to pivor_longer.

library(tidyr)

dataset %>%
        pivot_longer(cols = matches('time\\d+$'), names_to = c('sport', 'time'), names_pattern = '(.*)\\.(.*)') %>%
        pivot_wider(names_from = sport, values_from = value)

# A tibble: 15 × 5
      id time  basketball volleyball vollyeball
   <dbl> <chr>      <dbl>      <dbl>      <dbl>
 1     1 time1          2          2         NA
 2     1 time2          3          3         NA
 3     1 time3          1         NA          1
 4     2 time1          5          3         NA
 5     2 time2          4          4         NA
 6     2 time3          8         NA          8
 7     3 time1          4          4         NA
 8     3 time2          5          3         NA
 9     3 time3          4         NA         12
10     4 time1          3          0         NA
11     4 time2          3          1         NA
12     4 time3          3         NA          2
13     5 time1          3          1         NA
14     5 time2          2          3         NA
15     5 time3          1         NA          3
GuedesBF
  • 8,409
  • 5
  • 19
  • 37