2

I have a data frame that is not really in a 'long form' but it is in a longer form than I would like. I would like to condense it into a 'wide form' that has all the information associated with an id into one line. Right now, some of the information is repeated on each line (like the date in the example below) and other information needs to be preserved when the lines are consolidated (like type column below). thanks!

id <- c(1000, 1000, 1000, 1001, 1001, 1001)
type <- c("A", "B", "B", "C", "C", "A")
dates <- c("10/5/2019", "10/5/2019", "10/5/2019", "9/17/2020", "9/17/2020", "9/17/2020")
df <- as.data.frame(cbind(id, type, dates))
df
    id type     dates
1 1000    A 10/5/2019
2 1000    B 10/5/2019
3 1000    B 10/5/2019
4 1001    C 9/17/2020
5 1001    C 9/17/2020
6 1001    A 9/17/2020

I would like it to looks like this:

enter image description here

DanY
  • 5,920
  • 1
  • 13
  • 33
user3390169
  • 1,015
  • 1
  • 11
  • 34

3 Answers3

4

Another option only using tidyverse:

library(tidyverse)
#Code
df %>% group_by(id) %>% mutate(idv=paste0('type.',1:n())) %>%
  pivot_wider(names_from = idv,values_from=type)

Output:

# A tibble: 2 x 5
# Groups:   id [2]
  id    dates     type.1 type.2 type.3
  <chr> <chr>     <chr>  <chr>  <chr> 
1 1000  10/5/2019 A      B      B     
2 1001  9/17/2020 C      C      A   

Or using row_number() (credits to @r2evans):

#Code 2
df %>% group_by(id) %>% mutate(idv=paste0('type.',row_number())) %>%
  pivot_wider(names_from = idv,values_from=type)

Output:

# A tibble: 2 x 5
# Groups:   id [2]
  id    dates     type.1 type.2 type.3
  <chr> <chr>     <chr>  <chr>  <chr> 
1 1000  10/5/2019 A      B      B     
2 1001  9/17/2020 C      C      A     
Duck
  • 39,058
  • 13
  • 42
  • 84
  • I'd recommend `row_number()` instead of `1:n()`. – r2evans Oct 05 '20 at 22:12
  • @r2evans Let me update now. It is because of the practica of using `1:n()` :) – Duck Oct 05 '20 at 22:12
  • It's actually because automating `1:n` has bitten me, I now favor things like `seq_len(.)` and `row_number()`, since they behave sanely in the presence of 0 rows. Granted, that shouldn't happen here, but still ... habits :-) – r2evans Oct 05 '20 at 22:14
  • 1
    @r2evans For sure, you are right. Updated and credited :) – Duck Oct 05 '20 at 22:15
  • 1
    @r2evans The `rowid()` function from `data.table` has similar effect to `row_number()` or `cur_group_id()` but it can group multiple vars so that you can even reshape data to wide having duplicated rows so the issue of having lists in the reshaped data will dissappear! – Duck Oct 05 '20 at 22:25
  • 1
    So `rn = rowid(...)` is different than `group_by(...) %>% mutate(rn = row_number())`? – r2evans Oct 05 '20 at 22:28
  • 1
    @r2evans Yes, the `data.table` function creates aun unique id per row and even having duplicated rows you can assign like `rowid(var1,var2)` in order to have a distinct ide for the rows. Actually `df %>% group_by(id) %>% mutate(ID=row_number())` and `df %>% group_by(id) %>% mutate(ID=row_number())` produce same result. But the data.table function has more advanced features for ids in rows. – Duck Oct 05 '20 at 22:33
3

Here is a base R option using reshape

reshape(
  within(df, num <- ave(1:nrow(df), id, FUN = seq_along)),
  direction = "wide",
  idvar = c("id", "dates"),
  timevar = "num"
)

which gives

    id     dates type.1 type.2 type.3
1 1000 10/5/2019      A      B      B
4 1001 9/17/2020      C      C      A
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
  • arcane use of `reshape`, nice ... do you find it has any capability (besides universal availabilty) that `tidyr::pivot_*` and/or `data.table::melt`/`::dcast` do not? – r2evans Oct 05 '20 at 22:15
  • 1
    @r2evans Sorry that I have no experience in `pivot_*` or `data.table::melt/::dcast`, I almost use base R only ... – ThomasIsCoding Oct 05 '20 at 22:20
  • I liked this use of base R but it gave me a memory error when I used it on my actual data set. – user3390169 Oct 06 '20 at 03:47
2

We can use pivot_wider to reshape from 'long' to 'wide' after creating a sequence column with rowid (from data.table)

library(dplyr)
library(tidyr)
library(stringr)
library(data.table)
df %>%
    mutate(rn = str_c('type.', rowid(id))) %>% 
    pivot_wider(names_from = rn, values_from = type)

-output

# A tibble: 2 x 5
#  id    dates     type.1 type.2 type.3
#  <chr> <chr>     <chr>  <chr>  <chr> 
#1 1000  10/5/2019 A      B      B     
#2 1001  9/17/2020 C      C      A     

Or only using tidyverse

library(dplyr)
library(tidyr)
library(stringr)
df %>%
   group_by(id) %>%
   mutate(rn = str_c('type.', row_number())) %>%
   pivot_wider(names_from = rn, values_from = type)

Or using data.table in a compact way

library(data.table)
dcast(setDT(df), id + dates ~ paste0('type.', rowid(id)), value.var = 'type')

-output

#     id     dates type.1 type.2 type.3
#1: 1000 10/5/2019      A      B      B
#2: 1001 9/17/2020      C      C      A
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Does `data.table::rowid` provide something that `group_by(id) %>% mutate(... row_number())` doesn't? Just curious ... I am finding intersections of `dplyr` and `data.table` to be interesting. – r2evans Oct 05 '20 at 22:12
  • 1
    I generally use the `rowid` to make answers compact. I haven't checked the efficiency of group_by + row_number vs `rowid`. So, no, there is no deeper meaning in that usage. – akrun Oct 06 '20 at 01:08