3

First time posting something here, forgive any missteps in my question.

In my example below I've got a data.frame where the unique identifier is the tripID with the name of the vessel, the species code, and a catch metric.

> testFrame1 <- data.frame('tripID' = c(1,1,2,2,3,4,5), 
                           'name' = c('SS Anne','SS Anne', 'HMS Endurance', 'HMS Endurance','Salty Hippo', 'Seagallop', 'Borealis'), 
                           'SPP' = c(101,201,101,201,102,102,103), 
                           'kept' = c(12, 22, 14, 24, 16, 18, 10))
> testFrame1
    tripID          name SPP kept
  1      1       SS Anne 101   12
  2      1       SS Anne 201   22
  3      2 HMS Endurance 101   14
  4      2 HMS Endurance 201   24
  5      3   Salty Hippo 102   16
  6      4     Seagallop 102   18
  7      5      Borealis 103   10

I need a way to basically condense the data.frame so that all there is only one row per tripID as shown below.

> testFrame1
    tripID          name SPP kept SPP.1 kept.1
  1      1       SS Anne 101   12   201     22
  2      2 HMS Endurance 101   14   201     24
  3      3   Salty Hippo 102   16    NA     NA
  4      4     Seagallop 102   18    NA     NA
  5      5      Borealis 103   10    NA     NA

I've looked into tidyr and reshape but neither of those are can deliver quite what I'm asking for. Is there anything out there that does this quasi-reshaping?

2 Answers2

1

Here are two alternatives using base::reshape and data.table::dcast:

1) base R

reshape(transform(testFrame1,
                  timevar = ave(tripID, tripID, FUN = seq_along)),
        idvar = cbind("tripID", "name"),
        timevar = "timevar",
        direction = "wide")
#  tripID          name SPP.1 kept.1 SPP.2 kept.2
#1      1       SS Anne   101     12   201     22
#3      2 HMS Endurance   101     14   201     24
#5      3   Salty Hippo   102     16    NA     NA
#6      4     Seagallop   102     18    NA     NA
#7      5      Borealis   103     10    NA     NA

2) data.table

library(data.table)
setDT(testFrame1)
dcast(testFrame1, tripID + name ~ rowid(tripID), value.var = c("SPP", "kept"))
#   tripID          name SPP_1 SPP_2 kept_1 kept_2
#1:      1       SS Anne   101   201     12     22
#2:      2 HMS Endurance   101   201     14     24
#3:      3   Salty Hippo   102    NA     16     NA
#4:      4     Seagallop   102    NA     18     NA
#5:      5      Borealis   103    NA     10     NA
markus
  • 25,843
  • 5
  • 39
  • 58
  • So I'm using the `base::reshape` function and I'm getting an error indicating `Error: cannot allocate vector of size 54.0 Gb In addition: Warning messages: 1: In ans * length(l) : NAs produced by integer overflow 2: In ans * length(l) + if1 : NAs produced by integer overflow` My production data.frame that I'm using contains 388994 observations in 35 variables (36 including the `timevar`). Are there limitations to `reshape` that are going to prevent me from utilizing it for my task? – user10598026 Nov 09 '18 at 14:53
  • @user10598026 I don't really know. Does this post help you: https://stackoverflow.com/questions/5171593/r-memory-management-cannot-allocate-vector-of-size-n-mb If not working, then try `data.table` which is fast and memory efficient. – markus Nov 10 '18 at 21:22
  • I decided to use the data.table function. It's much more efficient. Thanks for the help! Now I just have to figure out what to do with my terrible terrible data. – user10598026 Nov 30 '18 at 14:47
0

Great reproducible post considering it's your first. Here's a way to do it with dplyr and tidyr -

testFrame1 %>%
  group_by(tripID, name) %>%
  summarise(
    SPP = toString(SPP),
    kept = toString(kept)
  ) %>%
  ungroup() %>%
  separate("SPP", into = c("SPP", "SPP.1"), sep = ", ", extra = "drop", fill = "right") %>%
  separate("kept", into = c("kept", "kept.1"), sep = ", ", extra = "drop", fill = "right")

# A tibble: 5 x 6
  tripID name          SPP   SPP.1 kept  kept.1
   <dbl> <chr>         <chr> <chr> <chr> <chr> 
1   1.00 SS Anne       101   201   12    22    
2   2.00 HMS Endurance 101   201   14    24    
3   3.00 Salty Hippo   102   <NA>  16    <NA>  
4   4.00 Seagallop     102   <NA>  18    <NA>  
5   5.00 Borealis      103   <NA>  10    <NA>
Shree
  • 10,835
  • 1
  • 14
  • 36