Convert data from long format to wide format with five different measure columns

Question

Apologies for the first nondescript post, this is my first go at this. I have amended it and I hope it's easier to understand now.

I've been trying to reshape several large datasets from long to wide format then merge them together using a common identifier.

The data is fisheries data. Currently I need to convert the target catch data from long to wide and append to a dataset that has been formatted to have each row as a single 'TripSet' value that is the common identifier throughout my datasets.

The format of the dataset that I need to convert from long to wide is as follows:

my.df <- data.frame(Trip=rep(c("A","B","C"), 5), Set=rep(1:5, each=3), CommonName=rep
(c("i","j","k"),5),TotCat=1:15, Kept=16:30, RtnAlive=31:45, RtnDead= 46:60, RtnUnk= 61:75)

my.df$TripSet <- paste(my.df$Trip,my.df$Set)

my.df
   Trip Set CommonName TotCat Kept RtnAlive RtnDead RtnUnk TripSet
1     A   1          i      1   16       31      46     61     A 1
2     B   1          j      2   17       32      47     62     B 1
3     C   1          k      3   18       33      48     63     C 1
4     A   2          i      4   19       34      49     64     A 2
5     B   2          j      5   20       35      50     65     B 2
6     C   2          k      6   21       36      51     66     C 2
7     A   3          i      7   22       37      52     67     A 3
8     B   3          j      8   23       38      53     68     B 3
9     C   3          k      9   24       39      54     69     C 3
10    A   4          i     10   25       40      55     70     A 4
11    B   4          j     11   26       41      56     71     B 4
12    C   4          k     12   27       42      57     72     C 4
13    A   5          i     13   28       43      58     73     A 5
14    B   5          j     14   29       44      59     74     B 5
15    C   5          k     15   30       45      60     75     C 5

I can use cast or dcast to convert from long to wide with one variable as follows:

cast(my.df, TripSet ~ CommonName, value="TotCat")

   TripSet  i  j  k
1      A 1  1 NA NA
2      A 2  4 NA NA
3      A 3  7 NA NA
4      A 4 10 NA NA
5      A 5 13 NA NA
6      B 1 NA  2 NA
7      B 2 NA  5 NA
8      B 3 NA  8 NA
9      B 4 NA 11 NA
10     B 5 NA 14 NA
11     C 1 NA NA  3
12     C 2 NA NA  6
13     C 3 NA NA  9
14     C 4 NA NA 12
15     C 5 NA NA 15

However I am unsure if I should be getting so many NA values rather than zeros here.

My desired data format is having the common identifier 'TripSet by row then in columns have each species by Common Name matched with 'TotCat','Kept','RtnAlive','RtnDead','RtnUnk'.

I tried melting then recasting the data as follows:

dcast(melt(my.df, id.vars=c("TripSet", "CommonName")), TripSet~variable+CommonName)

   TripSet Trip_i Trip_j Trip_k Set_i Set_j Set_k TotCat_i TotCat_j TotCat_k Kept_i Kept_j Kept_k RtnAlive_i RtnAlive_j RtnAlive_k RtnDead_i
1      A 1      A   <NA>   <NA>  <NA>  <NA>  <NA>     <NA>     <NA>     <NA>   <NA>   <NA>   <NA>       <NA>       <NA>       <NA>      <NA>
2      A 2      A   <NA>   <NA>  <NA>  <NA>  <NA>     <NA>     <NA>     <NA>   <NA>   <NA>   <NA>       <NA>       <NA>       <NA>      <NA>
3      A 3      A   <NA>   <NA>  <NA>  <NA>  <NA>     <NA>     <NA>     <NA>   <NA>   <NA>   <NA>       <NA>       <NA>       <NA>      <NA>
4      A 4      A   <NA>   <NA>  <NA>  <NA>  <NA>     <NA>     <NA>     <NA>   <NA>   <NA>   <NA>       <NA>       <NA>       <NA>      <NA>
5      A 5      A   <NA>   <NA>  <NA>  <NA>  <NA>     <NA>     <NA>     <NA>   <NA>   <NA>   <NA>       <NA>       <NA>       <NA>      <NA>
6      B 1   <NA>      B   <NA>  <NA>  <NA>  <NA>     <NA>     <NA>     <NA>   <NA>   <NA>   <NA>       <NA>       <NA>       <NA>      <NA>
7      B 2   <NA>      B   <NA>  <NA>  <NA>  <NA>     <NA>     <NA>     <NA>   <NA>   <NA>   <NA>       <NA>       <NA>       <NA>      <NA>
8      B 3   <NA>      B   <NA>  <NA>  <NA>  <NA>     <NA>     <NA>     <NA>   <NA>   <NA>   <NA>       <NA>       <NA>       <NA>      <NA>
9      B 4   <NA>      B   <NA>  <NA>  <NA>  <NA>     <NA>     <NA>     <NA>   <NA>   <NA>   <NA>       <NA>       <NA>       <NA>      <NA>
10     B 5   <NA>      B   <NA>  <NA>  <NA>  <NA>     <NA>     <NA>     <NA>   <NA>   <NA>   <NA>       <NA>       <NA>       <NA>      <NA>
11     C 1   <NA>   <NA>      C  <NA>  <NA>  <NA>     <NA>     <NA>     <NA>   <NA>   <NA>   <NA>       <NA>       <NA>       <NA>      <NA>
12     C 2   <NA>   <NA>      C  <NA>  <NA>  <NA>     <NA>     <NA>     <NA>   <NA>   <NA>   <NA>       <NA>       <NA>       <NA>      <NA>
13     C 3   <NA>   <NA>      C  <NA>  <NA>  <NA>     <NA>     <NA>     <NA>   <NA>   <NA>   <NA>       <NA>       <NA>       <NA>      <NA>
14     C 4   <NA>   <NA>      C  <NA>  <NA>  <NA>     <NA>     <NA>     <NA>   <NA>   <NA>   <NA>       <NA>       <NA>       <NA>      <NA>
15     C 5   <NA>   <NA>      C  <NA>  <NA>  <NA>     <NA>     <NA>     <NA>   <NA>   <NA>   <NA>       <NA>       <NA>       <NA>      <NA>

This is essentially my expected output but I don't wish to cast 'Trip' and 'Set' (and several other column variables I have in the actual dataset), rather only take 'Kept','RtnAlive','RtnDead','RtnUnk' as additional measure variables, and have the column names reflect both the selected measure variable name and the CommonName.

As an aside, are the NA values being generated as there is no data, so for example, no commonname i species were kept or rtndead on x number of sets?

Any help with this would be appreciated.

Please provide a reproducible example to illustrate your problem and for us to assist you in your problem. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Adam Quek, Apr 13 '17 at 10:24
Yes - NAs are for the missing combinations, BUT my results differ - see my answer below. — Andrew Lavers, Apr 13 '17 at 12:17

Andrew Lavers · Answer 1 · 2017-04-13T23:51:13.713

library(dplyr)
library(reshape2)
my.df %>% 
  melt(id.vars = c("Trip", "Set", "CommonName")) %>%      # make tall
  dcast(Trip + Set ~ variable + CommonName)                # make wide  

#    Trip Set TotCat_i TotCat_j TotCat_k Kept_i Kept_j Kept_k RtnAlive_i RtnAlive_j RtnAlive_k RtnDead_i
# 1     A   1        1       NA       NA     16     NA     NA         31         NA         NA        46
# 2     A   2        4       NA       NA     19     NA     NA         34         NA         NA        49
# 3     A   3        7       NA       NA     22     NA     NA         37         NA         NA        52
# 4     A   4       10       NA       NA     25     NA     NA         40         NA         NA        55
# 5     A   5       13       NA       NA     28     NA     NA         43         NA         NA        58
# 6     B   1       NA        2       NA     NA     17     NA         NA         32         NA        NA
# 7     B   2       NA        5       NA     NA     20     NA         NA         35         NA        NA
# 8     B   3       NA        8       NA     NA     23     NA         NA         38         NA        NA
# 9     B   4       NA       11       NA     NA     26     NA         NA         41         NA        NA
# 10    B   5       NA       14       NA     NA     29     NA         NA         44         NA        NA
# 11    C   1       NA       NA        3     NA     NA     18         NA         NA         33        NA
# 12    C   2       NA       NA        6     NA     NA     21         NA         NA         36        NA
# 13    C   3       NA       NA        9     NA     NA     24         NA         NA         39        NA
# 14    C   4       NA       NA       12     NA     NA     27         NA         NA         42        NA
# 15    C   5       NA       NA       15     NA     NA     30         NA         NA         45        NA

Hi epi99, thanks so much for the help. That is my desired output. One trouble shooting question. When I run the above code you just posted I get the following error message Error in select(., -Trip, -Set) : unused arguments (-Trip, -Set) — W. Arlidge, Apr 13 '17 at 12:19
I just simplified it .. using `id.vars` parameter in melt to select the columns — Andrew Lavers, Apr 13 '17 at 12:25
Thanks. I have a question, is it possible to reshape using dplyr so that all the columns are in order of each species by common name for total catch, then each species by common name for kept and so on... rather than species 1 with total catch, kept, rtndead, rtnalive and so on... — W. Arlidge, Apr 13 '17 at 23:35
and BTW, if you prefer zeroes instead of NA you can add `fill = 0` to the `dcast` — Andrew Lavers, Apr 14 '17 at 12:57

Convert data from long format to wide format with five different measure columns

1 Answers1