Apologies for the first nondescript post, this is my first go at this. I have amended it and I hope it's easier to understand now.
I've been trying to reshape several large datasets from long to wide format then merge them together using a common identifier.
The data is fisheries data. Currently I need to convert the target catch data from long to wide and append to a dataset that has been formatted to have each row as a single 'TripSet' value that is the common identifier throughout my datasets.
The format of the dataset that I need to convert from long to wide is as follows:
my.df <- data.frame(Trip=rep(c("A","B","C"), 5), Set=rep(1:5, each=3), CommonName=rep
(c("i","j","k"),5),TotCat=1:15, Kept=16:30, RtnAlive=31:45, RtnDead= 46:60, RtnUnk= 61:75)
my.df$TripSet <- paste(my.df$Trip,my.df$Set)
my.df
Trip Set CommonName TotCat Kept RtnAlive RtnDead RtnUnk TripSet
1 A 1 i 1 16 31 46 61 A 1
2 B 1 j 2 17 32 47 62 B 1
3 C 1 k 3 18 33 48 63 C 1
4 A 2 i 4 19 34 49 64 A 2
5 B 2 j 5 20 35 50 65 B 2
6 C 2 k 6 21 36 51 66 C 2
7 A 3 i 7 22 37 52 67 A 3
8 B 3 j 8 23 38 53 68 B 3
9 C 3 k 9 24 39 54 69 C 3
10 A 4 i 10 25 40 55 70 A 4
11 B 4 j 11 26 41 56 71 B 4
12 C 4 k 12 27 42 57 72 C 4
13 A 5 i 13 28 43 58 73 A 5
14 B 5 j 14 29 44 59 74 B 5
15 C 5 k 15 30 45 60 75 C 5
I can use cast or dcast to convert from long to wide with one variable as follows:
cast(my.df, TripSet ~ CommonName, value="TotCat")
TripSet i j k
1 A 1 1 NA NA
2 A 2 4 NA NA
3 A 3 7 NA NA
4 A 4 10 NA NA
5 A 5 13 NA NA
6 B 1 NA 2 NA
7 B 2 NA 5 NA
8 B 3 NA 8 NA
9 B 4 NA 11 NA
10 B 5 NA 14 NA
11 C 1 NA NA 3
12 C 2 NA NA 6
13 C 3 NA NA 9
14 C 4 NA NA 12
15 C 5 NA NA 15
However I am unsure if I should be getting so many NA values rather than zeros here.
My desired data format is having the common identifier 'TripSet by row then in columns have each species by Common Name matched with 'TotCat','Kept','RtnAlive','RtnDead','RtnUnk'.
I tried melting then recasting the data as follows:
dcast(melt(my.df, id.vars=c("TripSet", "CommonName")), TripSet~variable+CommonName)
TripSet Trip_i Trip_j Trip_k Set_i Set_j Set_k TotCat_i TotCat_j TotCat_k Kept_i Kept_j Kept_k RtnAlive_i RtnAlive_j RtnAlive_k RtnDead_i
1 A 1 A <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 A 2 A <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 A 3 A <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 A 4 A <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 A 5 A <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 B 1 <NA> B <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 B 2 <NA> B <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 B 3 <NA> B <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 B 4 <NA> B <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 B 5 <NA> B <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
11 C 1 <NA> <NA> C <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
12 C 2 <NA> <NA> C <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
13 C 3 <NA> <NA> C <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
14 C 4 <NA> <NA> C <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
15 C 5 <NA> <NA> C <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
This is essentially my expected output but I don't wish to cast 'Trip' and 'Set' (and several other column variables I have in the actual dataset), rather only take 'Kept','RtnAlive','RtnDead','RtnUnk' as additional measure variables, and have the column names reflect both the selected measure variable name and the CommonName.
As an aside, are the NA values being generated as there is no data, so for example, no commonname i species were kept or rtndead on x number of sets?
Any help with this would be appreciated.