
I would like to rearrange my data frame from long to wide, with NAs inserted where there is no information for a given variable.

With a data frame in R that looks like this:

fish_df1 <- read.table(header = TRUE, sep = ",", text = "
species,st,var01,var02,var03,var04
cod,1,45,52.3,1001.0,A
cod,1,23,45.6,2003.0,D
cod,1,51,33.2,5003.0,B
cod,1,62,12.3,23.0,E
cod,10,35,65.7,123.0,C
cod,10,86,39.87,90.0,A
cod,10,46,39.22,908.0,A
cod,10,37,38.57,99.0,A
cod,14,86,37.92,12242.7,B
cod,14,36,37.27,16697.7,A
cod,14,66,36.62,1203.7,C
cod,14,72,35.97,1465.9,B
herring,1,75,35.32,1728.1,A
herring,1,78,34.67,1990.3,A
herring,1,99,34.02,2252.5,B
herring,1,49,33.37,908.0,B
herring,10,51,32.72,99.0,C
herring,10,32,32.07,93.4,C
herring,10,35,31.42,808.0,A
herring,10,21,30.77,1230.0,A
herring,12,32,30.12,4560.0,A
herring,12,45,29.47,5951.3,A
herring,12,36,28.82,7827.3,A
herring,12,35,28.17,9703.3,C
eel,1,88,35.32,1728.1,A
eel,1,66,34.67,1990.3,A
eel,1,74,34.02,2252.5,B
eel,1,37,33.37,908.0,B
eel,10,78,30.12,4560.0,A
eel,10,89,29.47,5951.3,A
eel,10,27,28.82,7827.3,A
eel,10,34,28.17,9703.3,C
")

I would like to rearrange the data frame so that it looks like this:

fish_df2 <- read.table(header = TRUE, sep = ",", text = "
species.cod,st.cod,var01.cod,var02.cod,var03.cod,var04.cod,species.herring,st.herring,var01.herring,var02.herring,var03.herring,var04.herring,species.eel,st.eel,var01.eel,var02.eel,var03.eel,var04.eel
cod,1,45,52.3,1001.0,A,herring,1,75,35.32,1728.1,A,eel,1,88,35.32,1728.1,A
cod,1,23,45.6,2003.0,D,herring,1,78,34.67,1990.3,A,eel,1,66,34.67,1990.3,A
cod,1,51,33.2,5003.0,B,herring,1,99,34.02,2252.5,B,eel,1,74,34.02,2252.5,B
cod,1,62,12.3,23.0,E,herring,1,49,33.37,908.0,B,eel,1,37,33.37,908.0,B
cod,10,35,65.7,123.0,C,herring,10,51,32.72,99.0,C,eel,10,78,30.12,4560.0,A
cod,10,86,39.87,90.0,A,herring,10,32,32.07,93.4,C,eel,10,89,29.47,5951.3,A
cod,10,46,39.22,908.0,A,herring,10,35,31.42,808.0,A,eel,10,27,28.82,7827.3,A
cod,10,37,38.57,99.0,A,herring,10,21,30.77,1230.0,A,eel,10,34,28.17,9703.3,C
cod,12,NA,NA,NA,NA,herring,12,32,30.12,4560.0,A,eel,12,NA,NA,NA,NA
cod,12,NA,NA,NA,NA,herring,12,45,29.47,5951.3,A,eel,12,NA,NA,NA,NA
cod,12,NA,NA,NA,NA,herring,12,36,28.82,7827.3,A,eel,12,NA,NA,NA,NA
cod,12,NA,NA,NA,NA,herring,12,35,28.17,9703.3,C,eel,12,NA,NA,NA,NA
cod,14,86,37.92,12242.7,B,herring,14,NA,NA,NA,NA,eel,14,NA,NA,NA,NA
cod,14,36,37.27,16697.7,A,herring,14,NA,NA,NA,NA,eel,14,NA,NA,NA,NA
cod,14,66,36.62,1203.7,C,herring,14,NA,NA,NA,NA,eel,14,NA,NA,NA,NA
cod,14,72,35.97,1465.9,B,herring,14,NA,NA,NA,NA,eel,14,NA,NA,NA,NA
")

I have tried using the reshape function, like this:

fish_df3 <- reshape(fish_df1, v.names = "var01", idvar = "st",
                timevar = "species", 
                direction = "wide")

I think I need more variables in the `idvar` argument of `reshape`.
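One way to give `idvar` enough information is sketched below in base R. It is a minimal sketch, not a tested general solution, and it assumes the rows within each species/st group are already in the desired order. The column `idx` is a hypothetical helper (not in the original data): it records each row's position (1, 2, 3, ...) within its species/st group, so that `st` plus `idx` identifies exactly one row of the wide table and `species` supplies the column suffix. Unlike the target layout, the result keeps a single `st` column instead of repeating `species.*` and `st.*` for every species.

```r
# Data from the question (full fish_df1).
fish_df1 <- read.table(header = TRUE, sep = ",", text = "
species,st,var01,var02,var03,var04
cod,1,45,52.3,1001.0,A
cod,1,23,45.6,2003.0,D
cod,1,51,33.2,5003.0,B
cod,1,62,12.3,23.0,E
cod,10,35,65.7,123.0,C
cod,10,86,39.87,90.0,A
cod,10,46,39.22,908.0,A
cod,10,37,38.57,99.0,A
cod,14,86,37.92,12242.7,B
cod,14,36,37.27,16697.7,A
cod,14,66,36.62,1203.7,C
cod,14,72,35.97,1465.9,B
herring,1,75,35.32,1728.1,A
herring,1,78,34.67,1990.3,A
herring,1,99,34.02,2252.5,B
herring,1,49,33.37,908.0,B
herring,10,51,32.72,99.0,C
herring,10,32,32.07,93.4,C
herring,10,35,31.42,808.0,A
herring,10,21,30.77,1230.0,A
herring,12,32,30.12,4560.0,A
herring,12,45,29.47,5951.3,A
herring,12,36,28.82,7827.3,A
herring,12,35,28.17,9703.3,C
eel,1,88,35.32,1728.1,A
eel,1,66,34.67,1990.3,A
eel,1,74,34.02,2252.5,B
eel,1,37,33.37,908.0,B
eel,10,78,30.12,4560.0,A
eel,10,89,29.47,5951.3,A
eel,10,27,28.82,7827.3,A
eel,10,34,28.17,9703.3,C
")

# Hypothetical helper column: position (1, 2, 3, ...) of each row within its
# species/st group, assuming rows are already in the desired order.
fish_df1$idx <- ave(fish_df1$st, fish_df1$species, fish_df1$st,
                    FUN = seq_along)

# st + idx now identify one row of the wide table; species supplies the suffix.
wide <- reshape(fish_df1,
                idvar     = c("st", "idx"),
                timevar   = "species",
                v.names   = c("var01", "var02", "var03", "var04"),
                direction = "wide")

# Sort from lowest to highest st, keeping the within-group order.
wide <- wide[order(wide$st, wide$idx), ]
rownames(wide) <- NULL
print(wide)
```

Because `reshape` fills in NA for any (st, idx) pair a species does not have, the st 12 rows get NA for cod and eel, and the st 14 rows get NA for herring and eel.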

I apologize for not making this question more specific. I think my problem is related to the arguments I need to set in `reshape`, and adding the right argument name to the title might make the question more specific. Thanks in advance for any help.

swk
  • To me it looks more like a subset-and-merge problem, but I can't find any more commonality across rows than `st.cod == st.herring == st.eel`. Typically, when converting from long to wide, the "common element(s)" that are repeated multiple times in the long format turn into a single row in the wide format, and that is not happening here. Does order matter *within* the `st.*` variables? – r2evans May 31 '17 at 20:21
  • Yes, the st.* variable is the common element. The st.* values always occur in multiples of four. The order in which the st numbers occur for each species is important. Each set of four st numbers for each species could in fact be numbered, e.g. 1.1, 1.2, 1.3, 1.4, but I could not work out how to add such a number to the st variable. I would prefer if the output kept the st numbers ordered from lowest to highest. – swk May 31 '17 at 20:40
  • I was not sure whether the 'reshape' function was in fact the best choice, but searching the internet brought me closer to solving my problem. I am happy to use whatever tool or function might be better. – swk May 31 '17 at 20:42
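The subset-and-merge route suggested in the first comment can also be sketched in base R, combined with the within-group numbering (1.1, 1.2, ...) that swk describes. This is an illustrative sketch under stated assumptions, not a definitive method: the object name `fish_small`, the helper column `idx`, and the renaming code are hypothetical, and only two rows from several species/st groups of fish_df1 are used to keep it short. Each species becomes its own data frame with suffixed column names, and a full outer join (`merge(..., all = TRUE)`) on `st` and `idx` supplies the NAs.

```r
# A few rows of fish_df1 from the question, to keep the sketch short.
fish_small <- read.table(header = TRUE, sep = ",", text = "
species,st,var01,var02,var03,var04
cod,1,45,52.3,1001.0,A
cod,1,23,45.6,2003.0,D
cod,14,86,37.92,12242.7,B
cod,14,36,37.27,16697.7,A
herring,1,75,35.32,1728.1,A
herring,1,78,34.67,1990.3,A
herring,12,32,30.12,4560.0,A
herring,12,45,29.47,5951.3,A
eel,1,88,35.32,1728.1,A
eel,1,66,34.67,1990.3,A
")

# Hypothetical running number within each species/st group.
fish_small$idx <- ave(fish_small$st, fish_small$species, fish_small$st,
                      FUN = seq_along)

# Subset: one data frame per species, value columns suffixed with the species.
pieces <- lapply(split(fish_small, fish_small$species), function(d) {
  sp <- as.character(d$species[1])
  d$species <- NULL
  value_cols <- setdiff(names(d), c("st", "idx"))
  names(d)[match(value_cols, names(d))] <- paste(value_cols, sp, sep = ".")
  d
})
# split() orders species alphabetically; restore the original order.
pieces <- pieces[as.character(unique(fish_small$species))]

# Merge: full outer join on st + idx, so missing combinations become NA.
wide2 <- Reduce(function(x, y) merge(x, y, by = c("st", "idx"), all = TRUE),
                pieces)
wide2 <- wide2[order(wide2$st, wide2$idx), ]
rownames(wide2) <- NULL
print(wide2)
```

`merge()` already sorts by the key columns by default; the explicit `order()` only makes the lowest-to-highest st requirement visible.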
