I have a data frame similar to the example below, just much larger:
current_data <- data.frame(Sampling_dates = c(rep("Date1", 2), rep("Date2", 3), rep("Date3", 3)),
                           rRNA_tag = c("AA", "AC", "AG", "AA", "UU", "AA", "AG", "CC"),
                           Rel_abundance = c(50, 50, 40, 25, 35, 20, 50, 30))
I want to reshape my much larger data frame into the structure below:
goal <- data.frame(rRNA_tag = c("AA", "AC", "AG", "UU", "CC"),
                   Date1 = c(50, 50, NA, NA, NA),
                   Date2 = c(25, NA, 40, 35, NA),
                   Date3 = c(20, NA, 50, NA, 30))
To state my goal another way, I want the values in the first column of the starting data frame (Sampling_dates) to become the column headers of the result, and I need R to treat the repeated entries in column 1 of "current_data" (start) as a single entry in "goal" (desired result).
I also want the rRNA_tag column to move from the second column in "current_data" to the first column in "goal", and, as before, I need R to treat the repeated entries in column 2 of "current_data" as a single entry in "goal".
Finally, I need each relative abundance value to be assigned to the matching tag and date, with NA or 0 assigned for sampling dates on which a given rRNA tag was not observed.
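The three requirements above amount to a long-to-wide reshape: one row per distinct tag, one column per distinct date, and NA where a tag/date combination is missing. A minimal sketch using base R's tapply() on the example data, assuming each tag occurs at most once per date (so that sum() simply returns the single value); the names wide and goal_sketch are made up for illustration:

```r
# Example data from the question
current_data <- data.frame(
  Sampling_dates = c(rep("Date1", 2), rep("Date2", 3), rep("Date3", 3)),
  rRNA_tag       = c("AA", "AC", "AG", "AA", "UU", "AA", "AG", "CC"),
  Rel_abundance  = c(50, 50, 40, 25, 35, 20, 50, 30)
)

# Cross-tabulate abundance by tag (rows) and date (columns).
# Tag/date combinations that never occur are left as NA.
# Assumption: at most one row per tag/date pair, so sum() returns that one value.
wide <- with(current_data,
             tapply(Rel_abundance, list(rRNA_tag, Sampling_dates), sum))

# Turn the matrix into a data frame with the tags as the first column
goal_sketch <- data.frame(rRNA_tag = rownames(wide), wide, row.names = NULL)
goal_sketch
```

Note that tapply() sorts the tags alphabetically, so the rows come out as AA, AC, AG, CC, UU rather than in order of first appearance as in "goal". If 0 is preferred over NA, something like goal_sketch[is.na(goal_sketch)] <- 0 afterwards should work.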
For additional context, my real data set has 12 sampling dates but 39,000 rows (i.e. Date1 would be repeated 5,000 times, Date2 4,000 times, etc. through all 12, adding up to 39,000) and nearly 10,000 distinct rRNA tag sequences.
I tried the following:
goal_attempt <- setNames(data.frame(t(current_data[, -1])), current_data[, 1])
I understand now why this didn't work: I just transposed my 39,000 rows, creating a 2 x 39,000 data frame (the date column is dropped before transposing, leaving only the tag and abundance rows), but I'm really unsure what to do next.
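To make the difference concrete, here is a hedged sketch on the example data that contrasts the transpose attempt with base R's reshape() in direction = "wide", which groups rows by tag and spreads the dates into columns instead of flipping every row. The name wide_sketch is made up for illustration, and the column renaming step assumes reshape()'s default "." separator:

```r
# Example data from the question
current_data <- data.frame(
  Sampling_dates = c(rep("Date1", 2), rep("Date2", 3), rep("Date3", 3)),
  rRNA_tag       = c("AA", "AC", "AG", "AA", "UU", "AA", "AG", "CC"),
  Rel_abundance  = c(50, 50, 40, 25, 35, 20, 50, 30)
)

# The attempt: transposing turns every original row into its own column
goal_attempt <- setNames(data.frame(t(current_data[, -1])), current_data[, 1])
dim(goal_attempt)  # 2 rows by 8 columns for the example data

# Long-to-wide reshape: one row per tag, one abundance column per date
wide_sketch <- reshape(current_data,
                       idvar     = "rRNA_tag",        # identifies a row in the result
                       timevar   = "Sampling_dates",  # values become column suffixes
                       direction = "wide")

# Strip the "Rel_abundance." prefix so the columns are just Date1, Date2, Date3
names(wide_sketch) <- sub("^Rel_abundance\\.", "", names(wide_sketch))
rownames(wide_sketch) <- NULL
wide_sketch
```

With reshape(), tags stay in order of first appearance and unobserved tag/date pairs are filled with NA. On a 39,000-row data frame with nearly 10,000 tags this should still be feasible, though I have not timed it.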