I have a data frame similar to the example below, just much larger:

current_data <- data.frame(Sampling_dates = c(rep("Date1", 2), rep("Date2", 3), rep("Date3", 3)),
                           rRNA_tag = c("AA", "AC", "AG", "AA", "UU", "AA", "AG", "CC"),
                           Rel_abundance = c(50, 50, 40, 25, 35, 20, 50, 30))

I want to reshape my much larger data frame into the structure below:

goal <- data.frame(rRNA_tag = c("AA", "AC", "AG", "UU", "CC"),
                   Date1 = c(50, 50, NA, NA, NA),
                   Date2 = c(25, NA, 40, 35, NA),
                   Date3 = c(20, NA, 50, NA, 30))

To state my goal another way, I want the values in the first column of the starting data frame to become the column names of the result, with R treating the repeated entries in column 1 of `current_data` (start) as a single entry in `goal` (desired result).

I also want the rRNA_tag column to move from the second column in `current_data` to the first column in `goal`, and as before I need R to treat the repeated entries in column 2 of `current_data` as a single entry in `goal`.

Finally, I need each relative abundance value to land in the cell for its tag and date, with NA or 0 assigned for the sampling dates on which a given rRNA tag was not observed.

For additional context, my real data set has 12 sampling dates but 39,000 rows (i.e. Date1 would be repeated 5,000 times, Date2 4,000 times, and so on through all 12, adding up to 39,000) and nearly 10,000 distinct rRNA tag sequences.

I tried the following:

goal_attempt <- setNames(data.frame(t(current_data[, -1])), current_data[, 1])

I understand now why this didn't work: I just transposed my 39,000 rows, which creates a 2 x 39,000 data frame (the tag row and the abundance row, with the repeated dates as column names). I'm really unsure what to do next, though.
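For illustration, the reshape described above (one row per distinct tag, one column per distinct date) can be sketched in base R with `tapply()`. This is only a minimal sketch on the example data, not something tuned for the 39,000-row case. The names `wide_matrix` and `goal_sketch` are made up here, and `sum` is an assumption about how to combine values if the same tag ever appears twice on one date.

```r
current_data <- data.frame(
  Sampling_dates = c(rep("Date1", 2), rep("Date2", 3), rep("Date3", 3)),
  rRNA_tag = c("AA", "AC", "AG", "AA", "UU", "AA", "AG", "CC"),
  Rel_abundance = c(50, 50, 40, 25, 35, 20, 50, 30)
)

# Cross-tabulate abundance by tag (rows) and date (columns).
# Tag/date combinations that never occur are left as NA.
# Assumption: duplicated tag/date pairs, if any, should be summed.
wide_matrix <- tapply(
  current_data$Rel_abundance,
  list(current_data$rRNA_tag, current_data$Sampling_dates),
  sum
)

# Move the row names into a proper rRNA_tag column.
goal_sketch <- data.frame(
  rRNA_tag = rownames(wide_matrix),
  wide_matrix,
  row.names = NULL
)
```

Unlike `goal`, the rows here come out sorted by tag rather than in order of first appearance.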

  • This is called "pivoting" or "reshaping" data from a long format to a wide format - there are several nice tools for doing it. I like `tidyr::pivot_wider(current_data, names_from = Sampling_dates, values_from = Rel_abundance)` which creates your `goal` from the sample input. Though if your data is very large the `data.table::dcast()` function will be more efficient. You can find examples with both methods (and more) at the linked duplicate. – Gregor Thomas Nov 29 '22 at 20:40
  • Thank you, I had never heard of pivoting or reshaping before. tidyr::pivot_wider worked for my data! – Clayton Tracey Dec 01 '22 at 15:08
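
The two approaches named in the comments can be sketched on the example data as follows. This assumes the `tidyr` and `data.table` packages are installed. The object names `wide_tidyr`, `wide_zero` and `wide_dt` are made up for illustration, and nothing here has been timed on the full 39,000-row data set.

```r
library(tidyr)
library(data.table)

current_data <- data.frame(
  Sampling_dates = c(rep("Date1", 2), rep("Date2", 3), rep("Date3", 3)),
  rRNA_tag = c("AA", "AC", "AG", "AA", "UU", "AA", "AG", "CC"),
  Rel_abundance = c(50, 50, 40, 25, 35, 20, 50, 30)
)

# tidyr: one column per sampling date, missing combinations become NA.
wide_tidyr <- pivot_wider(
  current_data,
  names_from = Sampling_dates,
  values_from = Rel_abundance
)

# Same reshape, but filling unobserved tag/date combinations with 0.
wide_zero <- pivot_wider(
  current_data,
  names_from = Sampling_dates,
  values_from = Rel_abundance,
  values_fill = 0
)

# data.table: formula is rows ~ columns; value.var names the cell values.
wide_dt <- dcast(
  as.data.table(current_data),
  rRNA_tag ~ Sampling_dates,
  value.var = "Rel_abundance"
)
```

`pivot_wider()` keeps the tags in order of first appearance, as in `goal`, while `dcast()` sorts the rows by `rRNA_tag`.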
