0

Edited

I have a large table, which starts like this enter image description here

Essentially, it's a table with multiple samples ("samp_id") showing the number ("least") of "taxon" present in each.

I want to transpose/pivot the table to look like this;

enter image description here

i.e. with "taxon" as the top row, with each of the 90 samples in "data" following as a row based on the "least" column, re-named with its "samp_id". So you see what each sample is, as well the value in "least" for each sample in the different "taxon" (which may not be identical across the 90 samples).

Previously, I have separated the data into multiple tibbles based on "samp_id", selected "taxon" and "least", re-named "least" with the "samp_id" then combined the individual tibbles based on "taxon" with full_join using something like the code below, then transposing the combined table

ACLOD_11 = data %>%
  filter(samp_id == "ACLOD_11") %>%
  select(taxon, least) %>%
  rename("ACLOD_11" = least) 

ACLOD_12 = data ... #as above, but different samp_id

data_final = list(ACLOD_11, ACLOD_12, ...) %>% 
  reduce(full_join, by = "taxon")

As I have more data tables to follow after this one with 90 samples, so I want to be able to do this without having to individually separate the data into 100s of tibbles and manually inputting the "samp_id" before joining.

I have currently split the data into 90 separate tibbles based on "samp_id" (there are 90 samples in "data")

data_split = data %>%
  group_split(samp_id) 

but am unsure if this is the best way to do this, or what I should to next?

Maho
  • 59
  • 1
  • 7
  • 1
    This sounds a bit like a "long to wide" problem. Have you thought about using something like `tidyr::pivot_wider()` on the columns "samp_id", "least", and "taxon"? If things are very large and you need some speed you could try `data.table::dcast()` for widening. – aosmith Jun 07 '21 at 16:38
  • thanks for this, it's almost close; I've edited the question to (hopefully!) add more clarity, would appreciate your thoughs! – Maho Jun 07 '21 at 17:40
  • If you select just the three columns that are relevant to the problem (samp_id, least, taxon), does `tidyr::pivot_wider(data, names_from = "taxon", values_from = "least")` get you close? If you want folks to test things, try adding some data we can copy and paste instead of screenshots. :) Some ideas on how to do that [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – aosmith Jun 07 '21 at 19:54
  • that worked, thanks for your help! - yes, I understand that it's better to provide an example data so that others can test the code out :) – Maho Jun 08 '21 at 09:04

1 Answers1

0

We can use

library(dplyr)
library(purrr)
data %>%
     split(.$samp_id) %>%
     imap(~ .x %>%
              select(taxon, least) %>%
              rename(!!.y := least)) %>%
      reduce(full_join, by = 'taxon')
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thanks - I've tried this, but got the following error; is there a way around this? "Error: cannot allocate vector of size 1.0 Gb" – Maho Jun 07 '21 at 16:31
  • @Maho that is unrelated to the code. It is memory you don't have. Perhaps tyr on a new session or use a cluster with memory – akrun Jun 07 '21 at 16:33