First time posting. Apologies if I'm not as clear as I intend.

I have an Excel (xlsx) spreadsheet of data; it's sequencing data, if that helps. It is generally laid out as follows: column 1 = organism families (hundreds of organisms down this column), columns 2-x = specific samples.

Many of the cells scattered throughout the data are zero or too low, and I want to omit them. I set my data so that anything under 5 becomes NA. Since different samples will have more, fewer, or different species omitted by that threshold, I want to separate the data by sample. Code so far is:

# Files load fine; I just omitted my directories to post online
library(readxl)
my_counts <- read_excel("...Family_120821.xlsx", sheet = "family_Counts")
my_perc <- read_excel("...Family_120821.xlsx", sheet = "family_Percentages")
my_counts[my_counts < 5] <- NA
my_counts
my_perc[my_perc < 0.05] <- NA
my_perc

S13 <- my_counts$Sample.13
S13A <- na.omit(S13)
S13A


S14 <- my_counts$Sample.14
S14A <- na.omit(S14)
S14A

S15 <- my_counts$Sample.15
S15A <- na.omit(S15)
S15A


...

First question: is there a better way to go about this, so that I can replicate it on different data without typing out each individual sample? Most important question: when I do this, I get the values I want, with no NAs. But they are bare values, when I want another data frame so I can write it back to an xlsx. As I have it, I lose the association to the organism.
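One possible approach, sketched in base R under some assumptions: the toy data below is made up to stand in for the `family_Counts` sheet, and the names `threshold`, `sample_cols`, and `per_sample` are illustrative only. The idea is to loop over the sample columns and subset rows of the data frame, rather than pulling out a single column with `$`, so the `family` column stays attached.

```r
# Toy stand-in for the "family_Counts" sheet (values are made up for illustration)
my_counts <- data.frame(
  family    = c("Fam_A", "Fam_B", "Fam_C", "Fam_D"),
  Sample.13 = c(12, 0, 7, 3),
  Sample.14 = c(0, 25, 4, 9),
  Sample.15 = c(6, 1, 0, 40),
  stringsAsFactors = FALSE
)

threshold <- 5  # use 0.05 for the percentages sheet

# Every column except the organism column is treated as a sample
sample_cols <- setdiff(names(my_counts), "family")

# One data frame per sample: keep the family column, drop rows below the threshold
per_sample <- lapply(sample_cols, function(s) {
  keep <- !is.na(my_counts[[s]]) & my_counts[[s]] >= threshold
  my_counts[keep, c("family", s), drop = FALSE]
})
names(per_sample) <- sample_cols

per_sample[["Sample.13"]]

# Optional: write each sample to its own sheet, if the writexl package is installed
if (requireNamespace("writexl", quietly = TRUE)) {
  writexl::write_xlsx(per_sample, tempfile(fileext = ".xlsx"))
}
```

Filtering rows directly means there is no need to set values to NA first. It also avoids comparing the text `family` column against the numeric threshold, which `my_counts[my_counts < 5] <- NA` does. The `tempfile()` path is a placeholder for a real output path.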

Ex: Before

All samples by associated organisms

Ex: After

Single sample, no NAs, but also no association to organism index

Essentially the following image, but broken into individual samples, with only the organisms that met my threshold of 5 for counts and 0.05 for percents.

[image]

  • (1) Hi there and welcome to SO. Please make a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) or [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) with a sample input and your expected output. (2) Regarding your question: You could take a look at the `dplyr` and `tidyr` packages, maybe combined with the `purrr` package, which is a little bit more advanced. – Martin Gal Dec 21 '22 at 23:30
  • Please provide enough code so others can better understand or reproduce the problem. – Community Dec 22 '22 at 07:58
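Following the `dplyr`/`tidyr` suggestion in the comments, a minimal sketch of how that route might look. It assumes both packages are installed, and the toy data and the `long` and `per_sample` names are made up for illustration. The data is reshaped to one row per family and sample, filtered by the threshold, then split by sample.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the counts sheet (values are made up for illustration)
my_counts <- data.frame(
  family    = c("Fam_A", "Fam_B", "Fam_C", "Fam_D"),
  Sample.13 = c(12, 0, 7, 3),
  Sample.14 = c(0, 25, 4, 9),
  stringsAsFactors = FALSE
)

# Long format: one row per (family, sample) pair, keeping only counts >= 5
long <- my_counts %>%
  pivot_longer(-family, names_to = "sample", values_to = "count") %>%
  filter(count >= 5)

# Named list with one data frame per sample; the family column is preserved
per_sample <- split(long, long$sample)

per_sample$Sample.13
```

Each element of `per_sample` keeps the `family` column next to its count, so the organism association is not lost.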
