I'm working with data transformation in R and one that I haven't been able is to filter lines with similar values selecting the one with higher "Expression value" and later on split data in columns by Expression level and aggregate them. Since I know the explanation won't give the Nobel prize, below is the original data, the result and what I've achieved so far.
ORIGINAL DATA
df <- read.table(text =
"Tissue Species Expression
1 dentritic Human moderate
2 liver Human high
3 liver Human moderate
4 liver Human moderate
5 liver Human high
6 liver Monkey high
7 liver Monkey moderate
8 liver Dog high
9 liver Dog high
10 liver Minipig moderate
11 liver Rat low
12 liver Rat cutoff
13 liver Monkey moderate
14 lung Monkey high
15 quadriceps Monkey cutoff" , header = TRUE)
The result that I need to achieve is, in case of both values of Tissue and Species repeated, select only the highest value on Expression.
Tissue High_Expression Moderate_Expression Low_Expression cutoff
1 dentritic Human
2 liver Human, Monkey,Dog Minipig Rat
3 lung Monkey
4 quadriceps Monkey
What I have so far:
df$Expression <- factor(df$Expression, levels = c("cutoff", "low", "moderate", "high"), ordered = TRUE)
df$Species <- as.character(df$Species)
df <- df %>%
mutate(High_expressed = ifelse(Expression == "high", Species, "")) %>%
mutate(moderate_expressed = ifelse(Expression == "moderate", Species, "")) %>%
mutate(low_expressed = ifelse(Expression == "low", Species, "")) %>%
mutate(below_cutoff_expressed = ifelse(Expression == "cutoff", Species, "")) %>%
select(-c("Expression", "Species"))
df <- aggregate(. ~ groupTissue, data = df, paste, collapse = ",")
That gives:
Tissue High_Expression Moderate_Expression Low_Expression cutoff
1 dentritic Human
2 liver Human,,,Human, ,Human,Human,,, ,,,,,,,,,Rat,, ,,,,,,,,,Rat,
Monkey,,Dog,Dog,,,, Monkey,,,Minipig,,,Monkey
3 lung Monkey
4 quadriceps Monkey
Thanks in advance