I'm trying to make a bar graph, but am having trouble getting the frequencies for each value. In my CSV file, I have a column called "Clade" where each cell holds a value (note that each value can appear in more than one cell). There is another column called "Total" where each cell holds a numerical value corresponding to the "Clade" cell in the same row. What I am trying to do is calculate the frequency of each value in the "Clade" column while taking into account the numerical value in the "Total" column. For example, one value in "Clade" appears 3 times, but the "Total" associated with one of those three rows is 23. Any help is greatly appreciated!
Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. – Jaap Sep 13 '16 at 21:03
For inspiration, see: [*Is there an aggregate FUN option to count occurrences?*](http://stackoverflow.com/questions/9809166/is-there-an-aggregate-fun-option-to-count-occurrences) – Jaap Sep 13 '16 at 21:07
1 Answer
If I understand your question correctly, you want the frequency of each value in the Clade column, where each cell holds one value.
Here is a reproducible example that you can adapt to your specific needs:
library(dplyr)
set.seed(1)
values <- c('one', 'two', 'three', 'four', 'five')
# simulated data standing in for the CSV file
df <- data.frame(clade = sample(values, size = 1000, replace = TRUE),
                 total = rnorm(1000, mean = 0, sd = 1))
# add a column with the relative frequency of each clade value
df <- df %>%
  group_by(clade) %>%
  mutate(freq = n() / nrow(.))
# plot the relative frequencies
barplot(prop.table(table(df$clade)))
This code first simulates data like yours, then adds a column called freq to the data frame, which holds the relative frequency of each Clade value within the data. Finally, it plots the relative frequencies of the Clade values.
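The example above counts rows only and never uses the total column. If the goal is to weight each Clade value by its Total, which is one possible reading of the question, a minimal sketch could look like the following. It assumes Total is a non-negative count that each row contributes; the toy data and the names weighted_counts and weighted_freq are hypothetical and only stand in for the real CSV.

```r
# Hypothetical toy data; column names follow the question.
df <- data.frame(
  Clade = c("A", "B", "A", "C", "A", "B"),
  Total = c(23, 5, 1, 10, 6, 5)
)

# Sum Total within each Clade value (a weighted count per clade).
weighted_counts <- tapply(df$Total, df$Clade, sum)

# Turn the weighted counts into relative frequencies that sum to 1.
weighted_freq <- weighted_counts / sum(weighted_counts)

# Bar graph of the weighted relative frequencies.
barplot(weighted_freq, xlab = "Clade", ylab = "Weighted relative frequency")
```

With real data, df would come from something like read.csv() on your file. If Total can be negative or is not a count, summing it this way may not be meaningful.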

toku_mo