I would like to take the unique rows of one data frame and then join them with another data frame of attributes. I'd then like to count up the number of varieties, e.g. the number of unique fruits of a particular type or origin.
The first data frame has my list of fruits:
fruits <- read.table(header=TRUE, text="shop fruit
1 apple
2 orange
3 apple
4 pear
2 banana
1 banana
1 orange
3 banana")
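For the "unique rows" step, one possibly simpler approach (a sketch using only base R) is to index the data frame by column name with single brackets. That keeps the result a one-column data frame whose column is already named `fruit`, so no renaming is needed. `unique_fruits` is just an illustrative name.

```r
fruits <- read.table(header = TRUE, text = "shop fruit
1 apple
2 orange
3 apple
4 pear
2 banana
1 banana
1 orange
3 banana")

# Single-bracket indexing by name returns a data frame rather than a vector,
# so unique() drops duplicate rows and the column keeps the name "fruit".
unique_fruits <- unique(fruits["fruit"])
print(unique_fruits)
```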
The second data frame has my attributes:
fruit_class <- read.table(header=TRUE, text="fruit type origin
apple pome asia
banana berry asia
orange citrus asia
pear pome newguinea")
Here's my clumsy solution to the problem:
library(plyr) #join() and count() come from the plyr package
fruit <- as.data.frame(unique(fruit[,2])) #get a list of unique fruits
colnames(fruit)[1] <- "fruit" #this won't rename the column and I don't know why...
fruit_summary <- join(fruits, fruit_class, by="fruit") #create a data frame that I can query
count(fruit_summary, "origin") #e.g. summarise the number of fruits of each origin
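A single-expression version might look like the sketch below. It swaps plyr's `join()`/`count()` for base R's `merge()` and `table()` (an assumption about what counts as "elegant"; the plyr equivalent would be `count(join(unique(fruits["fruit"]), fruit_class, by = "fruit"), "origin")`). It also joins the *unique* fruits rather than the full `fruits` table: joining all of `fruits`, as in the attempt above, counts one row per shop, so banana, for example, would be counted three times. `origin_counts` and `type_counts` are illustrative names.

```r
fruits <- read.table(header = TRUE, text = "shop fruit
1 apple
2 orange
3 apple
4 pear
2 banana
1 banana
1 orange
3 banana")

fruit_class <- read.table(header = TRUE, text = "fruit type origin
apple pome asia
banana berry asia
orange citrus asia
pear pome newguinea")

# One row per distinct fruit, attach its attributes, then tabulate origins.
origin_counts <- table(merge(unique(fruits["fruit"]), fruit_class, by = "fruit")$origin)
print(origin_counts)

# The same pattern works for any other attribute column, e.g. type.
type_counts <- table(merge(unique(fruits["fruit"]), fruit_class, by = "fruit")$type)
print(type_counts)
```

Because the attribute column is picked with `$`, the same expression can be reused for `type`, `origin`, or any column added to `fruit_class` later.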
So my main question is: how can this be expressed more elegantly (i.e. a single line rather than three)? Secondarily: why won't R let me rename the column?
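On the renaming question, one likely explanation (not confirmed from the post) is that the line building the unique list subsets `fruit[,2]` rather than `fruits[,2]`. In a fresh session `fruit` does not exist yet, so that line stops with an "object not found" error, and `colnames()<-` then has no data frame to act on. With the name corrected, the rename should behave as expected, as in this sketch:

```r
fruits <- read.table(header = TRUE, text = "shop fruit
1 apple
2 orange
3 apple
4 pear
2 banana
1 banana
1 orange
3 banana")

# Subset the existing data frame (fruits, with an "s"). as.data.frame() names
# the column after the expression it was built from; the next line replaces it.
fruit <- as.data.frame(unique(fruits[, 2]))
colnames(fruit)[1] <- "fruit"
print(fruit)
```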
Thanks in advance