I have imported an Excel spreadsheet into RStudio and I need to write R commands for the data. I need a command that displays how many units of an item have been sold. The data looks something like this:

PRODUCT      UNITS
eye liner    10
lip gloss    5
eye liner    10
lip gloss    5

I do not know how to count how many units of lip gloss have been sold. The best I can do is display how many times lip gloss shows up in the data with the command:

nrow(mySales[mySales$Product=="lip gloss",])

This command doesn't count how many units of lip gloss were sold (10); it only counts how many times lip gloss appears in the data (2). This is a beginner course and this is the first exercise, so I assume it is a simple problem, but I am completely lost.

Roman Luštrik
John Doe
  • Did you check out the command `table`? – Stedy Feb 15 '15 at 04:59
  • I have used variations of the table command but I still haven't gotten the correct answer. I know how to use the table command with only 1 column at a time, I don't know how to use it in a way to get data from both columns. I'm messing with it as we speak – John Doe Feb 15 '15 at 05:03
  • possible duplicate of [How to group columns by sum in R](http://stackoverflow.com/questions/1660124/how-to-group-columns-by-sum-in-r) – Burhan Khalid Feb 15 '15 at 05:21
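
On the `table` question raised in the comments above: `table` counts rows, so on its own it reproduces the `nrow` result. A minimal sketch of one way to bring the second column in is base R's `xtabs`, which sums the left-hand side of the formula within each group. The data frame below is rebuilt from the four rows shown in the question, and the column names `Product` and `UNITS` are assumed from the code in this thread rather than confirmed against the real spreadsheet.

```r
# Sample data rebuilt from the question; column names are an assumption.
mySales <- data.frame(
  Product = c("eye liner", "lip gloss", "eye liner", "lip gloss"),
  UNITS   = c(10, 5, 10, 5)
)

# table() counts how many rows each product has
rowCounts <- table(mySales$Product)

# xtabs() with a column on the left of ~ sums that column per product
unitTotals <- xtabs(UNITS ~ Product, data = mySales)

print(rowCounts)
print(unitTotals)
```

Here `rowCounts` holds the number of appearances per product and `unitTotals` holds the units sold per product.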

2 Answers

You are almost there. Look at your code:

nrow(mySales[mySales$Product=="lip gloss",])

This line here:

mySales[mySales$Product=="lip gloss",]

This subsets the data to the rows whose product is lip gloss.

When you wrap it in nrow, you are counting the number of rows in that subset.

To get the total units instead, replace nrow with sum and apply it to the UNITS column of the subset:

sum(mySales[mySales$Product=="lip gloss",]$UNITS)
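
To see the difference between counting rows and summing units, here is a small self-contained sketch. It assumes the four rows from the question and the column names `Product` and `UNITS`; the variable names `isLipGloss`, `rowCount` and `unitTotal` are made up for illustration.

```r
# Sample data rebuilt from the question; column names are an assumption.
mySales <- data.frame(
  Product = c("eye liner", "lip gloss", "eye liner", "lip gloss"),
  UNITS   = c(10, 5, 10, 5)
)

# Logical vector: TRUE for every row whose Product is "lip gloss"
isLipGloss <- mySales$Product == "lip gloss"

# Number of matching rows (same value nrow() gives for the subset)
rowCount <- nrow(mySales[isLipGloss, ])

# Total units across the matching rows
unitTotal <- sum(mySales[isLipGloss, ]$UNITS)

print(rowCount)   # 2
print(unitTotal)  # 10
```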

Here's a step-by-step version:

lipGlossSales <- mySales[mySales$Product == "lip gloss", ]
lipGlossUnits <- lipGlossSales$UNITS
totallipGloss <- sum(lipGlossUnits)

Happy R-ing

cheers,

biobirdman
  • This solved my problem, thanks a lot. I don't understand it completely but it is a start, thanks again. – John Doe Feb 15 '15 at 05:25
  • Thanks John, just do it step by step. Run `mySales[mySales$Product=="lip gloss",]` and you will see that it produces a smaller data frame. Then you apply the function `sum` to the column `UNITS`. You can select a column by using `$`. – biobirdman Feb 15 '15 at 05:27

This is called the split-apply-combine approach; it is well documented and very common in data analysis. In this case I would try the plyr package, which lets you build a summary of the data like this:

fakedata <- data.frame(Product=c('eye liner', 'lip gloss', 'eye liner', 'lip gloss'),
                       count=c(10,5,10,5))

library(plyr)
product.counts <- ddply(fakedata, "Product", function(x) data.frame(Productcount = sum(x$count)))
R> product.counts
    Product Productcount
1 eye liner           20
2 lip gloss           10
Stedy
  • Thanks a lot for the help. This solves my problem but we haven't learned how to do this in my class yet, so I don't think I'm allowed to solve the problem using this approach. Is there any other way to do this? – John Doe Feb 15 '15 at 05:18
  • you could write a `for` loop that subsets by each unique value in Product and sums that subset. – Stedy Feb 15 '15 at 05:20
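
The `for` loop suggested in the last comment could look roughly like the sketch below. It reuses the `fakedata` frame from this answer; the names `products` and `totals` are made up for illustration, and this is only one way to write it with base R.

```r
# Same sample data as in the answer above
fakedata <- data.frame(Product = c("eye liner", "lip gloss", "eye liner", "lip gloss"),
                       count = c(10, 5, 10, 5))

# Each distinct product name, as plain character strings
products <- unique(as.character(fakedata$Product))

# Named numeric vector to hold one total per product
totals <- numeric(length(products))
names(totals) <- products

# For each product, subset its rows and sum the count column
for (p in products) {
  totals[p] <- sum(fakedata$count[fakedata$Product == p])
}

print(totals)
```

Once loops feel comfortable, `tapply(fakedata$count, fakedata$Product, sum)` does the same grouping in a single base R call.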