
I have a repetitive task: calculating the average price of a product for each country. The price and the country code (e.g., ES = Spain, TR = Turkey) are in two different columns of my data frame. How can I use a for-loop to iterate over the different countries?

# get price for ES only
ES = subset(training.data.raw$priceusd, training.data.raw$destinationcountry== "ES")
# sum all prices of ES
summyES = sum(ES)
# Freq of ES
FES = 5223
# avg price of ES
(avgES = summyES/FES)

# AVG price for TR
TR = subset(training.data.raw$priceusd, training.data.raw$destinationcountry=="TR")
summyTR = sum(TR)
FTR = 3201
avgTR = summyTR/FTR
print(avgTR)
David Arenburg
  • I have already tried a for loop and other commands, but I simply can't apply them here, or maybe I am applying them wrong; that's why I posted this question here. –  Jan 17 '16 at 18:50
  • @michael Gruenstaeudl, thanks! –  Jan 17 '16 at 18:54

2 Answers


You have a split-apply-combine problem. Try something like:

aggregate(priceusd ~ destinationcountry, data = training.data.raw, FUN = mean) 

As an example, from reproducible data:

> aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
     Species Sepal.Length
1     setosa        5.006
2 versicolor        5.936
3  virginica        6.588

There are dozens of ways to do this, using base R functions as well as add-on packages. Searching "split-apply-combine" should lead you to all of them.
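Since the question asked specifically about a for-loop, here is a minimal sketch of two of those base R alternatives on the same reproducible data. `Species` stands in for `destinationcountry` and `Sepal.Length` for `priceusd`; that mapping, and the variable names `groups`, `loop_means` and `tapply_means`, are only illustrative assumptions.

```r
# Sketch on the built-in iris data; Species plays the role of
# destinationcountry and Sepal.Length the role of priceusd (assumed mapping).

# 1) An explicit for-loop over the distinct groups
groups <- unique(as.character(iris$Species))
loop_means <- numeric(length(groups))
names(loop_means) <- groups
for (g in groups) {
  # mean() divides by the actual number of rows in the group,
  # so no hard-coded frequencies are needed
  loop_means[g] <- mean(iris$Sepal.Length[iris$Species == g])
}

# 2) The same computation without a loop
tapply_means <- tapply(iris$Sepal.Length, iris$Species, mean)

print(loop_means)
print(tapply_means)
```

Both approaches should give the same per-group means as the `aggregate()` call; the loop is mainly useful for seeing what the split-apply-combine functions do internally.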

Thomas
  • Perfect, exactly what I am looking for, thanks a lot! One question: when I write (priceusd ~ destinationcountry), am I basically saying "get the price for each destination country"? –  Jan 17 '16 at 18:59
  • I would also like to ask: when I run your code it works perfectly and shows 60 results (60 countries), but I would like to get only 10 of those 60. Is that possible as well? –  Jan 17 '16 at 19:05
  • @FadiGilbertChar Subset your data to those countries: `aggregate(priceusd ~ destinationcountry, data = training.data.raw[training.data.raw$destinationcountry %in% c("ES", "TR"), ], FUN = mean)` – Thomas Jan 17 '16 at 19:22
  • Again, thanks a lot, I appreciate your hard work! Thumbs up (sorry, I don't have enough reputation to upvote your answer). –  Jan 17 '16 at 19:25
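
To illustrate the two points raised in the comments: in `aggregate()`, a formula such as `priceusd ~ destinationcountry` reads as "the left-hand variable, grouped by the right-hand variable", and filtering the rows with `%in%` before aggregating limits the result to the chosen groups. A small sketch on the iris data, where the species names and the variables `keep` and `subset_means` are stand-ins chosen for illustration:

```r
# Keep only two of the three species before aggregating
# (stand-ins for a chosen set of country codes such as "ES" and "TR")
keep <- c("setosa", "virginica")

# Sepal.Length ~ Species: mean of Sepal.Length for each Species
subset_means <- aggregate(Sepal.Length ~ Species,
                          data = iris[iris$Species %in% keep, ],
                          FUN = mean)

print(subset_means)
```

Only groups that actually occur in the filtered data appear in the result, so the output has one row per entry in `keep`.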

You can use dplyr to do this.

library(dplyr)

training.data.raw                %>%
    group_by(destinationcountry) %>%
    summarise(avg = mean(priceusd))   # Avg computed for each group in destinationcountry

This will calculate the average for each group.
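
As a reproducible sketch, assuming the `dplyr` package is installed, the same grouping pattern can be tried on the built-in iris data. It uses `summarise()`, the dplyr verb that collapses each group into a single row; `Species` and `Sepal.Length` stand in for `destinationcountry` and `priceusd`, and `iris_means` is just an illustrative name.

```r
# Requires the dplyr package: install.packages("dplyr") if it is missing
library(dplyr)

iris_means <- iris %>%
    group_by(Species) %>%
    summarise(avg = mean(Sepal.Length))   # one row per species

print(iris_means)
```

The result should contain one row per species, with the group mean in the `avg` column.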

steveb
  • I have tried it but it says Error: could not find function "%>%" –  Jan 17 '16 at 18:59
  • I was missing the `library(dplyr)` statement, I just added it. If that doesn't fix it then you will likely either have to update `dplyr` or install and use the `magrittr` package. – steveb Jan 17 '16 at 19:01
  • It's running perfectly now, but it's not exactly what I'm looking for; the first answer does exactly what I need. Thanks a million for trying, though :) –  Jan 17 '16 at 19:03