I have a sample survey sheet with demographic data. One of the columns is country (a factor) and another is annual income. I need to calculate the average income for each country and store the result in a new data.frame holding each country and its corresponding mean. It should be simple, but I am lost. The data looks something like this:

   Country Income($) Education ...
1. USA     90000     Phd
2. UK      94000     Undergrad
3. USA     94000     Highschool
4. UK      87000     Phd
5. Russia  77000     Undergrad
6. Norway  60000     Masters
7. Korea   90000     Phd
8. USA     110000    Masters
...

I need a final result like:

USA   UK    Russia ...
98000 90000 75000

Thank You.

700resu
  • downvote not from me but please [read this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and edit your post, as it stands this will likely be closed. – user1317221_G Feb 16 '13 at 19:16
  • @user1317221_G, does it look better now, if that's what you meant? – 700resu Feb 16 '13 at 19:28
  • The answer to this question is in almost every R tutorial I've seen. Take the time to go through one of them completely and you'll save yourself an immense amount of time in the long haul. – N8TRO Feb 16 '13 at 19:47
  • @NathanG is right. I would take some time to google & familiarise yourself especially with `ddply` and `aggregate` as there are a lot of great blogs, and these are often used tools. – user1317221_G Feb 16 '13 at 19:49
  • OK, thanks, guys. I didn't know about ddply. – 700resu Feb 16 '13 at 19:53

1 Answer

Example data:

dat <- read.table(text="Country  Income Education 
 USA    90000      Phd
 UK     94000      Undergrad
 USA    94000      Highschool
 UK     87000      Phd
 Russia 77000      Undergrad
 Norway 60000      Masters
 Korea  90000      Phd
 USA    110000     Masters",header=TRUE)

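You can do what you want with plyr. If your data is called dat:

library(plyr)
# The anonymous function returns a bare number, so the mean column is named V1
newdf <- ddply(dat, .(Country), function(x) mean(x$Income))

# Returning a one-row data.frame instead names the mean column Income
newdf <- ddply(dat, .(Country), function(x) data.frame(Income = mean(x$Income)))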

Or with aggregate, which needs no extra package:

newdf <- aggregate(Income ~ Country, data = dat, FUN = mean)

For the output you show at the end, maybe tapply?

tapply(dat$Income, dat$Country, mean)
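As a rough sketch of how the tapply result relates to the data.frame you asked for: tapply returns a named array with one mean per country, and its names and values can be put back into two columns. The names `means` and `newdf2` are just illustrative, and the data is rebuilt from the eight sample rows so the snippet runs on its own.

```r
# Rebuild the sample data from the question (only the two columns needed here)
dat <- data.frame(
  Country = c("USA", "UK", "USA", "UK", "Russia", "Norway", "Korea", "USA"),
  Income  = c(90000, 94000, 94000, 87000, 77000, 60000, 90000, 110000)
)

# Named array of means, one element per country (countries in alphabetical order)
means <- tapply(dat$Income, dat$Country, mean)

# Look up a single country by name
means["USA"]

# Turn the named array into a two-column data.frame
newdf2 <- data.frame(Country = names(means), Income = as.vector(means))
newdf2
```

This should give the same country/mean pairs as the aggregate call, just built by hand from the tapply output.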
user1317221_G
  • Thanks. I have a question though. I tried sorting and used `newdf <- newdf[order(Income), ]`, but it does not seem to work. It says object "Income" not found. Does newdf have a different structure? I also tried `newdf <- newdf[, order(Income)]`. – 700resu Feb 16 '13 at 20:29
  • I think you probably want something like this: `newdf[with(newdf, order(Income)), ]`. Also check [this post](http://stackoverflow.com/a/1296745/1317221). I added an extra `ddply` line of code to the answer to help you get a `newdf` with the mean column called `Income`. – user1317221_G Feb 16 '13 at 20:48
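The sorting problem from the comments above can be sketched as follows. A bare `order(Income)` looks for an object called `Income` in the calling environment, not inside `newdf`, so the column has to be referenced through the data.frame (`newdf$Income`) or evaluated inside it with `with`. This assumes `newdf` has a column named `Income`, as the aggregate version does; the names `sorted_asc` and `sorted_desc` are illustrative only.

```r
# Rebuild the sample data and the per-country means so the snippet is self-contained
dat <- data.frame(
  Country = c("USA", "UK", "USA", "UK", "Russia", "Norway", "Korea", "USA"),
  Income  = c(90000, 94000, 94000, 87000, 77000, 60000, 90000, 110000)
)
newdf <- aggregate(Income ~ Country, data = dat, FUN = mean)

# Sort rows by mean income, lowest first; note the comma: rows are ordered, not columns
sorted_asc <- newdf[order(newdf$Income), ]

# Equivalent form using with(), as suggested in the comment
sorted_with <- newdf[with(newdf, order(Income)), ]

# Highest mean income first
sorted_desc <- newdf[order(newdf$Income, decreasing = TRUE), ]
sorted_desc
```

The `newdf[, order(Income)]` attempt would reorder columns rather than rows even if `Income` were found, which is why the index belongs before the comma.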