
I am doing some analysis on two datasets that are split by country (the same countries in both datasets, just with different numbers), but three of the countries have missing data. I am using aggregate() and ave() to fill in dummy values (the mean for each TypeOfPerson) so that I can do my analysis without NAs popping up. However, for some reason the missing values are not always replaced when the new values are merged back into the original data.

If I clear my workspace and run it again, it sometimes works, but only for one or two of the countries, or for only one of the two datasets. I can't understand why it works one time but not another when I'm not changing the code at all. Any help would be greatly appreciated.
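In the code below, the aggregate() calls only print a table of group means; they do not change mil or per. The filling is actually done by the ave()/ifelse() lines. If you would rather fill from aggregate()'s table, one option is to look up each row's group mean with match(). A minimal sketch on a made-up data frame (the values are illustrative only):

```r
# Toy data with made-up values: two TypeOfPerson groups, some ZA values missing
df <- data.frame(
  TypeOfPerson = c("A", "A", "B", "B", "B"),
  ZA = c(1, NA, 4, 6, NA)
)

# aggregate() returns a new summary data frame; df itself is not modified
means <- aggregate(ZA ~ TypeOfPerson, data = df, FUN = mean, na.rm = TRUE)

# Look up each row's group mean by matching on TypeOfPerson,
# then use it only where ZA is missing
fill <- means$ZA[match(df$TypeOfPerson, means$TypeOfPerson)]
df$ZA <- ifelse(is.na(df$ZA), fill, df$ZA)
```

match() keeps the original row order, whereas merge() sorts by the key column by default. Note that the formula interface of aggregate() drops NA rows first, so a group with no observed values at all is absent from `means`, and `fill` stays NA for its rows.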

mil<-read.csv("C:/Data_millions.csv",header=TRUE)
per<-read.csv("C:/Data_percent.csv",header=TRUE)

##Fill in blanks for ZA
#Print the mean for each category of age/age-gender (display only; mil is not changed)
aggregate(data=mil,ZA~TypeOfPerson,mean,na.rm=TRUE)
#Fill each missing ZA with the mean of its TypeOfPerson category
ave_ZA<-ave(mil$ZA,mil$TypeOfPerson,FUN=function(x)mean(x,na.rm=TRUE))
mil$ZA<-ifelse(is.na(mil$ZA),ave_ZA,mil$ZA)

aggregate(data=per,ZA~TypeOfPerson,mean,na.rm=TRUE)
ave_ZA_per<-ave(per$ZA,per$TypeOfPerson,FUN=function(x)mean(x,na.rm=TRUE))
per$ZA<-ifelse(is.na(per$ZA),ave_ZA_per,per$ZA)

##Fill in blanks for BEWA
aggregate(data=mil,BEWA~TypeOfPerson,mean,na.rm=TRUE)
ave_BEWA<-ave(mil$BEWA,mil$TypeOfPerson,FUN=function(x)mean(x,na.rm=TRUE))
mil$BEWA<-ifelse(is.na(mil$BEWA),ave_BEWA,mil$BEWA)

aggregate(data=per,BEWA~TypeOfPerson,mean,na.rm=TRUE)
ave_BEWA_per<-ave(per$BEWA,per$TypeOfPerson,FUN=function(x)mean(x,na.rm=TRUE))
per$BEWA<-ifelse(is.na(per$BEWA),ave_BEWA_per,per$BEWA)

##Fill in blanks for GR
aggregate(data=mil,GR~TypeOfPerson,mean,na.rm=TRUE)
ave_GR<-ave(mil$GR,mil$TypeOfPerson,FUN=function(x)mean(x,na.rm=TRUE))
mil$GR<-ifelse(is.na(mil$GR),ave_GR,mil$GR)

aggregate(data=per,GR~TypeOfPerson,mean,na.rm=TRUE)
ave_GR_per<-ave(per$GR,per$TypeOfPerson,FUN=function(x)mean(x,na.rm=TRUE))
per$GR<-ifelse(is.na(per$GR),ave_GR_per,per$GR)
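Copying the same three lines for every country and dataset makes it easy to reuse the wrong intermediate variable (for example, filling per$BEWA from ave_ZA_per instead of ave_BEWA_per). A loop removes that risk. This is a sketch, and `fill_group_means` is a hypothetical helper name, not something from the question; it applies the same ave()/ifelse() logic to each named column:

```r
# Hypothetical helper (not from the question): for each column in `cols`,
# replace NAs with the mean of that column within the same `group`
fill_group_means <- function(df, cols, group = "TypeOfPerson") {
  for (col in cols) {
    grp_mean <- ave(df[[col]], df[[group]],
                    FUN = function(x) mean(x, na.rm = TRUE))
    df[[col]] <- ifelse(is.na(df[[col]]), grp_mean, df[[col]])
  }
  df
}

# Made-up example data with gaps in every country column
toy <- data.frame(
  TypeOfPerson = c("A", "A", "B", "B"),
  ZA   = c(2, NA, 3, 5),
  BEWA = c(NA, 10, 1, NA),
  GR   = c(7, 9, NA, 4)
)
toy <- fill_group_means(toy, c("ZA", "BEWA", "GR"))
```

With the question's data this would be `mil <- fill_group_means(mil, c("ZA", "BEWA", "GR"))`, and likewise for `per`.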

Update: here is some example data and a screenshot showing where it has not worked.

Here is where there are still NAs: https://www.dropbox.com/s/bd9c9mjttdehbrt/missing.jpg?dl=0
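One possible cause of the remaining NAs, assuming it matches what the screenshot shows: if every value for a given TypeOfPerson is missing in a country column, mean(x, na.rm = TRUE) has nothing to average and returns NaN. ifelse() then writes NaN back in, and since is.na(NaN) is TRUE, those cells still count as missing. A sketch on made-up data that checks for such groups before filling:

```r
# Made-up data: every GR value for group "A" is missing
df <- data.frame(
  TypeOfPerson = c("A", "A", "B", "B"),
  GR = c(NA, NA, 3, 5)
)

# Find groups with no observed values before filling
all_missing <- tapply(df$GR, df$TypeOfPerson, function(x) all(is.na(x)))
print(names(all_missing)[all_missing])  # groups that cannot be filled from their own mean

# Same ave()/ifelse() pattern as the question
grp_mean <- ave(df$GR, df$TypeOfPerson, FUN = function(x) mean(x, na.rm = TRUE))
df$GR <- ifelse(is.na(df$GR), grp_mean, df$GR)
# The mean of zero values is NaN, so group "A" is still missing after the fill
```

Groups flagged this way need some other fallback (for example an overall column mean); which one is appropriate is an analysis choice the question does not settle.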

Here is a link to my data: https://www.dropbox.com/s/vsiq9nr6ic3odmv/Data_millions.csv?dl=0
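Another possible cause, which depends on how the CSV files are written: if a country column contains a text placeholder for missing values, read.csv() reads the whole column as text (character, or factor in R < 4.0), and mean() cannot average it. str() shows the column types, and the `na.strings` argument tells read.csv() which strings mean "missing". The sketch below uses `text =` in place of the real file, and the placeholders "n/a" and "-" are hypothetical:

```r
# Stand-in for Data_millions.csv; the "n/a" and "-" placeholders are hypothetical
csv_text <- "TypeOfPerson,ZA,BEWA
A,1.5,2
A,n/a,3
B,2.5,-"

# Default import: the placeholders force ZA and BEWA to be read as text
raw <- read.csv(text = csv_text, header = TRUE)
str(raw)

# Declaring the placeholders as missing values gives numeric columns with NAs
clean <- read.csv(text = csv_text, header = TRUE,
                  na.strings = c("NA", "", "n/a", "-"))
str(clean)
```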

Brian Tompsett - 汤莱恩
K-8
  • It would be better if you show some small example data that shows the problem and your expected result. – akrun Dec 02 '14 at 11:21
  • I've attached some pictures of my datasets where it has worked and where it has not – K-8 Dec 02 '14 at 11:40
  • You need to provide the *actual data* (before aggregation) so we can test your code. You don't expect anyone here to copy/paste your data manually from your pictures, do you? See [**here**](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – David Arenburg Dec 02 '14 at 11:41
  • @K-8 If somebody wants to test your code, they need to create an example dataset using their time. The pictures won't help much. Please use `dput` or even few lines of your data. – akrun Dec 02 '14 at 11:42
  • Sorry guys, I have attached it there now – K-8 Dec 02 '14 at 11:52
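For reference, dput() prints R code that recreates an object, which is what the commenters are asking for. A sketch on a made-up data frame; with the question's data you would run it on something like `head(mil)` instead:

```r
# Made-up stand-in for the question's data
toy <- data.frame(TypeOfPerson = c("A", "B"), ZA = c(1.2, NA))

# Print code that recreates the object; this is the text to paste into a question
dput(toy)

# Pasting dput()'s output back into R rebuilds an equivalent data frame
recreated <- eval(parse(text = capture.output(dput(toy))))
```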
