-1

I have dataframe as below

problem

Every HH repeats exactly 3 times in a column. I have millions of such total records

I want to roll up data for each HH value in one row and delete remaining two rows for that HH.

For column A - take mode for the values for each HH , put it in a row. For column B- take mode for the values for each HH , put it in a row. For column C - take average of the values for each HH , put it in a row. For column D - take mode for the values for each HH , put it in a row.

I need to do this for all HH values which are in thousands in my dataframe.

output:

enter image description here

Thanks

M--
  • 25,431
  • 8
  • 61
  • 93
Earthshaker
  • 549
  • 1
  • 7
  • 12
  • Check out the `dplyr` package. There is a `group_by` function in that package. Also you should create a reproducible example when posting a question. – akash87 Jun 16 '17 at 12:30
  • Thanks Akash . I will surely check dplyr. I have added screenshot of my problem and output now. That will clearly give idea about question. It was my questions so messed up a bit :) – Earthshaker Jun 16 '17 at 12:36
  • Welcome to SO. Please take a look [here](https://stackoverflow.com/a/5963610/7306168), to see how to create a reproducible example. You should avoid including your code or data as image, as that would meen people need to re-type it all. (also your links are not working) – Bea Jun 16 '17 at 15:10

1 Answers1

0

That Code is probably a little inefficient, but it works with your sample.

Problem<-matrix(c(123,0,2,16,0,123,1,3,15,1,123,1,2,14,0,456,0,2,12,1,456,0,2,18,0,456,0,2,24,0),nrow=6,ncol=5,byrow=TRUE)
colnames(Problem)<-c("HH","A","B","C","D")
Problem<-split(Problem,Problem[,1])

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
Result<-matrix(NA,nrow=length(Problem),ncol=5)
for (i in 1:length(Problem)){
Result[i,]<-c(sum(Problem[[i]][1:3])/3,Mode(Problem[[i]][4:6]),Mode(Problem[[i]][7:9]),mean(Problem[[i]][10:12]),Mode(Problem[[i]][13:15]))}
RBeginner
  • 244
  • 3
  • 7