0

I'm trying to merge rows of a data set by using the mean operator. Basically, I want to convert data set 1 into data set 2 (see below)

1. ID  MEASUREMENT      2. ID   MEASURE
   A       20              A     22.5
   B       30              B     30
   A       25              . 
   .                       .
   .                       .  

How can I do this on R?

Note that in contrast to the example I have given here, my data set is really large and I can't look through the data set, group rows according to their id's then find colMeans.

My thoughts are to order the dataset, separate the measures for each id, then find each mean and regroup the data. However, this will be very time consuming. I would really appreciate if someone can assist me with a direct code or even a for loop.

Kristofersen
  • 2,736
  • 1
  • 15
  • 31
D Bhagwandin
  • 3
  • 1
  • 1
  • 3

2 Answers2

2

This code should be able to do that for you.

library(data.table)
setDT(dat)
dat = dat[ , .(MEASURE = mean(MEASUREMENT)), by = .(ID)]

Just to be a little more complete i'll throw in an example and a way to do this in base R.

Data:

dat = data.frame(ID = c("A","A","A","B","B","C"), MEASUREMENT = c(1:3,61,13,7))

With only base R functions:

aggregate(MEASUREMENT ~ ID, FUN = mean, dat)
  ID MEASUREMENT
1  A           2
2  B          37
3  C           7

With data.table:

library(data.table)
setDT(dat)
dat = dat[ , .(MEASURE = mean(MEASUREMENT)), by = .(ID)]
> dat
   ID MEASURE
1:  A       2
2:  B      37
3:  C       7
Kristofersen
  • 2,736
  • 1
  • 15
  • 31
2

You can also do this easily in dplyr, assuming your data is in df

library(dplyr)

df <- df %>%
  group_by(ID) %>%
  summarize(MEASURE = mean(MEASUREMENT))
MeetMrMet
  • 1,349
  • 8
  • 14