Welcome to SO!
Here is one of several possibilities using R:
df <- data.frame(
hadm_id = c(100001, 100003, 100003, 100006, 100006, 100007, 100007,
100009, 100009, 100010, 100010, 100011, 100011),
rass_v = c(0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
)
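# Original version:
# df <- setNames(aggregate(df$rass_v, by = list(df$hadm_id), max), names(df))
# Edit: for better readability, use the formula interface from @Moody_Mudskipper's answer
# (one row per hadm_id, holding the maximum rass_v of that group):
df <- aggregate(rass_v ~ hadm_id, df, max)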
print(df)
See this for more.
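To make explicit what aggregate(rass_v ~ hadm_id, df, max) computes, here is a minimal plain-Python sketch of the same grouped maximum. The helper group_max is hypothetical and only illustrates the idea; it is not part of R or any library.

```python
# Same example data as the R data.frame above.
hadm_id = [100001, 100003, 100003, 100006, 100006, 100007, 100007,
           100009, 100009, 100010, 100010, 100011, 100011]
rass_v = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]


def group_max(keys, values):
    """Hypothetical helper: map each key to the maximum of its values."""
    result = {}
    for k, v in zip(keys, values):
        # The first occurrence initialises the group; later rows only
        # replace the stored value if they are larger.
        if k not in result or v > result[k]:
            result[k] = v
    return result


maxima = group_max(hadm_id, rass_v)
for k in sorted(maxima):
    print(k, maxima[k])
```

Each group is visited once per row, so this is a single pass over the data, which is essentially what the grouped R and pandas calls do for you.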
Here is a faster data.table solution (for bigger tables):
library(data.table)
DT <- data.table(
hadm_id = c(100001, 100003, 100003, 100006, 100006, 100007, 100007,
100009, 100009, 100010, 100010, 100011, 100011),
rass_v = c(0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
)
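# For each hadm_id, .I[which.max(rass_v)] gives the row number of the first maximum;
# the inner call returns these row numbers in column V1, which are then used to subset DT.
DT <- DT[DT[, .I[which.max(rass_v)], by = hadm_id]$V1]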
print(DT)
Please see this related question and Arun's answer.
Result:
   hadm_id rass_v
1:  100001      0
2:  100003      1
3:  100006      1
4:  100007      1
5:  100009      1
6:  100010      1
7:  100011      1
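The data.table idiom keeps the whole row holding the first maximum of each group, which matters once the table has more columns than hadm_id and rass_v. A rough pandas counterpart, offered as a sketch rather than a translation of Arun's answer, uses GroupBy.idxmax(), which returns the index label of the first maximum in each group; those labels are then used to select full rows:

```python
import pandas as pd

df = pd.DataFrame({
    'hadm_id': [100001, 100003, 100003, 100006, 100006, 100007, 100007,
                100009, 100009, 100010, 100010, 100011, 100011],
    'rass_v': [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})

# Index label of the first maximum rass_v per hadm_id,
# similar in spirit to .I[which.max(rass_v)] in data.table.
idx = df.groupby('hadm_id')['rass_v'].idxmax()

# Select those complete rows (any extra columns would be kept too)
# and renumber the index from 0.
top_rows = df.loc[idx].reset_index(drop=True)
print(top_rows)
```

Because groupby sorts keys by default, the rows come out ordered by hadm_id.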
Edit: here is the equivalent in pandas:
import pandas as pd
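df = pd.DataFrame({'hadm_id': [100001, 100003, 100003, 100006, 100006, 100007, 100007,
                               100009, 100009, 100010, 100010, 100011, 100011],
                   'rass_v': [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]})
# as_index=False returns a DataFrame (like R's aggregate) instead of a Series;
# sort=False keeps the groups in order of first appearance
df = df.groupby('hadm_id', sort=False, as_index=False)['rass_v'].max()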
print(df)
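If several rows can share a group's maximum and all of them should be kept (which.max and idxmax keep only the first), one common pandas approach is to broadcast each group's maximum back onto its rows with transform('max') and filter on it. This is an alternative technique, not part of the answer above; with this example data every group has a single maximum, so it returns the same seven rows:

```python
import pandas as pd

df = pd.DataFrame({
    'hadm_id': [100001, 100003, 100003, 100006, 100006, 100007, 100007,
                100009, 100009, 100010, 100010, 100011, 100011],
    'rass_v': [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})

# Series aligned with df: each row gets the maximum rass_v of its hadm_id.
group_max = df.groupby('hadm_id')['rass_v'].transform('max')

# Keep every row that attains its group's maximum (ties are all retained),
# preserving the original row order.
result = df[df['rass_v'] == group_max].reset_index(drop=True)
print(result)
```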