I am fairly new to R and am trying to summarise some data from a data frame called df
into a data.table. My data frame looks like this:
          preds ground_truth group
1  0.0008786491            0     1
2  0.0009080505            1     1
3  0.0009118593            0     1
4  0.0009121987            1     2
6  0.0009514780            0     2
7  0.0009572834            1     3
8  0.0009645682            0     4
9  0.0009721006            1     4
10 0.0009761475            0     5
11 0.0009835458            0     5
There are several pieces of information I want to extract from this, most of which I have managed successfully.
For each unique group, I want the average value of preds, the average value of ground_truth, the count of preds, and finally the range of preds.
I have managed to get all of these, but the range produces two rows per group (one for the min and one for the max) instead of putting both values on a single row in some format.
I have tried using lists, c() and as.character(), but nothing has worked.
The output looks like this, with the first row for each group holding the min and the second row holding the max:
   Group_number             range    N predicted_mean actual_mean
1:            1 0.479342132806778 6492        0.55383       0.715
2:            1 0.855185627937317 6492        0.55383       0.715
3:            2 0.407937824726105 6492        0.44054       0.532
4:            2 0.479312479496002 6492        0.44054       0.532
I want the range column to hold both values in a single row, in any format, for example:
   Group_number                                  range    N predicted_mean actual_mean
1:            1 (0.479342132806778, 0.855185627937317) 6492        0.55383       0.715
My solution so far has been this:
library(data.table)

# Empty table that the loop appends one group's results to
group_results <- data.table(Group_number = numeric(), range = numeric(), N = numeric(),
                            predicted_mean = numeric(), actual_mean = numeric())

for (i in unique(df$group)) {
  # Values belonging to the current group
  pred <- df$preds[df$group == i]
  actual <- df$ground_truth[df$group == i]
  predicted_mean <- sum(pred) / length(pred)
  actual_mean <- sum(actual) / length(actual)
  # Length-2 vector holding the min and max of preds
  range <- c(min(pred), max(pred))
  N <- length(pred)
  group_results <- rbind(group_results, list(i, range, N, round(predicted_mean, 5),
                                             round(actual_mean, 3)))
}
Can someone please tell me how to get range onto a single row in data.table?
Thanks
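One possible approach, given as a rough sketch rather than a definitive fix, is to drop the loop and let data.table do the grouping, collapsing the min and max into a single string per group. The sketch assumes df has the preds, ground_truth and group columns shown above and rebuilds only the sample rows from the question, so its numbers will not match the full-data output. The "(min, max)" string format is an arbitrary choice.

```r
library(data.table)

# Sample rows from the question (a subset of the real data)
df <- data.frame(
  preds = c(0.0008786491, 0.0009080505, 0.0009118593, 0.0009121987,
            0.0009514780, 0.0009572834, 0.0009645682, 0.0009721006,
            0.0009761475, 0.0009835458),
  ground_truth = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 0),
  group = c(1, 1, 1, 2, 2, 3, 4, 4, 5, 5)
)

dt <- as.data.table(df)

# One row per group: the min and max are pasted into a single string,
# so the range occupies one cell instead of two rows
group_results <- dt[, .(
  range = paste0("(", min(preds), ", ", max(preds), ")"),
  N = .N,
  predicted_mean = round(mean(preds), 5),
  actual_mean = round(mean(ground_truth), 3)
), by = .(Group_number = group)]

print(group_results)
```

If the two numbers need to stay numeric, a list column such as range = list(range(preds)) inside the same call should also keep one row per group, at the cost of a column that is harder to print and export.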