In an Android Studio project written in Java, I have a List of times of day, each stored as an integer number of minutes since midnight, like this:
List<Integer> times = new ArrayList<>();
int hour = 16;
int minute = 25;
int time = hour * 60 + minute;
times.add(time);
I need the mean and the standard deviation of times in order to obtain a list of non-outlier times. However, the ordinary mean and standard deviation don't seem to work. Here is what I'm doing right now:
private List<String> getNonOutlierTimes() {
    int mean = convertToTime((times.stream().mapToInt(Integer::intValue).sum()) / times.size());
    int sd = (int) calculateStandardDeviation(mean);
    int maxTime = (int) (mean + 1.5 * sd);
    int minTime = (int) (mean - 1.5 * sd);
    List<Integer> nonOutliers = new ArrayList<>();
    for (int i = 0; i < times.size(); i++) {
        if ((times.get(i) <= maxTime) && (times.get(i) >= minTime)) {
            nonOutliers.add(times.get(i));
        }
    }
    List<String> nonOutliersStr = new ArrayList<>();
    for (Integer nonOutlier : nonOutliers) {
        nonOutliersStr.add(convertIntTimesToStr(nonOutlier));
    }
    return nonOutliersStr;
}
private int convertToTime(int a) {
    if ((a < 24*60) && (a >= 0)) {
        return a;
    } else if (a < 0) {
        return 24*60 + a;
    } else {
        return a % (24*60);
    }
}
private double calculateStandardDeviation(int mean) {
    int sum = 0;
    for (int j = 0; j < times.size(); j++) {
        int time = convertToTime(times.get(j));
        sum = sum + ((time - mean) * (time - mean));
    }
    double squaredDiffMean = (double) (sum) / (times.size());
    return (Math.sqrt(squaredDiffMean));
}
private String convertIntTimesToStr(int time) {
    String hour = (time / 60) + "";
    int minute = time % 60;
    String minuteStr = minute < 10 ? "0" + minute : "" + minute;
    return hour + ":" + minuteStr;
}
Although the calculations follow the standard formulas, the resulting mean and standard deviation are not useful for times that wrap around midnight. For example, when the times list contains the following:
225 (03:45 am), 90 (01:30 am), 0 (12:00 am), 1420 (11:40 pm), 730 (12:10 pm)
I need a non-outliers list containing:
1420 (11:40 pm), 0 (12:00 am), 90 (01:30 am), 225 (03:45 am)
where the actual output is:
0 (12:00 am), 90 (01:30 am), 225 (03:45 am), 730 (12:10 pm)
In other words, I need the mean to be where most of the times are. To be more specific, consider a list of times containing the integers 1380 (23:00, or 11:00 pm), 1400 (23:20, or 11:20 pm), and 60 (01:00 am). The arithmetic mean of these times is roughly 946 (about 15:46, or 03:46 pm), whereas I need the mean to lie between 23:00 and 01:00.
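To make the wrap-around problem concrete, here is a minimal sketch of one common approach, the circular mean: each time is mapped to an angle on a 24-hour circle, the sines and cosines are averaged, and the resulting angle is mapped back to minutes. This is only an illustration under that assumption, not a verified solution; the class name CircularMeanSketch and the method circularMeanMinutes are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original project.
final class CircularMeanSketch {
    static final int MINUTES_PER_DAY = 24 * 60;

    /** Circular mean of times given as minutes since midnight; result is in [0, 1440). */
    static double circularMeanMinutes(List<Integer> times) {
        double sumSin = 0.0;
        double sumCos = 0.0;
        for (int t : times) {
            // Map the time to an angle on the 24-hour circle.
            double angle = 2.0 * Math.PI * t / MINUTES_PER_DAY;
            sumSin += Math.sin(angle);
            sumCos += Math.cos(angle);
        }
        // atan2 returns an angle in (-pi, pi]; convert it to minutes and wrap into [0, 1440).
        double minutes = Math.atan2(sumSin, sumCos) * MINUTES_PER_DAY / (2.0 * Math.PI);
        return (minutes + MINUTES_PER_DAY) % MINUTES_PER_DAY;
    }

    public static void main(String[] args) {
        List<Integer> times = Arrays.asList(1380, 1400, 60);
        double arithmetic = times.stream().mapToInt(Integer::intValue).average().orElse(0);
        System.out.printf("arithmetic mean: %.1f, circular mean: %.1f%n",
                arithmetic, circularMeanMinutes(times));
    }
}
```

For 1380, 1400, and 60 this gives a circular mean of roughly 1426 minutes (about 23:46), which lies between 23:00 and 01:00. Note that the circular mean is not well defined when the times cancel each other out, for example 00:00 and 12:00 together.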
I have already found this solution for a list of two times. However, my times.size() is always greater than 2, and I'd also like to calculate the standard deviation. I would appreciate your help with this.
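For the standard deviation and the outlier filter, a possible sketch uses the circular standard deviation sqrt(-2 ln R), where R is the mean resultant length of the time angles, and then compares each time's shortest distance around the clock to the circular mean. This is an assumption about what would fit the requirement, not a tested solution for real data; the class CircularOutlierSketch, its method names, and the parameter k (1.5 in the code above) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original project.
final class CircularOutlierSketch {
    static final double MINUTES_PER_DAY = 24 * 60;

    /** Shortest distance between two times on the 24-hour circle, in minutes (0 to 720). */
    static double circularDistance(double a, double b) {
        double d = Math.abs(a - b) % MINUTES_PER_DAY;
        return Math.min(d, MINUTES_PER_DAY - d);
    }

    /** Returns {circular mean, circular standard deviation}, both in minutes. */
    static double[] circularStats(List<Integer> times) {
        double sumSin = 0.0;
        double sumCos = 0.0;
        for (int t : times) {
            double angle = 2.0 * Math.PI * t / MINUTES_PER_DAY;
            sumSin += Math.sin(angle);
            sumCos += Math.cos(angle);
        }
        double toMinutes = MINUTES_PER_DAY / (2.0 * Math.PI);
        double mean = (Math.atan2(sumSin, sumCos) * toMinutes + MINUTES_PER_DAY) % MINUTES_PER_DAY;
        // Mean resultant length R: 1 means all times coincide, 0 means no preferred direction.
        // Clamp to 1 to guard against floating-point overshoot before taking the logarithm.
        double r = Math.min(1.0, Math.hypot(sumSin, sumCos) / times.size());
        double sd = Math.sqrt(-2.0 * Math.log(r)) * toMinutes;
        return new double[] {mean, sd};
    }

    /** Keeps times whose circular distance from the circular mean is at most k standard deviations. */
    static List<Integer> nonOutliers(List<Integer> times, double k) {
        double[] stats = circularStats(times);
        List<Integer> kept = new ArrayList<>();
        for (int t : times) {
            if (circularDistance(t, stats[0]) <= k * stats[1]) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> times = Arrays.asList(225, 90, 0, 1420, 730);
        System.out.println(nonOutliers(times, 1.5)); // prints [225, 90, 0, 1420]
    }
}
```

With the example list, the circular mean is around 01:34 and the circular standard deviation is a little over four hours, so 730 (12:10 pm) falls outside the 1.5-deviation window while 1420 (11:40 pm) stays inside it.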
Thanks in advance.