Find the total turnaround time of n processes in O(n) time

Question

This is an algorithmic problem I came across during a coding interview, and I couldn't find a solution faster than an O(n^2) one.

The original problem seems to be a little lengthy, so I rephrase the problem into a round-robin process scheduling one one, which hopefully is more comprehensible.

Given n processes which are all initially in ready state, and an array T of size n, where T[i] is the time to run process i. We run the process in the order from process 0 to process n-1, and each time we run the selected process exactly 1 unit of time. If a process is finished, we skip it. We continue running the processes until all processes are finished.

Please give an efficient algorithm that outputs the total turnaround time of all processes. (The turnaround time of a process is the time it takes from being ready to being finished.)

Constraints:

1 <= n <= 10**5
1 <= T[i] <= 10**4

For example, T = [3, 1, 2], we would have T = [2, 1, 2] (run process[0]), then T = [2, 0, 2] (run process[1]), then T = [2, 0 , 1] (run process[2]) and so on.

So It takes 6 unit of times for process[0] to complete, and 2 unit of time for process[1] to complete, and 5 unit of time for process[2] to complete. And we should output 6+2+5=13.

My idea is that, for each process[i] we have to wait

process[0] to process[i-1]
T[i]
process[i+1] to process[n-1]

For each process[j] in 1., process[i] has to wait min(T[j], T[i]) unit of time. For each process[j] in 3., process[i] has to wait min(T[i]-1, T[j]) unit of time.

So each process[i] has turnaround time T[i] + sum_(j=0~(i-1))(min(T[j], T[i])) + sum_(j=(i+1)~(n-1)(min(T[j], T[i]-1)), and the answer is just summing up all turnaround time of process[i].

As we can see from the formula, it takes O(n) time to compute each turnaround time, and thus total O(n^2) time to solve the problem.

I was wondering if there is any way to optimize the solution to O(n) time. (Maybe by prefix sum or dynamic programming, but i am not sure.)

Thanks for the comments!

Is each process time dynamic/arbitrary, or is it one unit of time? — ryanwebjackson, Apr 27 '23 at 19:33
@user3386109 Thanks for your suggestions, the original problem description is about making the items (each item `i` takes `T[i]` time). The question didn't mention round robin, but it just takes one item and "make" it `1` unit of time. If the items is completed, give it away, else throw it to the back of the queue. So I think it's the same as round robin. — KCH, Apr 28 '23 at 06:47
@ryanwebjackson The process time is given first, for example, `T = [3, 2, 4]`, then `process[0]` needs `3` unit of time to run and `process[1]` needs `2` units of time to run. Also, we have only one CPU, so other processes must wait the running process. Hope it's more clear for you. — KCH, Apr 28 '23 at 06:50

score 1 · Answer 1 · answered Apr 27 '23 at 21:16

The "total turnaround time" is the sum of the finish times for each task.

Determining the finish time for each task can be divided into two parts:

Determine the order in which the tasks will finish. You can determine this by sorting task by time, breaking ties by position;
Visit the tasks in completion order, and determine how much time there is between each pair of adjacent completions. Then keep a running total.

The set of tasks that are finished does not change between completion events. The positions of the preceding and following tasks divide the array into 3 sections, and you can simply calculate the time between completions if you know how many unfinished tasks there are in each of these sections.

To figure out how many unfinished tasks there are in each section, you can track unfinished tasks in an order statistic tree or similar -- it starts full and you remove the completed task at each iteration. I wrote about a simple way to do that a few days ago in this answer: C++ fastest way for calculating rolling median

The time that this algorithm requires is dominated by initial sort, and by maintaining the order statistic tree as you remove items. Both of these take O(N log N) time in total.

I had thought about computing the answer in the order of task completion too, but I find it really hard for me to implement. Besides, the method didn't seem to use the constraint `T[I] <= 10^4`, so I guess there is another way to do it. Thank you. — KCH, Apr 28 '23 at 06:42

score 0 · Accepted Answer · answered Apr 28 '23 at 03:20

The sorting first solution suggested by @Matt Timmermans makes the implementation more complicated if correct. A task's finish time actually is dependent on the relative order of t[i] and sorting the input will break the original order.

The key observations here for a given task t[i] are:

for j < i, all t[j] that are <= t[i] will contribute to the finish time of t[i]. for all t[j] > t[i], they only contribute t[i] to the finish time of t[i].
for j > i, all t[j] that are <= t[i] - 1 will contribute to the finish time of t[i]. for all t[j] > t[i] - 1, they only contribute t[i] - 1 to the finish time of t[i].

Now the original problem is converted to the following subproblems.

Given the constraint of t[i] <= 10^4, and n <= 10^5, we do not need to worry about integer overflow, so using int is enough. Fenwick tree (binary indexed tree) is also a good dynamic data structure to solve all of the above subproblems.

For a given t[i], what are the sum of numbers that are <= t[i] in index range[0, i - 1]. prefixSumFenwickTree supports O(log(maxT)) range sum queries for this subproblem.
For a given t[i], what are the count of numbers that are > t[i] in index range[0, i - 1].prefixCntFenwickTree supports O(log(maxT)) range sum queries for this subproblem.
For a given t[i], what are the sum of numbers that are <= t[i] - 1 in range [i + 1, n - 1]. suffixSumFenwickTree supports O(log(maxT)) range sum queries for this subproblem.
For a given t[i], what are the count of numbers that are > t[i] - 1 in range [i + 1, n - 1].suffixCntFenwickTree supports O(log(maxT)) range sum queries for this subproblem.

If you are not familiar with Fenwick tree, you can look it up online, there are plenty of resources introducing this data structure.

Conceptually, 2 different types of Fenwick tree are used here. One is to support the count of numbers that are in value range [LOW, HIGH; The other one is to support the sum of numbers that are in value range [LOW, HIGH]. Hence we do different updates, one is +/- 1, the other is +/- t[i].

The overall runtime is O(N * log(maxT)).

class Solution {
    public int computeTotalTurnAroundTime(int[] t) {
        int maxT = 0;
        for(int x : t) {
            maxT = Math.max(maxT, x);
        }
        int totalTurnAroundTime = 0;
        FenwickTree prefixSumFenwickTree = new FenwickTree(maxT + 1);
        FenwickTree suffixSumFenwickTree = new FenwickTree(maxT + 1);
        FenwickTree prefixCntFenwickTree = new FenwickTree(maxT + 1);
        FenwickTree suffixCntFenwickTree = new FenwickTree(maxT + 1);
        for(int x : t) {
            suffixSumFenwickTree.update(x, x);
            suffixCntFenwickTree.update(x, 1);
        }
        for(int x : t) {
            suffixSumFenwickTree.update(x, -x);
            suffixCntFenwickTree.update(x, -1);
            int turnAround = x;
            turnAround += prefixSumFenwickTree.rangeSum(x);
            turnAround += prefixCntFenwickTree.rangeSum(x + 1, maxT) * x;
            turnAround += suffixSumFenwickTree.rangeSum(x - 1);
            turnAround += suffixCntFenwickTree.rangeSum(x, maxT) * (x - 1);
            totalTurnAroundTime += turnAround;
            prefixSumFenwickTree.update(x, x);
            prefixCntFenwickTree.update(x, 1);
        }
        return totalTurnAroundTime;
    }

    public static void main(String[] args) {
        Solution solution = new Solution();
        int[] t = new int[]{3,1,2};
        System.out.println(solution.computeTotalTurnAroundTime(t));
    }
}
class FenwickTree {
    int[] ft;
    FenwickTree(int n) {
        ft = new int[n];
    }
    int rangeSum(int r) {
        if(r == 0) return 0;
        int sum = 0;
        for(; r > 0; r -= (r & (-r))) {
            sum += ft[r];
        }
        return sum;
    }
    int rangeSum(int l, int r) {
        if(l > r || r == 0) return 0;
        return rangeSum(r) - (l > 1 ? rangeSum(l - 1) : 0);
    }
    void update(int idx, int diff) {
        for(; idx < ft.length; idx += (idx & (-idx))) {
            ft[idx] += diff;
        }
    }
}

Didn't think about Fenwick tree. I will try it later, thank you. — KCH, Apr 28 '23 at 06:54

Find the total turnaround time of n processes in O(n) time

2 Answers2