Algorithmic Puzzle: Distinct nodes in a subtree

Question

I am trying to solve this question:

You are given a rooted tree consisting of n nodes. The nodes are numbered 1,2,…,n, and node 1 is the root. Each node has a color.

Your task is to determine for each node the number of distinct colors in the subtree of the node.

The brute force solution is to store a set for each node and them cumulatively merge them in a depth first search. That would run in n^2, not very efficient.

How do I solve this (and the same class of problems) efficiently?

I don't see how it is `O(n^2)`. It the tree is more or less balanced, we have `T(n) = 2T(n/2) + O(n)` (merging is `O(n)`), which means `T(n) = O(n log n)`. — user58697, May 17 '20 at 23:04
The tree might not be balanced at all. We do not know anything about it. That is correct in balanced trees, but it's easy to create counterexample that will have complexity way bigger than O(n log n) — Maras, May 17 '20 at 23:22
To be more specific, make one path of length n / 2, for each node on the path add additional child (leaf) - total n nodes. Merging in that one will take O(n^2) time. — Maras, May 17 '20 at 23:34
Copy-pasted from https://cses.fi/problemset/task/1139, without attribution. We require you to attribute the source of all copied material: https://stackoverflow.com/help/referencing — D.W., May 23 '20 at 03:24

score 1 · Answer 1 · answered May 17 '20 at 23:45

First of all, you'd want to change tree into a list. This technique is often called 'Euler Tour'.

Basically you make an empty list and run DFS. If you visit a node first or last time, push it's color at the end of the list. In this way you'll get list of length 2 * n, where n is equal to number of nodes. It's easy to see that in the list, all colors corresponding to node's children are between its first and last occurrence. Now instead of tree and queries 'how many different colors are there in node's subtree' you have list and queries 'how many different colors are there between index i-th and j-th'. That actually makes things a lot easier.

First idea -- MO's technique O(n sqrt(n)):

I will describe it briefly, I strongly recommend searching up MO's technique, it is well explained in many sources.

Sort all your queries (remainder, they look like this: given pair (i, j) find all distinct numbers in sub-array from index i to index j) by their start. Make sqrt(n) buckets, place query starting from index i to bucket number i / sqrt(n).

For each bucket we will answer the queries separately. Sort all queries in the bucket by their end. Now start processing the first one (the query which end is most to the left) using brute force (iterate over the subarray, store numbers in set/hashset/map/whatever, get size of the set).

Now to process the next one, we shall add some numbers at the end (next query ends farther than the previous one!) and, unfortunately, do something about its start. We'd need to either delete some numbers from the set (if the next query's start > old query start) or add some numbers from the beginning (if the next query's start < old query start). However, we may do it using brute force too, since all queries have start in the same segment of sqrt(n) indices! In total we get O(n sqrt(n)) time complexity.

Second idea -- check this out, O(n log n): Is it possible to query number of distinct integers in a range in O(lg N)?

displayName · Accepted Answer · 2020-05-18T00:34:37.890

1

For each node,

Recursively traverse the left and right nodes.
Have each call return a HashSet of color.
At each node, merge the left child set, the right child set .
Update the count for the current node in a HashMap.
Add the color of current node and return the set.

Sample C# code:

public Dictionary<Node, Integer> distinctColorCount = new ...

public HashSet<Color> GetUniqueColorsTill (TreeNode t) {
    // If null node, return empty set.
    if (t == null) return new HashSet<Color>();

    // If we reached here, we are at a non-null node.
    // First get the set from its left child.
    var lSet = GetUniqueColorsTill(t.Left);

    // Second get the set from its right child.
    var rSet = GetUniqueColorsTill(t.Right);

    // Now, merge the two sets.
    // Can be a little clever here. Merge smaller set to bigger set.
    var returnSet = rSet;
    returnSet.AddAll(lSet);

    // Put the count for this node in the dictionary.
    distinctColorCount[t] = returnSet.Count;    

    // Finally, add the color of current node and return.
    returnSet.Add(t.Color);

    return returnSet;
}

You can figure out the complexity exactly as @user58697 commented on your question using the Master Theorem. This is another answer from me written long time ago that explains Master Theorem, if you need a refresher.

c#

edited May 18 '20 at 00:34

answered May 18 '20 at 00:27

displayName

13,888
8
60
75

This runs in `n^2` if the tree isn't balanced. – nz_21 May 18 '20 at 00:37
@Maras: Using lists there are 2 problems. 1 - Keeping them sorted to merge them in the "merge sort" fashion. 2 - Removing the duplicates. Using a hashset solves those problems for you. – displayName May 18 '20 at 01:20
1

@nz_21: Every node will need to be visited once. If the tree is skewed (let's assume the no tree node has a right child), then `rSet` in the answer above will be empty and you will just have a `lSet` to which you will add 1 color at each node. Effectively, it will be O(n), not O(n^2) – displayName May 18 '20 at 01:24
Right, I did not get that you're not copying the larger hash set. Indeed, this way each value will be inserted up to log(n) times. The total complexity is indeed expected O(n log n). You do have to choose the biggest set among your children however. If you do as you explained above your pseudocode (merge left to right), you do have O(n^2) worst case. – Maras May 18 '20 at 12:19
Also, it's not necessarily a binary tree. Given this, I'm not sure if the asymptotic analysis still holds up – nz_21 May 18 '20 at 12:59
@nz_21: My mistake I didn't notice that it isn't a binary tree. The analysis is still true though. Most likely the complexity will decrease because a tree with more than just two children will have a smaller height than a tree with at most two children. However, effectively you will populate the sets with same values, which is O(n) for n nodes. – displayName May 18 '20 at 13:05

Algorithmic Puzzle: Distinct nodes in a subtree

2 Answers2