
I want to make an O(N) time-complexity mapping of a BTreeMap with a closure that increases (or decreases) with respect to key order (N is the number of elements). In other words, I'd like to do something like this:

/*
f is monotonic, i.e. is either increasing or decreasing with key order.
f is increasing with key order means:
   if k1 < k2, (fk1,fv1) = f(k1,v1) and (fk2,fv2) = f(k2,v2), 
   then fk1 < fk2
*/
fn monotonic_map<K, V, F>(tree: BTreeMap<K,V>, f: F) -> Result<BTreeMap<K,V>,...> 
        where F: FnMut(K,V) -> (K,V), ... { 
    // O(N) computation
    ...
    // keys/values of mapped_tree are (fk,fv) = f(k,v) where (k,v) is a key/value of tree
    if is_monotonic { Ok(mapped_tree) } else { Err(...) }
}

In theory this should be possible:

  • Iterate by reference over the keys/values of the tree and check that the closure is actually increasing or decreasing (must be O(N)); a sketch of this check follows the list
  • If the check passes, map the keys/values of the tree (reversing them in the case of a decreasing closure) without changing the structure of the tree (must be O(N))
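
For reference, the first step could look like the following hypothetical sketch (not part of the original question). It assumes a helper closure that only computes the mapped key, and it determines the direction while checking strict monotonicity in a single O(N) pass:

use std::cmp::Ordering;
use std::collections::BTreeMap;

// Hypothetical helper: one O(N) pass over the tree that checks whether the
// mapped keys are strictly increasing or strictly decreasing.
// Returns Some(Ordering::Less) for increasing, Some(Ordering::Greater) for
// decreasing, and None if the mapped keys are not strictly monotonic.
fn mapped_key_direction<K, V, FK: Ord>(
    tree: &BTreeMap<K, V>,
    mut fk: impl FnMut(&K, &V) -> FK,
) -> Option<Ordering> {
    let mut keys = tree.iter().map(|(k, v)| fk(k, v));
    let mut prev = match keys.next() {
        Some(k) => k,
        None => return Some(Ordering::Less), // empty tree: trivially increasing
    };
    let mut direction = None;
    for key in keys {
        match prev.cmp(&key) {
            Ordering::Equal => return None, // duplicate mapped key
            ord => match direction {
                None => direction = Some(ord),
                Some(d) if d != ord => return None, // direction changed
                _ => {}
            },
        }
        prev = key;
    }
    // A single-element tree is also trivially increasing.
    Some(direction.unwrap_or(Ordering::Less))
}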

In practice, how can this be done with a BTreeMap, or with an equivalent structure (e.g. from a crate)?


Explanation of O(...) notation:

https://en.wikipedia.org/wiki/Time_complexity#Table_of_common_time_complexities

Inserting or removing an element of a BTreeMap requires some tree restructuring at a cost of O(log(N)). As a consequence, mapping a BTreeMap by means of an iterator and collect() will cost O(N * log(N)). This is unavoidable for a non-monotonic closure, but for a monotonic closure the tree structure remains the same, so only O(N) should be needed.
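
To make the estimate concrete, the reasoning assumes that the new tree is built element by element, roughly as in this illustration (not from the original post); each insert costs O(log N):

use std::collections::BTreeMap;

// Illustration only: mapping by inserting one element at a time.
// Each `insert` has to locate its position in the new tree, which costs
// O(log N), so the whole loop costs O(N * log(N)).
fn map_by_insertion<K: Ord, V>(
    tree: &BTreeMap<K, V>,
    mut f: impl FnMut(&K, &V) -> (K, V),
) -> BTreeMap<K, V> {
    let mut mapped = BTreeMap::new();
    for (k, v) in tree {
        let (fk, fv) = f(k, v);
        mapped.insert(fk, fv);
    }
    mapped
}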

FreD
  • can I have the human explanation of what you want ? – Stargateur Aug 31 '23 at 14:34
  • I am not sure what you mean by 'human'. Let's try to say it differently. O(N) complexity means that the amount of computation needed is proportional to the number N of elements in the tree. With BTreeMap and other similar structures, the restructuring of the tree resulting from the removal or insertion of a SINGLE element has complexity O(log(N)), so a mapping based on iterators and collecting will have complexity O(N log(N)). But for a monotonic mapping, the order of the keys is not changed (or is reversed) by the mapping, so you don't need to restructure the tree. – FreD Aug 31 '23 at 14:54
  • Then you just need to clone the tree and map the key/values directly within the structure (or possibly reverse the order of the whole entries). This is O(N) instead of O(N log(N)), which means that you need an amount of computation proportional to N and not to N * log(N) – FreD Aug 31 '23 at 14:56
  • look at: https://en.wikipedia.org/wiki/Time_complexity#Table_of_common_time_complexities – FreD Aug 31 '23 at 15:03
  • IIUC you want to create a new `BTreeMap` that contains `f(k, v) for (k, v) in tree`, and you'd like to do it in O(N) instead of O(N×log(N)) by reproducing the structure of the original tree? – Jmb Aug 31 '23 at 15:10
  • @Jmb. Yes exactly! – FreD Aug 31 '23 at 15:11
  • AFAIK that's not possible with the std `BTreeMap`, and I don't know of any crate that allows it, so you'd probably need to roll your own. – Jmb Aug 31 '23 at 15:14
  • @Jmb thx way more clear haha – Stargateur Aug 31 '23 at 15:27
  • I believe the natural approach `map.into_iter().map(f).collect::<BTreeMap<_, _>>()` is O(N). First we iterate over the map and collect it into a `Vec`, which is O(N). Then we sort this vec, using stable sort, which is definitely O(N) with increasing order (already sorted array), and [experiments show](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=76ff4c1c13f7042beeeb6a56a660e47d) it is also O(N) on decreasing order (reverse-sorted array). Then we build the map from that (https://doc.rust-lang.org/stable/src/alloc/collections/btree/append.rs.html#38)... – Chayim Friedman Aug 31 '23 at 15:27
  • yes I think using a sorted vec should be optimized, but sorting a vec is not O(n) – Stargateur Aug 31 '23 at 15:28
  • ...and this is where I'm less sure (didn't examine the code in detail), but I believe it is also O(N). This is not the _most efficient_ approach (allocating multiple times and copying and sorting instead of just inserting in-place), but it is still O(N). I don't believe something more optimized is possible with std. – Chayim Friedman Aug 31 '23 at 15:28
  • Also, none of that is _guaranteed_, so if you need a _guaranteed_ O(N) execution, you're out of luck. – Chayim Friedman Aug 31 '23 at 15:29
  • Thank you Chayim. I will experiment with your idea and another of mine, and if there is still no answer to my question, I will post an answer following this experimentation. – FreD Aug 31 '23 at 15:38

1 Answer


I propose two answers to this question, and then run some tests.


The first answer comes from a remark by Chayim Friedman and is based on standard operations. It turns out that the collect() method relies on bulk_push(...), applied to a sorted iterator, which runs in linear time (that is, O(N)), as shown in the source code:

...
        // Meanwhile, we build a tree from the sorted sequence in linear time.
        self.bulk_push(iter, length, alloc)
    }

    /// Pushes all key-value pairs to the end of the tree, incrementing a
    /// `length` variable along the way. The latter makes it easier for the
    /// caller to avoid a leak when the iterator panicks.
    pub fn bulk_push<I, A: Allocator + Clone>(&mut self, iter: I, length: &mut usize, alloc: A)
    where
        I: Iterator<Item = (K, V)>,
    {
...

The following answer can be proposed on the basis of this observation:

pub fn agnostic_map<K, V, F>(tree: &BTreeMap<K, V>, f: F) -> BTreeMap<K, V>
where
    F: FnMut((&K, &V)) -> (K, V),
    K: Ord,
{
    // The map's iterator yields the entries in key order; collect() then builds
    // the new tree via the bulk_push path shown above.
    tree.iter().map(f).collect()
}
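
For example, a usage sketch (assuming the agnostic_map above is in scope; the closure mirrors the inc_f used in the tests below):

use std::collections::BTreeMap;

fn main() {
    let tree: BTreeMap<i32, f64> = (-5..5).map(|i| (i, i as f64)).collect();
    // Cubing the key is strictly increasing, so key order is preserved.
    let mapped = agnostic_map(&tree, |(k, v)| (k * k * k, 2.0 * v));
    println!("{mapped:?}");
}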

The second answer comes from my first idea of reusing the structure of the tree. It involves one unsafe transmute step but is accepted by Miri. Beware of this approach, however, and take into account Chayim Friedman's remark below about its unsoundness.

The raw idea is to use Cell<K> as the key type instead of K in order to allow in-place key modification. The problem is that Cell<K> does not implement PartialEq, Eq, PartialOrd and Ord (unless K is Copy). In practice, we do not need these traits to change the keys/values of the tree without changing its structure, but they are necessary for type checking.

For this reason, we define OrdCell<K>, which provides fake (panicking) implementations of PartialEq, Eq, PartialOrd and Ord.

#[repr(transparent)]
pub struct OrdCell<T> {
    pub inner: Cell<T>,
}
impl<T> PartialEq for OrdCell<T> where T: PartialEq {
    fn eq(&self, _other: &Self) -> bool { panic!("should not be run"); }
} 
...
impl<T> Ord for OrdCell<T> where T: Ord {
    fn cmp(&self, _other: &Self) -> std::cmp::Ordering { panic!("should not be run"); }
} 

A mutable reference to the original tree is turned into a mutable reference to an editable tree by means of unsafe code:

let editable_tree: &mut BTreeMap<hidden::OrdCell<K>,V> = 
                                 unsafe { std::mem::transmute(&mut tree) };
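
To give an idea of how the editable view is then used, here is a minimal hypothetical sketch of the in-place rewrite pass (it assumes BTreeMap and the hidden::OrdCell above are in scope, and K: Default so the key can temporarily be moved out of the Cell; the actual Playground code may differ, and the decreasing case additionally requires reversing the whole sequence of entries, which is omitted here):

// Sketch only: rewrite every key/value through the transmuted view without
// changing the tree structure. This inherits the unsoundness of the transmute.
fn rewrite_in_place<K: Ord + Default, V>(
    editable_tree: &mut BTreeMap<hidden::OrdCell<K>, V>,
    mut f: impl FnMut(K, &mut V) -> K,
) {
    for (key_cell, value) in editable_tree.iter_mut() {
        // Move the key out of the Cell (K: Default), compute the new key,
        // and write it back; the value is updated through the &mut reference.
        let old_key = key_cell.inner.take();
        key_cell.inner.set(f(old_key, value));
    }
}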

The full code is tested subsequently.


Both methods and some simple tests are run with Miri. The full code is found in Playground:

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.77s
     Running `/playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri runner target/miri/x86_64-unknown-linux-gnu/debug/playground`

Standard Output

tree -> {-5: -5.0, -4: -4.0, -3: -3.0, -2: -2.0, -1: -1.0, 0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}
monotonic_map(tree,inc_f) -> Ok({-125: -10.0, -64: -8.0, -27: -6.0, -8: -4.0, -1: -2.0, 0: 0.0, 1: 2.0, 8: 4.0, 27: 6.0, 64: 8.0})
monotonic_map(tree,dec_f) -> Ok({-64: 12.0, -27: 9.0, -8: 6.0, -1: 3.0, 0: 0.0, 1: -3.0, 8: -6.0, 27: -9.0, 64: -12.0, 125: -15.0})
monotonic_map(tree,non_mono_f) -> Err("non monotonic map")
agnostic_map(&tree,inc_f) -> {-125: -10.0, -64: -8.0, -27: -6.0, -8: -4.0, -1: -2.0, 0: 0.0, 1: 2.0, 8: 4.0, 27: 6.0, 64: 8.0}
agnostic_map(&tree,dec_f) -> {-64: 12.0, -27: 9.0, -8: 6.0, -1: 3.0, 0: 0.0, 1: -3.0, 8: -6.0, 27: -9.0, 64: -12.0, 125: -15.0}
agnostic_map(&tree,non_mono_f) -> {-4: -4.0, -3: 3.0, -2: -2.0, -1: 1.0, 0: 0.0, 1: -1.0, 2: 2.0, 3: -3.0, 4: 4.0, 5: -5.0}

Here is a time performance test. The full code is found in Playground:

Standard Error

   Compiling playground v0.0.1 (/playground)
    Finished release [optimized] target(s) in 1.33s
     Running `target/release/playground`

Standard Output

time monotonic_map(tree,inc_f) -> 4859
time monotonic_map(tree,dec_f) -> 4714
time monotonic_map(tree,non_mono_f) -> 3415
time agnostic_map(&tree,inc_f) -> 5433
time agnostic_map(&tree,dec_f) -> 5308
time agnostic_map(&tree,non_mono_f) -> 12165

There is no significant gain in execution time for the non-standard method compared to the standard one.


Here is a comparison-count test: every use of PartialEq::eq, PartialOrd::partial_cmp and Ord::cmp is counted. The full code is found in Playground:

Standard Error

   Compiling playground v0.0.1 (/playground)
    Finished release [optimized] target(s) in 1.90s
     Running `target/release/playground`

Standard Output

count monotonic_map(tree,inc_f) -> 199999
count monotonic_map(tree,dec_f) -> 199999
count monotonic_map(tree,non_mono_f) -> 199999
count agnostic_map(&tree,inc_f) -> 399998
count agnostic_map(&tree,dec_f) -> 399998
count agnostic_map(&tree,non_mono_f) -> 3080315

The O(N log(N)) complexity shows up clearly in the number of comparisons for the non-monotonic closure. Otherwise, there is not a big difference between the two approaches for monotonic closures.
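
For reference, such counting can be implemented with a key wrapper along these lines (a sketch only; the actual Playground code may count differently):

use std::cmp::Ordering;
use std::sync::atomic::{AtomicUsize, Ordering as AtomicOrdering};

// Global counter incremented by every comparison of the wrapped key.
static CMP_COUNT: AtomicUsize = AtomicUsize::new(0);

struct CountedKey(i64);

impl PartialEq for CountedKey {
    fn eq(&self, other: &Self) -> bool {
        CMP_COUNT.fetch_add(1, AtomicOrdering::Relaxed);
        self.0 == other.0
    }
}
impl Eq for CountedKey {}

impl PartialOrd for CountedKey {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        CMP_COUNT.fetch_add(1, AtomicOrdering::Relaxed);
        self.0.partial_cmp(&other.0)
    }
}
impl Ord for CountedKey {
    fn cmp(&self, other: &Self) -> Ordering {
        CMP_COUNT.fetch_add(1, AtomicOrdering::Relaxed);
        self.0.cmp(&other.0)
    }
}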


Conclusion

The approach based on standard operations is certainly sufficient, especially considering the unsafe operation required by the second approach.

FreD
  • Your approach based on `transmute()` is terribly unsound. The fact that Miri accepts it does not prove anything. The structure of `BTreeMap` is not guaranteed, so you cannot transmute it. – Chayim Friedman Sep 01 '23 at 10:15