Extended find of unique tuples in a relation represented by a BDD

Question

Consider {<1,2>, <1,3>, <1,7>, <0,4>} as the set of tuples of a relation R. Now consider that R is represented (via its membership function) by a BDD. That is, The BDD representing R depends on variables {x1,x2, y1, y2, y3} where {x1, x2} are used to represent the first element of every tuple and {y1, y2, y3} are used to represent the second element.

Now, consider the problem of finding the set of tuples that have unique values in its first element. For the relation above that set would be {<0,4>}. All the other elements are discarded as they are more than one value having 1 in the first component.

As a second example consider the relation with set of tuples {<1,2>, <1,3>, <1,7>, <2,3>, <2,5>, <0,4>}. In such a case the expected result is still {<0,4>} as 2 appears more than once as first element.

The problem can be also seen as abstracting away the variables {y1,y2,y3} such that only unique values for {x1,x2} remain. With this result, the expected relation can be reconstructed by computing the conjunction of the resulting BDD with the input one.

In summary, the the question is: which are the BDD operations that have to be performed on The representation of R to obtain the BDD with only the unique tuples.

Notice that this is a genralization of this question

EDIT 1: The following code reflects the implementation I have so far. However, I am wondering if it is possible to get a more efficient version. For simplicity I intentionally omit the handling of the computed table (crucial to get better time complexity). Additionally, I use &, | and ! to denote the conjunction, disjunction and complement operations on BDDs.

BDD uniqueAbstract(BDD f, BDD cube) {
  if ((f.IsZero() || f.IsOne()) && !cube.IsOne())
    return zero();
  BDD T = high(f);
  BDD E = low(f);
  if(level(f) == level(c)) { // current var is abstracted
    BDD uniqueThen = uniqueAbstract(T, high(c));
    BDD existElse = existAbstract(E, high(c));

    BDD existThen = existAbstract(T, high(c));
    BDD uniqueElse = uniqueAbstract(E, high(c));

    return (uniqueThen & !existElse) | (uniqueElse & !existThen)
  } else {
    BDD uniqueThen = uniqueAbstract(T,c);
    BDD uniqueElse = uniqueAbstract(E,c);
    return ite(top(f), uniqueThen, uniqueElse);
  }
}

EDIT2: After trying three different implementations there are still some performance issues. Let me describe the three of them.

A C implementation of my approach, let me call it the reference implementation4.
The implementation proposed by user meolic in the accepted answer3.
A hybrid approach between the two and available2.

The goal of this update is to analyze a bit the results from using the three approaches. As time measures seem misleading at this time to judge them, I decided to evaluate the implementations on a different set of measures.

Recursive calls
Cache hits
Abstract simple. Number of times the function call was solved without requiring existential abstraction.
Abstract complex: Number of times the function call was solved requiring existential abstraction.
Exist abstract: Number of calls to the existential abstraction.

The results for implementation 1: (21123 ms): Unique abstraction statistics: Recursive calls: 1728549.000000 Cache hits: 638745.000000 Non abstract: 67207.000000 Abstract simple: 0.000000 Abstract complex: 0.000000 Exist abstract: 1593430.000000

Results for implementation 2: (run time: 54727 ms) Unique abstraction statistics: Recursive calls: 191585.000000 Cache hits: 26494.000000 Abstract simple: 59788.000000 Abstract complex: 12011.000000 Exist abstract: 24022.000000

Results for implementation 3: (run time: 20215 ms) Unique abstraction statistics: Recursive calls: 268044.000000 Cache hits: 30668.000000 Abstract simple: 78115.000000 Abstract complex: 46473.000000 Exist abstract: 92946.000000

EDIT 3: The following results were obtained after implementing every logical operation in terms of ITE5.

uniqueAbstractRecRef (21831 ms) Unique abstraction statistics: Total calls: 1723239 Optimized calls: 0 Total exist abstract calls: 30955618 Unique abstract calls to exist abstract: 2385915 Total ite calls: 3574555 Out of the total time, uniqueAbstractRecRef takes 4001 ms (12.4%)
uniqueAbstractSERec (56761 ms) Unique abstraction statistics: Total calls: 193627 Optimized calls: 60632 Total exist abstract calls: 16475806 Unique abstract calls to exist abstract: 24304 Total ite calls: 1271844 Out of the total time, uniqueAbstractSERec takes 33918 ms (51.5%)
uniqueAbstractRec (20587 ms) Unique abstraction statistics: Total calls: 270205 Optimized calls: 78486 Total exist abstract calls: 13186348 Unique abstract calls to exist abstract: 93060 Total ite calls: 1256872 Out of the total time, uniqueAbstractRec takes 3354 ms (10.6%)

What is your question? I see a good introduction, but I don't know what your problem is. — Vincent van der Weele, Oct 15 '14 at 05:41
@vincent-van-der-weele Just edited the question with an specific answer. — user1975278, Oct 15 '14 at 11:36
@benrudgers your proposal is not efficient because it requires the information in the BDD to be decoded. Hence, the BDD representation becomes useless and I rely on it for other aspects in my application. — user1975278, Oct 15 '14 at 13:27
@benrudgers, I agree with the time complexities you provide and in general with the reasoning. However, I need the relation (i.e. set of tuples) to be represented as a BDD. The argument is the time complexity for other operations on relations that were left out in my question because are not relevant. For instance, how to find the complement of a relation (i.e. the complement of its set of tuples modulo a universe). In BDDs that operation is O(1). For other operations the complexity offered by BDDs can be argued to be better as far as the BDD does not need to be decoded. — user1975278, Oct 15 '14 at 17:20
Your EDIT2 started an interesting question. Something can take more time with less recursive calls if these calls are more time-consumable. Can you implement both solutions using ITE for all Boolean operations (instead of AND+NOT), and then count the number of ITE operations? Moreover, you can try with profiler and find out where the time is spent. And last but not least, you got much better results using C++, aren't you? How fast is that program? Maybe, the final answer is hidden in bad/good memory management and caching. — meolic, Nov 14 '14 at 10:08
I have added EDIT 3 using ITE when possible. However, internally ITE detects the special cases and delegates to AND, OR, etc.. According to the profiler most of the time used in the second implementation is spent by cuddBDDExistAbstractRecur (19028 ms: 28.8% of the total execution). Notice that in the other implementations, exist abstract is called more times but consumes less time. Regarding my c++ implementation: your solution performs better. However C++ implementations cannot use the cache. Hence the slowest C impl. is faster than the C++ counterpart by a factor of almost 2. — user1975278, Nov 14 '14 at 22:51

max taldykin · Answer 1 · 2014-10-16T14:39:32.920

2

There exist simple and efficient solution if variables are ordered in such a way that x1 and x2 are at the top of BDD.

Consider BDD for second example.

You can traverse (in breadth-first order) first two layers of it to get four sub-BDDs. One for each possible combination of x1,x2. Three of those sub-BDDs a rooted at y1 and fourth is empty (constant False).

bdd

Now you can count number of elements in each sub-BDD (Algorithm C from Knuth's Volume 4 Fascicle 1, Bitwise Tricks & Techniques; Binary Decision Diagrams).

If number of elements in sub-BDD is greater than 1 then drop it (shortcut from parent node directly to False), otherwise leave it as it is.

It is possible to run this algorithm in single pass by memoizing partial results while counting elements.

edited Oct 16 '14 at 14:39

answered Oct 15 '14 at 12:08

max taldykin

12,459
5
45
64

I have updated the question with my current solution to the problem. It does not count but instead traverses the BDD to create the one I need. Do you think that captures the idea? – user1975278 Oct 15 '14 at 13:45
1

I don't understand every detail of your code, but it seems that idea is quite similar. – max taldykin Oct 16 '14 at 14:50
Your solution works if the variables being "abstracted" are below the ones that will remain. That is, in the case of the diagram above, variables representing x are above variables representing y. However, it does not work in the other way around. Suppose I have the exact same relation represented by the diagram above and try to apply your solution. Is there a way to make your proposal order independent? – user1975278 Oct 23 '14 at 21:12
Yep, as I noted, it is simple and efficient only if x1 and x2 are at the top. It is a bit more complicated if this is not the case. I'll try to extend my answer with more general solution. – max taldykin Oct 24 '14 at 05:54
The code I updated my answer with handles the problem in an order independent way. The problem is the time complexity. I m wondering if it can be reduced. – user1975278 Oct 28 '14 at 13:44

meolic · Answer 2 · 2014-11-04T08:51:38.120

2

Here is my implementation. I have studied author's proposed solution and it seems to me that it is the best if not the only simple BDD-based solution for arbitrary ordering. However, there may be some improvements if the algorithm is implemented in my way- PLEASE CHECK. I am using my own wrapper over BDD package but you should not have any troubles to understand it.

EDITED: I have simplified the solution, function Bdd_GetVariableChar() is not used anymore.

/* TESTING SOLUTION FOR QUESTION ON STACK OVERFLOW */
/* bdd_termFalse,bdd_termTrue: Boolean constants */
/* Bdd_isTerminal(f): check if f is Boolean constant */
/* Bdd_Low(f),Bdd_High(f): 'else' and 'then' subfunction */
/* Bdd_Top(f): literal function representing topvar of f */
/* Bdd_IsSmaller(f,g): check if topvar of f is above topvar of g */
/* existentialAbstraction(f,cube): \exist v.f for all v in cube */

Bdd_Edge specialAbstraction(Bdd_Edge f, Bdd_Edge cube) {
  if (Bdd_isTerminal(cube)) return f;
  if (Bdd_isTerminal(f)) return bdd_termFalse;
  if (Bdd_IsSmaller(f,cube)) {
    Bdd_Edge E,T;
    E = specialAbstraction(Bdd_Low(f),cube);
    T = specialAbstraction(Bdd_High(f),cube);
    return Bdd_ITE(Bdd_Top(f),T,E);
  } else if (Bdd_IsSmaller(cube,f)) {
    return bdd_termFalse;
  } else {
    Bdd_Edge E,T;
    cube = Bdd_High(cube);
    E = Bdd_Low(f);
    T = Bdd_High(f);
    if (Bdd_isEqv(E,bdd_termFalse)) {
      return specialAbstraction(T,cube);
    } else if (Bdd_isEqv(T,bdd_termFalse)) {
      return specialAbstraction(E,cube);
    } else {
      Bdd_Edge EX,TX,R;
      EX = existentialAbstraction(E,cube);
      TX = existentialAbstraction(T,cube);
      if (Bdd_isEqv(EX,TX)) return bdd_termFalse;
      R = Bdd_ITE(Bdd_ITE(EX,bdd_termFalse,T),
                  bdd_termTrue,
                  Bdd_ITE(TX,bdd_termFalse,E));
      return specialAbstraction(R,cube);
    }
  }
}

And, yes, if variable ordering is fixed with x above y, the algorithm can really be much more efficient - you can remove all the calculations from the most complex 'else' block and just return 0.

Here are some testing runs:

CUBE (JUST IN CASE YOU ARE NOT FAMILIAR WITH BDD ALGORITHMS)
  +  y1 y2 y3 y4 y5

ORIGINAL (ORDERED WITH X ABOVE Y)
  +  *x1 *x2 x3 *x4 x5 y1 *y2 y3 y4 y5
  +  *x1 x2 *x3 *x4 *x5 y1 y2 *y3 y4 y5
  +  *x1 x2 *x3 *x4 x5 *y1 y2 *y3 y4 y5
  +  *x1 x2 *x3 x4 *x5 y1 *y2 y3 *y4 *y5
  +  *x1 x2 x3 *x4 x5 *y1 *y2 *y3 *y4 y5
  +  *x1 x2 x3 *x4 x5 *y1 y2 y3 *y4 *y5
  +  x1 *x2 *x3 *x4 *x5 y1 y2 y3 y4 *y5
  +  x1 x2 *x3 x4 x5 *y1 *y2 *y4 *y5
  +  x1 x2 x3 *x4 *x5 *y1 *y2 *y3 y4 *y5

ABSTRACTION
  +  *x1 *x2 x3 *x4 x5
  +  *x1 x2 *x3 *x4
  +  *x1 x2 *x3 x4 *x5
  +  x1 *x2 *x3 *x4 *x5
  +  x1 x2 x3 *x4 *x5

ORIGINAL (ORDERED WITH Y ABOVE X)
  +  *y1 *y2 *y3 *y4 *y5 x1 x2 *x3 x4 x5
  +  *y1 *y2 *y3 *y4 y5 *x1 x2 x3 *x4 x5
  +  *y1 *y2 *y3 y4 *y5 x1 x2 x3 *x4 *x5
  +  *y1 *y2 y3 *y4 *y5 x1 x2 *x3 x4 x5
  +  *y1 y2 *y3 y4 y5 *x1 x2 *x3 *x4 x5
  +  *y1 y2 y3 *y4 *y5 *x1 x2 x3 *x4 x5
  +  y1 *y2 y3 *y4 *y5 *x1 x2 *x3 x4 *x5
  +  y1 *y2 y3 y4 y5 *x1 *x2 x3 *x4 x5
  +  y1 y2 *y3 y4 y5 *x1 x2 *x3 *x4 *x5
  +  y1 y2 y3 y4 *y5 x1 *x2 *x3 *x4 *x5

ABSTRACTION
  +  *x1 *x2 x3 *x4 x5
  +  *x1 x2 *x3 *x4
  +  *x1 x2 *x3 x4 *x5
  +  x1 *x2 *x3 *x4 *x5
  +  x1 x2 x3 *x4 *x5

ORIGINAL (MIXED ORDER)
  +  *x1 *x2 y1 *y2 y3 y4 y5 x3 *x4 x5
  +  *x1 x2 *y1 *y2 *y3 *y4 y5 x3 *x4 x5
  +  *x1 x2 *y1 y2 *y3 y4 y5 *x3 *x4 x5
  +  *x1 x2 *y1 y2 y3 *y4 *y5 x3 *x4 x5
  +  *x1 x2 y1 *y2 y3 *y4 *y5 *x3 x4 *x5
  +  *x1 x2 y1 y2 *y3 y4 y5 *x3 *x4 *x5
  +  x1 *x2 y1 y2 y3 y4 *y5 *x3 *x4 *x5
  +  x1 x2 *y1 *y2 *y3 *y4 *y5 *x3 x4 x5
  +  x1 x2 *y1 *y2 *y3 y4 *y5 x3 *x4 *x5
  +  x1 x2 *y1 *y2 y3 *y4 *y5 *x3 x4 x5

ABSTRACTION
  +  *x1 *x2 x3 *x4 x5
  +  *x1 x2 *x3 *x4
  +  *x1 x2 *x3 x4 *x5
  +  x1 *x2 *x3 *x4 *x5
  +  x1 x2 x3 *x4 *x5

edited Nov 04 '14 at 08:51

answered Nov 03 '14 at 16:30

meolic

1,177
2
15
41

thanks a lot for your answer. I am not very familiar with the BDD wrapper/library you are using. Could you elaborate a bit on the function Bdd_GetVariableChar?. For example, by using it in the if's guard: Bdd_GetVariableChar(f) == 'x' are you testing the case in which the top variable of f is not being abstracted?. – user1975278 Nov 04 '14 at 00:49
If Bdd_GetVariableChar(f) == 'x' then top variable is not being abstracted. I will pubish implementation without this function but I have to check it first. – meolic Nov 04 '14 at 07:56
I have implemented two versions of your proposal. One in C++ ad the other in C. Both of them are available [here](https://gist.github.com/ggutierrez/6dc2af945ee31b30d6f4) and [here](https://gist.github.com/ggutierrez/c3bfc03a722e26c8d7f5). They both work, however I got an interesting result: the C++ version is very efficient compared to my initial proposal. However the C version is very slow when compared to my initial approach. I am using CUDD and this is my first project with BDDs so I must be doing something really bad. Any idea what could it be?. – user1975278 Nov 06 '14 at 19:27
First let me explain all those if statements with bdd==NULL. I do that because most functions in CUDD do the same. As far as i understand calling a BDD function returns null if something goes wrong (i.e. exhausted memory). Even in those case one must decrement the reference count of the other functions and then return NULL to propagate the error. Am I getting it the wrong way? – user1975278 Nov 08 '14 at 21:32
I updated my [implementation](https://gist.github.com/ggutierrez/729f803e732fb354bc40) to look as close as possible to cuddBddExistAbstract(). Unfortunately the changes do not improve the performance. Using **cuddRef** and **cuddDeref** makes my implementation crash; I had to use **Cudd_Ref** and **Cudd_IterDerefBdd** instead. The only use of "Cudd_Deref(result)" is before returning when the actual abstraction takes place and is now **after** storing the result in the cache. Lines 70, 77 and 88 also store values in the cache. Is there something you can advise me to try? – user1975278 Nov 09 '14 at 23:40
Hmm, you must use Cudd_Ref(EX) before line 83. Moreover, you should Cudd_IterDerefBdd(manager, EX) if (TX == NULL) or if (EX == TX). Line 105 should be removed. And last (but I am not expert here): move line 119 after line 121, change Cudd_IterDerefBdd to Cudd_Deref in line 129, and remove lines 128 and 135. In your main program, you should do Cudd_Ref(res) after res = uniqueAbstractSERec(). If still not good then please report if the program is faster without cache tables and memory management (use relative small example and remove all NULL catching and all Ref and Deref). – meolic Nov 10 '14 at 07:55
I have applied must of your suggestions. However, I have some questions: 1) Moving the line 119 after 121 will reference res1 after decrementing the count of _ iteExFalseThen_ and _ iteTxFalseElse_. This is the oposite approach to the taken by [cuddBddExistAbstractRecur](https://gist.github.com/ggutierrez/95f5588f390551f66a49) lines 76-78. Or am I missing something?. In any case I made the change and it does not affect the zero reference count of my unit tests. 2) "change Cudd_IterDerefBdd to Cudd_Deref in line 129 and remove lines 128 and 135) do affect the check zero references in my tests. – user1975278 Nov 11 '14 at 02:52
Moving line 119 was wrong suggestion. Sorry. I am not familiar with CUDD and now I am lost. @Lorenzo and other CUDD users, please help. – meolic Nov 11 '14 at 07:26
I think I found the problem. There is a small but significative difference between your solution and my initial proposal. In my proposal I returned **(uniqueThen & !existElse) | (uniqueElse & !existThen)** where both **uniqueThen** and **uniqueElse** are the result of the recursive calls. However in your solution you recurse on **R** which is constructed from **(~EX /\ T) \/ (~TX /\ E)** where T and E are the _then_ and _else_ branches of f. I have tested both way and they pass my tests, however I am not sure if they are equivalent. – user1975278 Nov 12 '14 at 18:04
They are equivalent! This is the main point of my solution and it is the main source of improvement (much less recursions). However, you posted C implementation which use this better approach and still get bad results... – meolic Nov 12 '14 at 18:26
correct me if I am wrong. I will get less recursive calls but it will increase the number of times that the existential quantification will be called. Right? – user1975278 Nov 12 '14 at 20:21
If I am not wrong then you will get less recursive calls and therefore you will also have less (i.e. optimal) existential quantifications. – meolic Nov 12 '14 at 22:23
Your proposal produce less recursive calls. That is supported by the data I added to the question. It is very intriguing to me why with less recursive calls it takes twice the time. Can you suggest me which data to collect in order to draw a conclusion? – user1975278 Nov 14 '14 at 04:12

score 1 · Answer 3 · answered Aug 29 '15 at 18:06

One approach involves directly translating the definition of uniqueness:

R(x,y) and forall z . ~R(x,z) or y = z

An implementation may look like this:

def inspect(function, name, nvars):
    sys.stdout.write(name) # avoid print's trailing space or newline
    function.summary(nvars) # minterm count needs number of variables
    function.printCover()

import sys
from cudd import Cudd
m = Cudd()

nx = 2
ny = 3
x = [m.bddVar() for i in range(nx)]
y = [m.bddVar() for i in range(ny)]

R = (~x[0] & x[1] & (~y[0] & y[1] | y[1] & y[2]) |
     x[0] & ~x[1] & (y[0] ^ y[1]) & y[2] |
     ~x[0] & ~x[1] & y[0] & ~y[1] & ~y[2])

# This approach is independent of variable order.  We are free to enable
# reordering or call it explicitly.
m.reduceHeap()

inspect(R, 'R', nx+ny)

# Create auxiliary variables and selector function.
z = [m.bddVar() for i in range(ny)]
zcube = reduce(lambda a, b: a & b, z)
P = ~m.xeqy(y,z)

# A pair is in L iff:
#   - it is in R
#   - there is no other pair in R with the same x and different y
L = R & ~(R.swapVariables(y,z).andAbstract(P,zcube))

inspect(L, 'L', nx+ny)

Result of running that code:

R: 10 nodes 1 leaves 6 minterms
01-11 1
10101 1
10011 1
00100 1
0101- 1

L: 6 nodes 1 leaves 1 minterms
00100--- 1

The first two variables encode the first element of the pair; the next three variables encode the second element of the pair; the last three variables are the auxiliary variables.

The code applies DeMorgan's to the formula above to make use of andAbstract.

I have finally implemented your suggestions. At first it as very inefficient because of the variable order: I was getting a pretty bad representation for the equality relation. I decided to use an "interleaved" variable order and the result was much better. I like the simplicity of your approach (when compared to the other two) however the number of times swapVariables executes in my case is quite high and renders de solution slower than the one originally proposed. — user1975278, Nov 28 '15 at 21:46
by the way, which python wrapper do you use on CUDD? is it freely available? — user1975278, Nov 28 '15 at 21:47
Yes, the size of the BDD for the equality relation goes from exponential to linear in the number of variables depending on the order, and interleaving gives the best results. The z variables should be interleaved as well to speed up variable substitution. — Fabio Somenzi, Nov 30 '15 at 02:53
The Python wrapper is an experimental one, which is not ready for prime time yet. (I first need to finish release 3.0.0.) — Fabio Somenzi, Nov 30 '15 at 02:55

Extended find of unique tuples in a relation represented by a BDD

3 Answers3

Linked