3

Suppose I have an expression of the form First Form. I know that I can simplify the expression like so: Second Form. However, sympy.simplify and sympy.factor both return the original expression.

To work around this, I've been operating on the expression at a low level:

factor_map = defaultdict(set)
additive_terms = expr.as_coeff_add()[-1]
for term1, term2 in combinations(additive_terms, 2):
    common_terms = (
        set(term1.as_coeff_mul()[-1])
        & set(term2.as_coeff_mul()[-1])
    )
    if common_terms:
        common_factor = sympy.Mul(*common_terms)
        factor_map[common_factor] |= {term1, term2}

factor_map now looks like so:

{
    a: {a⋅x, -a⋅y}, 
    b: {b⋅x, -b⋅y}, 
    c: {-c⋅x, c⋅y}, 
    x: {a⋅x, b⋅x, -c⋅x}, 
    y: {-a⋅y, -b⋅y, c⋅y}
}

I sort it by the number of operations represented by the terms:

factor_list = sorted(
    factor_map.items(),
    key = lambda i: (i[0].count_ops() + 1) * len(i[1])
)[::-1]

I then just rebuild the expression:

used = set()
new_expr = 0
for item in factor_list:
    factor = item[0]
    appearances = item[-1]
    terms = 0
    for instance in appearances:
        if instance not in used:
            terms += instance.as_coefficient(factor)
            used.add(instance)
    new_expr += factor * terms
for term in set(additive_terms) - used:
    new_expr += term

This gives new_expr = d + x*(a + b - c) + y*(-a - b + c). Not great, but better.

I can also improve on this by dividing each combination of additive terms by each other, checking if the result is a number, and using that information to further reduce the output to new_expr = d + (x - y)*(a + b - c).

I've also tried to apply sympy.factor to every possible combination of additive terms, but obviously that blows up very quickly for any reasonably big expression.


Edit: Here's an implementation that uses sympy.factor on all partitions of the set of additive terms (partitions function borrowed from this answer):

def partition(collection):
    if len(collection) == 1:
        yield [ collection ]
        return

    first = collection[0]
    for smaller in partition(collection[1:]):
        # insert `first` in each of the subpartition's subsets
        for n, subset in enumerate(smaller):
            yield smaller[:n] + [[ first ] + subset]  + smaller[n+1:]
        # put `first` in its own subset 
        yield [ [ first ] ] + smaller


def partial_factor(expr):
    args = list(expr.as_coeff_add()[-1])
    # Groupings is just a cache to avoid duplicating factor operations
    groupings = {}

    unique = set()

    for p in partition(args):
        new_expr = 0
        for group in p:
            add_group = sympy.Add(*group)
            new_expr += groupings.setdefault(add_group, sympy.factor(add_group))
        unique.add(new_expr)
    return sorted(unique, key=sympy.count_ops)[0]

For an expression like a*x + b*y + c*z + d + e*x + f*y + h*z, it takes 7.8 seconds to run on my computer, whereas the other method takes 378 microseconds and gives the same result. Seems like there should be a way to be more rigorous than the first method without taking 20,000 times longer to solve it.


I feel like it shouldn't be this hard to get what I want. Is there an easier way to do this?

butterwagon
  • 479
  • 3
  • 9

3 Answers3

5

This similar question has an answer involving the func parameter to collect(). It seems to work in this particular case as well, although you have to explicitly mention d:

from sympy import *
a, b, c, d, x, y = symbols('a, b, c, d, x, y')
expr = a * x + b * x - c * x - a * y - b * y + c * y + d
collect(expr, d, func=factor)
=> d + (x - y)*(a + b - c)

This helps in this specific case, however a more general and automatic solution does not exist.

Also I could not find any documentation for this func parameter anywhere, other than it existing.

Github issue tracking this problem

jlh
  • 4,349
  • 40
  • 45
  • This could be improved by factoring by each symbol and returning the expression where count_ops is smallest: `sorted([collect(expr, sym, func=factor) for sym in expr.free_symbols], key=count_ops)[0]` gives the same result as well. – butterwagon Feb 27 '20 at 20:09
1

It's hard to suggest a strategy of "partial factoring" that works most of the time. Here is a thing to try, devised with your example in mind (a polynomial of several variables).

Given an expression: Attempt to factor it. If unsuccessful, look at the coefficient of each symbol that it contains; the method Expr.coeff(Symbol) does that. The symbol with the smallest coefficient (measured by the number of symbols contained) is considered an obstacle to factoring and is removed from the expression. Repeat.

This logic is encoded below, and partial_factor(a*x + b*x - c*x - a*y - b*y + c*y + d) does return d + (x - y)*(a + b - c).

def partial_factor(expr):
    to_factor = expr
    non_factorable = 0
    while to_factor != 0:
        if factor(to_factor) != to_factor:
            return factor(to_factor) + non_factorable
        coeffs = {v: to_factor.coeff(v) for v in to_factor.free_symbols}
        min_size = min([len(coeffs[v].free_symbols) for v in coeffs])
        for v in coeffs:
            if len(coeffs[v].free_symbols) == min_size:
                non_factorable += v*coeffs[v]
                to_factor -= v*coeffs[v]
                break
    return expr
  • I tried this out. It gives good results with `a*x + b*x - c*x - a*y - b*y + c*y + d`, but doesn't behave as well with `(a + b - c)*x + ( -a -b + c)*y + d*z`. – butterwagon Aug 28 '18 at 15:59
  • I modified the code to use the `as_coeff_add` method instead of `free_symbols` to measure the "size". Also, applying `factor` to the expression `(a + b - c)*x + ( -a -b + c)*y + d` just expands it, (which ends up terminating the function since `factor(to_factor) != to_factor`), so I changed that line to `if factor(to_factor).count_ops() < to_factor.count_ops()`. – butterwagon Aug 28 '18 at 16:06
0

@butterwagon: I tested your work around and found that it fails for expressions like x + 1, where it returns just x. To solve that I'd recommend to change the line

additive_terms = expr.as_coeff_add()[-1]

to

const, additive_terms = expr.as_coeff_add()

and add const to the final result:

new_expr += const
hisermann
  • 26
  • 2
  • This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/late-answers/33617518) – Woody1193 Jan 18 '23 at 00:03