1

I'm currently working on a project researching properties of some gas mixtures. Testing my code with different inputs, I came upon a bug(?) which I fail to be able to explain. Basically, it's concerning a computation on a numpy array in a for loop. When it computed the for-loop, it yields a different (and wrong) result as opposed to the manual construction of the result, using the same exact code snippets as in the for-loop, but indexing manually. I have no clue, why it is happening and whether it is my own mistake, or a bug within numpy.

It's super weird, that certain instances of the desired input objects run through the whole for loop without any problem, while others run perfectly up to a certain index and others fail to even compute the very first loop.

For instance, one input always stopped at index 16, throwing a:

ValueError: could not broadcast input array from shape (25,) into shape (32,)

Upon further investigation I could confirm, that the previous 15 loops threw the correct results, the results in loop of index 16 were wrong and not even of the correct size. When running loop 16 manually through the console, no errors occured...

The lower array shows the results for index 16, when it's running in the loop. These are the results for index 16, when running the code in the for loop manually in the console. These are, what one would expect to get.

The important part of the code is really only the np.multiply() in the for loop - I left the rest of it for context but am pretty sure it shouldn't interfere with my intentions.

def thermic_dissociation(input_gas, pressure):
# Copy of the input_gas object, which may not be altered out of scope
gas = copy.copy(input_gas)
# Temperature range
T = np.logspace(2.473, 4.4, 1000)
# Matrix containing the data over the whole range of interest
moles = np.zeros((gas.gas_cantera.n_species, len(T)))
# Array containing other property of interest
sum_particles = np.zeros(len(T))

# The troublesome for-loop:
for index in range(len(T)):
    print(str(index) + ' start')
    # Set temperature and pressure of the gas
    gas.gas_cantera.TP = T[index], pressure
    # Set gas mixture to a state of chemical equilibrium
    gas.gas_cantera.equilibrate('TP')
    # Sum of particles = Molar Density * Avogadro constant for every temperature
    sum_particles[index] = gas.gas_cantera.density_mole * ct.avogadro
    
    #This multiplication is doing the weird stuff, printed it to see what's computed before it puts it into the result matrix and throwing the error
    print(np.multiply(list(gas.gas_cantera.mole_fraction_dict().values()), sum_particles[index]))

    # This is where the error is thrown, as the resulting array is of smaller size, than it should be and thus resulting in the error
    moles[:, index] = np.multiply(list(gas.gas_cantera.mole_fraction_dict().values()), sum_particles[index])
    print(str(index) + ' end')

# An array helping to handle the results
molecule_order = list(gas.gas_cantera.mole_fraction_dict().keys())

return [moles, sum_particles, T, molecule_order]

Help will be very appreciated!

  • Could you try with `mole_fraction_dict(threshold=-np.inf)`? It's likely this threshold is the source of the unexpected outcome. – user7138814 Apr 20 '22 at 20:58
  • It will be easier for someone to help you if you provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). – Warren Weckesser Apr 20 '22 at 22:36
  • @user7138814 Thank you very much, this made it work! I haven't known about numpy array truncation. Don't you think that truncation of arrays in any command but __repr__ or __str__ are confusing and unnecessary? Or are there any usecases for that? – crezy1huurcompeletiens Apr 21 '22 at 07:09
  • @crezy1huurcompeletiens I posted an answer to explain better what happens. Regarding the truncating that happens sometimes when printing numpy arrays, yes I think this is very useful because some array might have 100 million elements or something and it would be problematic if all of that was printed. – user7138814 Apr 21 '22 at 10:28

2 Answers2

2

If you want the array of all species mole fractions, you should use the X property of the cantera.Solution object, which always returns that full array directly. You can see the documentation for that method: cantera.Solution.X`.

The mole_fraction_dict method is specifically meant for cases where you want to refer to the species by name, rather than their order in the Solution object, such as when relating two different Solution objects that define different sets of species.

Ray
  • 4,531
  • 1
  • 23
  • 32
  • Thank you for your remark - I realized, using 'cantera.Solution.X' and 'cantera.Solution.species_names' simplifies the handling of the data later on and also prevents the discussed error to happen. – crezy1huurcompeletiens Apr 21 '22 at 18:16
0

This particular issue is not related to numpy. The call to mole_fraction_dict returns a standard python dictionary. The number of elements in the dictionary depends on the optional threshold argument, which has a default value of 0.0.

The source code of Cantera can be inspected to see what happens exactly.

mole_fraction_dict

getMoleFractionsByName

In other words, a value ends up in the dictionary if x > threshold. Maybe it would make more sense if >= was used here instead of >. And maybe this would have prevented the unexpected outcome in your case.

As confirmed in the comments, you can use mole_fraction_dict(threshold=-np.inf) to get all of the desired values in the dictionary. Or -float('inf') can also be used.

In your code you proceed to call .values() on the dictionary but this would be problematic if the order of the values is not guaranteed. I'm not sure if this is the case. It might be better to make the order explicit by retrieving values out of the dict using their key.

user7138814
  • 1,991
  • 9
  • 11
  • Thank you very much - this solved my problem! How comes that the issue only appears, when the dictionary is called by the for loop and not, when I'm running the exact same commands through the console? Shouldn't the issue occur in both instances, following your previous explanation? – crezy1huurcompeletiens Apr 21 '22 at 18:13
  • @crezy1huurcompeletiens Agreed, that would be expected. My guess is that some small detail is still different between both code executions. Note that some values in your output are already extremely close to zero. If they become actually zero or smaller they are not included in the dictionary due to the threshold. – user7138814 Apr 21 '22 at 18:42
  • Yeah maybe when in the loop the array values get cropped to improve performance as opposed to when one runs the code manually or something like that. But even when the small values were altered, the array size should still remain unchanged. I guess we won't find the true reason behind this behaviour - but thank you for helping me find a work around! – crezy1huurcompeletiens Apr 21 '22 at 19:08