0

I am trying to make an index all possible molecules with 0-46 Hydrogen, 0-20 carbon, 0-13 oxygen, etc. I have 7 atoms in which I am interested: H, C, O, N, Cl, F, and S. I have written the following for loop to show what I am trying to achieve:

MassListIndex = []
%MassIndex = [h,c,o,n,cl,f,s]
for h = 0:46;
  for c = 0:20;
    for o = 0:13;
        for n = 0:15;
            for cl=0:5;
                for f=0:5;
                    for s=0:5;
                        MassListIndex = [MassListIndex;[h,c,o,n,cl,f,s]];
                    end;
                end;
            end;
        end;
    end;
end;
end;

This strikes me as terribly inefficient; I don't want to wait around for 2 months for this to run. I have tried using the combinator.m script, but the problem is that there is only one input for the length of the set that is 'permutated' ie if I want to have up to 46 hydrogens, I need to also have 46 of each of the other 6 atoms. This is computationally...heavy (46^7 ~= 436 billion).

Is there any way to make this sort of computation more efficient? Or do I need to think more about shrinking my list by riding it of 'nonsense permutations' (As far as I know, the molecule H40C2 has never been observed!)

Thanks

  • Are you looking for just empirical formulas? Because with structural formulas it becomes even more complex – Luis Mendo Oct 06 '14 at 21:53
  • If you have enough RAM, define `vectors = { 0:46 0:20 0:13 0:15 0:5 0:5 0:5 };` and apply [this](http://stackoverflow.com/q/21895335/2586922). Or to save RAM, try using `uint8` data type: `vectors = { uint8(0:46) uint8(0:20) uint8(0:13) uint8(0:15) uint8(0:5) uint8(0:5) uint8(0:5) }`. The latter takes only a 5 seconds on my computer, and produces the desired 47755008x7 result – Luis Mendo Oct 06 '14 at 22:02
  • Only empirical formulas matter for the mass, so if you take your 47755008x7 matrix and `bsxfun` it with the atomic mass values from [IUPAC](http://www.degruyter.com/view/j/pac.2013.85.issue-5/pac-rep-13-03-02/pac-rep-13-03-02.xml), you've got yourself an answer. The algorithm for `nonsense permutations` is [left as an exercise to the OP](http://www.xkcd.com/1425/). – craigim Oct 06 '14 at 23:21
  • @PootersTheCat Please let me know if I understood your question correctly (in that case the code I linked to in my previous comment generates all combinations in about 5 seconds), so that I can mark the question as duplicate – Luis Mendo Oct 07 '14 at 10:17
  • @Mendo: No structure, as that would be insane! – PootersTheCat Oct 07 '14 at 22:41
  • @Mendo: Yes, your solution worked very well. – PootersTheCat Oct 07 '14 at 22:42
  • @PootersTheCat Glad about that. I'm closing as duplicate then. BTW, to address someone so that they get notified of a comment, you need to include the whole name without spaces or the first letters, but not the last name alone. So in my case `@LuisMendo` or maybe `@Luis`, but not `@Mendo` – Luis Mendo Oct 08 '14 at 21:31

1 Answers1

0

The first problem is not that hard. At least not if you remember to preallocate! I changed you the code into this:

mxidx = 47*21*14*16*6*6*6;
MassListIndex = zeros(mxidx,7);
idx = 1;
for h = 0:46;
    for c = 0:20;
        for o = 0:13;
            for n = 0:15;
                for cl=0:5;
                    for f=0:5;
                        for s=0:5;
                            MassListIndex(idx,:) = [h,c,o,n,cl,f,s];
                            idx = idx + 1;
                        end;
                    end;
                end;
            end;
        end;
    end;
end;

And it ran in less than a minute on my computer. Usually Matlab will warn you if you forget to preallocate; and whenever you (like in this case) know in advance the size of you matrix, you should preallocate!

The other problem on the other hand, 47^7 = 506623120463 (more than 500 billions - it is 47^7 instead of 46^7 since the list 0:46 has 47 elements). So even if you only use one byte pr. row in you matrix (which you certainly don't) it will still take up more that a half terabyte! And the calculation times will likewise be humongous!

But really when would you ever need this list. The way you have constructed your list you can easily calculate an entry just by the index eg.:

function m = MassListIndex(a,b)
a = a - 1;
lst = zeros(1,7);
for i = 1:7
    lst(8-i) = mod(a,47);
    a = floor(a /47);
end
if nargin < 2
    m = lst;
else
    m = lst(b);
end
end

Edit:

If you want it to also calculate the mass, you may do something like:

function mass = getMassFromPermutationNumber(a)
a = a - 1;
lst = zeros(1,7);
for i = 1:7
    lst(8-i) = mod(a,47);
    a = floor(a /47);
end
mass = lst*[1.00794;12.011;15.9994;20.1797;35.4527;18.9984;32.066];
end

Source for masses: http://environmentalchemistry.com/yogi/periodic/mass.html

Disclaimer: I'm not that good at chemistry, so please apply reasonable amounts of skepticism!

Jens Boldsen
  • 1,255
  • 13
  • 21
  • Thank you for the response! The reason for the list is to then calculate the mass of each permutation. That mass list is compared with HD mass spec data, so we can identify the molecular formula of the compounds detected. Does that make sense? – PootersTheCat Oct 07 '14 at 22:43
  • I've added an edit to my answer explaining how you may get the mass from the permutation number without making the list. If you insist on making a list, then there is no way around spending huge amounts of memory, and I see no way of speeding this process up significantly (there may be one - I just don't see it). – Jens Boldsen Oct 08 '14 at 08:53