12

Suppose we want to generate a list of primes p for which p + 2 is also prime.

A quick solution is to generate a complete list of the first n primes and use the Select function to return the elements which meet the condition.

Select[Table[Prime[k], {k, n}], PrimeQ[# + 2] &]

However, this is inefficient as it loads a large list into the memory before returning the filtered list. A For loop with Sow/Reap (or l = {}; AppendTo[l, k]) solves the memory issue, but it is far from elegant and is cumbersome to implement a number of times in a Mathematica script.

Reap[
  For[k = 1, k <= n, k++,
   p = Prime[k];
   If[PrimeQ[p + 2], Sow[p]]
  ]
 ][[-1, 1]]

An ideal solution would be a built-in function which allows an option similar to this.

Table[Prime[k], {k, n}, AddIf -> PrimeQ[# + 2] &]
Vortico
  • 2,610
  • 2
  • 32
  • 49
  • so essentially you want a more sophisticated [list comprehension](http://en.wikipedia.org/wiki/List_comprehension)... – Simon Jun 16 '11 at 07:17
  • 3
    by the way: there's a simple twin primes generator using `Select` in the [`NextPrime`](http://reference.wolfram.com/mathematica/ref/NextPrime.html#42436644) documentation. – Simon Jun 16 '11 at 07:19
  • Nice find, I didn't catch that. Of course, finding twin primes was just an easy example I could think of for this. – Vortico Jun 17 '11 at 03:57
  • Related: http://stackoverflow.com/q/6313505/618728 – Mr.Wizard Jan 16 '12 at 10:20

6 Answers6

20

I will interpret this more as a question about automation and software engineering rather than about the specific problem at hand, and given a large number of solutions posted already. Reap and Sow are good means (possibly, the best in the symbolic setting) to collect intermediate results. Let us just make it general, to avoid code duplication.

What we need is to write a higher-order function. I will not do anything radically new, but will simply package your solution to make it more generally applicable:

Clear[tableGen];
tableGen[f_, iter : {i_Symbol, __}, addif : Except[_List] : (True &)] :=
 Module[{sowTag},   
  If[# === {}, #, First@#] &@
       Last@Reap[Do[If[addif[#], Sow[#,sowTag]] &[f[i]], iter],sowTag]];

The advantages of using Do over For are that the loop variable is localized dynamically (so, no global modifications for it outside the scope of Do), and also the iterator syntax of Do is closer to that of Table (Do is also slightly faster).

Now, here is the usage

In[56]:= tableGen[Prime, {i, 10}, PrimeQ[# + 2] &]

Out[56]= {3, 5, 11, 17, 29}

In[57]:= tableGen[Prime, {i, 3, 10}, PrimeQ[# + 1] &]

Out[57]= {}

In[58]:= tableGen[Prime, {i, 10}]

Out[58]= {2, 3, 5, 7, 11, 13, 17, 19, 23, 29}

EDIT

This version is closer to the syntax you mentioned (it takes an expression rather than a function):

ClearAll[tableGenAlt];
SetAttributes[tableGenAlt, HoldAll];
tableGenAlt[expr_, iter_List, addif : Except[_List] : (True &)] :=
 Module[{sowTag}, 
  If[# === {}, #, First@#] &@
    Last@Reap[Do[If[addif[#], Sow[#,sowTag]] &[expr], iter],sowTag]];

It has an added advantage that you may even have iterator symbols defined globally, since they are passed unevaluated and dynamically localized. Examples of use:

In[65]:= tableGenAlt[Prime[i], {i, 10}, PrimeQ[# + 2] &]

Out[65]= {3, 5, 11, 17, 29}

In[68]:= tableGenAlt[Prime[i], {i, 10}]

Out[68]= {2, 3, 5, 7, 11, 13, 17, 19, 23, 29}

Note that since the syntax is different now, we had to use the Hold-attribute to prevent the passed expression expr from premature evaluation.

EDIT 2

Per @Simon's request, here is the generalization for many dimensions:

ClearAll[tableGenAltMD];
SetAttributes[tableGenAltMD, HoldAll];
tableGenAltMD[expr_, iter__List, addif : Except[_List] : (True &)] :=
Module[{indices, indexedRes, sowTag},
  SetDelayed @@  Prepend[Thread[Map[Take[#, 1] &, List @@ Hold @@@ Hold[iter]], 
      Hold], indices];
  indexedRes = 
    If[# === {}, #, First@#] &@
      Last@Reap[Do[If[addif[#], Sow[{#, indices},sowTag]] &[expr], iter],sowTag];
  Map[
    First, 
    SplitBy[indexedRes , 
      Table[With[{i = i}, Function[Slot[1][[2, i]]]], {i,Length[Hold[iter]] - 1}]], 
    {-3}]];

It is considerably less trivial, since I had to Sow the indices together with the added values, and then split the resulting flat list according to the indices. Here is an example of use:

{i, j, k} = {1, 2, 3};
tableGenAltMD[i + j + k, {i, 1, 5}, {j, 1, 3}, {k, 1, 2}, # < 7 &]

{{{3, 4}, {4, 5}, {5, 6}}, {{4, 5}, {5, 6}, {6}}, {{5, 6}, {6}}, {{6}}}

I assigned the values to i,j,k iterator variables to illustrate that this function does localize the iterator variables and is insensitive to possible global values for them. To check the result, we may use Table and then delete the elements not satisfying the condition:

In[126]:= 
DeleteCases[Table[i + j + k, {i, 1, 5}, {j, 1, 3}, {k, 1, 2}], 
    x_Integer /; x >= 7, Infinity] //. {} :> Sequence[]

Out[126]= {{{3, 4}, {4, 5}, {5, 6}}, {{4, 5}, {5, 6}, {6}}, {{5, 6}, {6}}, {{6}}}

Note that I did not do extensive checks so the current version may contain bugs and needs some more testing.

EDIT 3 - BUG FIX

Note the important bug-fix: in all functions, I now use Sow with a custom unique tag, and Reap as well. Without this change, the functions would not work properly when expression they evaluate also uses Sow. This is a general situation with Reap-Sow, and resembles that for exceptions (Throw-Catch).

EDIT 4 - SyntaxInformation

Since this is such a potentially useful function, it is nice to make it behave more like a built-in function. First we add syntax highlighting and basic argument checking through

SyntaxInformation[tableGenAltMD] = {"ArgumentsPattern" -> {_, {_, _, _., _.}.., _.},
                                    "LocalVariables" -> {"Table", {2, -2}}};

Then, adding a usage message allows the menu item "Make Template" (Shift+Ctrl+k) to work:

tableGenAltMD::usage = "tableGenAltMD[expr,{i,imax},addif] will generate \
a list of values expr when i runs from 1 to imax, \
only including elements if addif[expr] returns true.
The default of addiff is True&."

A more complete and formatted usage message can be found in this gist.

Simon
  • 14,631
  • 4
  • 41
  • 101
Leonid Shifrin
  • 22,449
  • 4
  • 68
  • 100
  • +1. That's a very handy function. Is it possible to make it behave like `Table` for multiple iterators and output a multidimensional list? At the moment, the naive replacement of `iter_` with `iter__` works, but for obvious reasons, yields a flattened list. – Simon Jun 16 '11 at 09:00
  • @Simon I added a multi-dimensional version, see my edit. It is however considerably more complex / obscure and perhaps less elegant. – Leonid Shifrin Jun 16 '11 at 09:39
  • @Leonid: Ouch! A lot less elegant. It looks like the type of code you only understand as you're writing it. I understand how it works (thanks to your description), but still... Although it's a lot more complicated than your other solution, the speed hit is not too bad. – Simon Jun 16 '11 at 10:39
  • @Leonid: btw, `tableGen[i + j, {i, 2}, {j, 2}]` doesn't work since it thinks `{j,2}` is the `addif` function. To make it work you need to restrict the head of `addif` or perform some other pattern check. – Simon Jun 16 '11 at 10:42
  • @Simon In this particular case, clarity problem is bearable IMO, because we forged a new scoping construct, and as long as it is correct, one can use it and forget about the implementation (it is general and thus highly reusable). In fact, what is done is quite clear: I have to extract iterator variables into a list, and store in some variable `indexes` to evaluate with the value, for `Sow`- ing (the `SetDelayed@@...` machinery is to prevent possible premature evaluation). Another non-triviality is in making a list of pure functions for `SplitBy` dynamically (I used `Table`). The rest is easy. – Leonid Shifrin Jun 16 '11 at 10:47
  • @Simon Yes, I did it in the general function with `__List`, but forgot to bullet-proof the first one. Thanks, will edit. Oops... The __List check does not suffice either. Will change that in the general one as well. Actually, the first one is for a single iterator, so it is not supposed to work for two iterators anyway. But we still need to error-check of course. – Leonid Shifrin Jun 16 '11 at 10:49
  • @Simon Done. Should work fine now. Since we know that type-checking a function is not a rewarding activity in mma, my check is minimally restrictive, just to make things work. – Leonid Shifrin Jun 16 '11 at 10:58
  • @Leonid: Looks good, it's the same pattern that I settled on! Now just add some `SyntaxInformation` and a `usage` message and it behaves like a built-in function. (I've got this written up if you want me to add them). – Simon Jun 16 '11 at 11:05
  • @Simon If you can add those, that would be great! I am a bit short of time right now. – Leonid Shifrin Jun 16 '11 at 11:07
  • @Simon I have discovered a serious bug in my functions. Please see my latest edit for the description. I have updated the code to fix it. – Leonid Shifrin Jun 16 '11 at 13:49
  • This is great! I've edited the function a little and added it to my Mathematica kernel. I'll probably use this quite a bit in the future. – Vortico Jun 17 '11 at 05:00
  • 1
    @Vortico: I doubt you added it to your Mma kernel (unless you work at WRI?) - but I agree, it is a function that I can see myself using quite regularly. Thanks again Leonid! – Simon Jun 17 '11 at 06:46
2

I think the Reap/Sow approach is likely to be most efficient in terms of memory usage. Some alternatives might be:

DeleteCases[(With[{p=Prime[#]},If[PrimeQ[p+2],p,{}] ] ) & /@ Range[K]),_List]

Or (this one might need some sort of DeleteCases to eliminate Null results):

FoldList[[(With[{p=Prime[#2]},If[PrimeQ[p+2],p] ] )& ,1.,Range[2,K] ]

Both hold a big list of integers 1 to K in memory, but the Primes are scoped inside the With[] construct.

Verbeia
  • 4,400
  • 2
  • 23
  • 44
2

Yes, this is another answer. Another alternative that includes the flavour of the Reap/Sow approach and the FoldList approach would be to use Scan.

result = {1};
Scan[With[{p=Prime[#]},If[PrimeQ[p+2],result={result,p}]]&,Range[2,K] ];
Flatten[result]

Again, this involves a long list of integers, but the intermediate Prime results are not stored because they are in the local scope of With. Because p is a constant in the scope of the With function, you can use With rather than Module, and gain a bit of speed.

Verbeia
  • 4,400
  • 2
  • 23
  • 44
  • You can avoid using `With` by using a pure function to avoid double-evaluation: `If[PrimeQ[# + 2], result = {result, #}] &[Prime[#]] &`. This is not always possible, but possible here since the body of your `If` does not contain slot variables (`#1` etc). – Leonid Shifrin Jun 16 '11 at 10:37
  • Thanks @Leonid: these are things I just can't try out at work because (a) I don't have Mma installed there and (b) I'm meant to be working on other things. Is there a particular reason to avoid With, though? – Verbeia Jun 16 '11 at 11:44
  • I just thought pure-function-based method is a bit simpler. Also, since `With` is a scoping construct and must resolve possible name conflicts, anonymous pure functions might be slightly faster. – Leonid Shifrin Jun 16 '11 at 11:49
  • @Leonid: fair enough, I just don't trust my ability to write nested pure functions like that while flying blind. – Verbeia Jun 16 '11 at 11:57
  • And this may be a right thing to do for me as well. I am not at all sure that the loss of readability in my solution is justified in the long term. – Leonid Shifrin Jun 16 '11 at 12:02
2

You can perhaps try something like this:

Clear[f, primesList]
f = With[{p = Prime[#]},Piecewise[{{p, PrimeQ[p + 2]}}, {}] ] &;
primesList[k_] := Union@Flatten@(f /@ Range[k]);

If you want both the prime p and the prime p+2, then the solution is

Clear[f, primesList]
f = With[{p = Prime[#]},Piecewise[{{p, PrimeQ[p + 2]}}, {}] ] &;
primesList[k_] := 
  Module[{primes = f /@ Range[k]}, 
   Union@Flatten@{primes, primes + 2}];
abcd
  • 41,765
  • 7
  • 81
  • 98
  • 1
    You're evaluating Prime[p] twice for each case. I'd suggest changing the function f to use With[{p=Prime[#]},Piecewise... etc – Verbeia Jun 16 '11 at 08:28
  • @Verbeia: Thanks for pointing that out, I had forgotten to do so. – abcd Jun 16 '11 at 08:36
1

Well, someone has to allocate memory somewhere for the full table size, since it is not known before hand what the final size will be.

In the good old days before functional programming :), this sort of thing was solved by allocating the maximum array size, and then using a separate index to insert to it so no holes are made. Like this

x=Table[0,{100}];  (*allocate maximum possible*)
j=0;
Table[ If[PrimeQ[k+2], x[[++j]]=k],{k,100}];

x[[1;;j]]  (*the result is here *)

{1,3,5,9,11,15,17,21,27,29,35,39,41,45,51,57,59,65,69,71,77,81,87,95,99}
Nasser
  • 12,849
  • 6
  • 52
  • 104
  • 6
    I disagree with your first statement. Allocating memory for a full table is the last resort, and a waste of resources. The only reason to do that in Mathematica is when we know the upper bounds for the size and are ready to trade memory for speed (given the nature of mma performance-tuning). Generally, when the size is not known in advance, the solution is to *not* allocate huge memory but to construct a list dynamically. In a lower level language like C one would likely use linked lists for that. Linked lists may also be used in Mathematica, but `Reap`-`Sow` approach is superior. – Leonid Shifrin Jun 16 '11 at 08:22
  • 1
    And by the way, functional programming is almost as old as imperative. LISP is just a year younger than Fortran, and older than most other languages currently in use. – Leonid Shifrin Jun 16 '11 at 08:25
  • @Leonid, when I said to allocate maximum size, I am clearly talking about this problem. We know the maximum size here is N, where N=100 in this case. For such problems, I do not see how anything else can be more efficient. Access to an array is O(1) in time. nothing can be faster than that? – Nasser Jun 16 '11 at 09:07
  • @me, After thinking more, I realized this is a prime number thing. And since prime number are very sparse, then allocating the maximum number is not wise. Unless one knows EXACTLY what the total number of primes that will be found with such property, (i.e. p+2 is also prime) I do not know prime number theory, so do not know if this is something which is known. What I mean to say is that if one knows before hand how much storage is needed, then pre-allocating an array to hold the result in is very efficient. It is a typical trade-off between time and space I suppose. – Nasser Jun 16 '11 at 09:32
  • @Nasser At the moment it is assumed that there is an infinite number of twin primes. See http://en.wikipedia.org/wiki/Twin_prime – DavidC Jun 16 '11 at 13:10
  • @David, I mean, given a number N, if there is a known maximum measure of the number of twin prime numbers below N? If there is such measure, say M. Then if someone tells me to find all the twin prime numbers below N, I can start by allocating an array of size M to use to store the twin primes in as they are found. For example, from your reference, it says for N=1000, M=35. If there is a formula to find such M in general, an upper limit, then may be it can be used? that is what I mean. If someone wants to find all twin prime number, then you are right to suggest that they needs lots of memory. – Nasser Jun 16 '11 at 13:52
  • @Nasser Thanks the clarification. Needless to say, there must be far fewer twin primes than PrimePi/2, but that's not very helpful, is it? – DavidC Jun 16 '11 at 17:41
  • @Nasser: I believe that in Mathematica, integers are represented by arbitrarily-sized objects (like in Python) rather than a constant number of bytes. Because of this, a new object is created for each prime, and each zero in the preallocated list will be replaced, not overwritten. The zeros may even remain in the memory. – Vortico Jun 17 '11 at 04:16
0

Here's another couple of alternatives using NextPrime:

pairs1[pmax_] := Select[Range[pmax], PrimeQ[#] && NextPrime[#] == 2 + # &]

pairs2[pnum_] := Module[{p}, NestList[(p = NextPrime[#];
                      While[p + 2 != (p = NextPrime[p])]; 
                      p - 2) &, 3, pnum]] 

and a modification of your Reap/Sow solution that lets you specify the maximum prime:

pairs3[pmax_] := Module[{k,p},
                   Reap[For[k = 1, (p = Prime[k]) <= pmax, k++,
                        If[PrimeQ[p + 2], Sow[p]]]][[-1, 1]]]

The above are in order of increasing speed.

In[4]:= pairs2[10000]//Last//Timing
Out[4]= {3.48,1261079}
In[5]:= pairs1[1261079]//Last//Timing
Out[5]= {6.84,1261079}
In[6]:= pairs3[1261079]//Last//Timing
Out[7]= {0.58,1261079}
Simon
  • 14,631
  • 4
  • 41
  • 101