2

After I have shortly succeed in writing a minimalistic Python3.6 extension module in C++ ( see here ) I plan to provide a Python module which does the same as the following Python function iterUniqueCombos():

def iterUniqueCombos(lstOfSortableItems, sizeOfCombo):
    lstOfSortedItems = sorted(lstOfSortableItems)
    sizeOfList = len(lstOfSortedItems)

    lstComboCandidate = []

    def idxNextUnique(idxItemOfList):
        idxNextUniqueCandidate = idxItemOfList + 1

        while (
                idxNextUniqueCandidate < sizeOfList 
                    and 
                lstOfSortedItems[idxNextUniqueCandidate] == lstOfSortedItems[idxItemOfList]
        ): # while
            idxNextUniqueCandidate += 1

        idxNextUnique = idxNextUniqueCandidate

        return idxNextUnique

    def combinate(idxItemOfList):
        if len(lstComboCandidate) == sizeOfCombo:
            yield tuple(lstComboCandidate)
        elif sizeOfList - idxItemOfList >= sizeOfCombo - len(lstComboCandidate):
            lstComboCandidate.append(lstOfSortedItems[idxItemOfList])
            yield from combinate(idxItemOfList + 1)
            lstComboCandidate.pop()
            yield from combinate(idxNextUnique(idxItemOfList))

    yield from combinate(0)

I have some basic understanding of Python and C++ programming, but absolutely no idea how to "translate" Pythons yield into C++ code of an Python extension module. So my question is:

How to write C++ code (of a Python module) able to return a Python iterator object?

Any hints to get me started are welcome.

UPDATE (status 2017-05-07):

Both the comment: yield has no C++ equivalent. I'd start by implementing the iterator protocol manually in Python, to get out of the yield and yield from mindset. – user2357112 Apr 26 at 1:16 and the hint in the answer by danny the answer to this question is the same as asking 'How do I implement an iterator without using yield' but in a C++ extension instead of pure Python. put my programming efforts in the wrong direction of reinventing the wheel by rewriting the algorithms code in order to eliminate yield and by writing the C-code of a Python extension module from scratch (what resulted in raining Segmentation Fault errors).

The state-of-the-art of my current knowledge on the subject of the question is that using Cython it is possible to translate the above Python code (which is using yield) directly into C code of a Python extension module.

This is not only possible using the Python code just as it is (without the need of rewriting anything), but in addition to that the speed of the extension module created by Cython from the algorithm using yield runs at least twice that fast as the extension module created from the to an iterator class using __iter__ and __next__ rewritten algorithm (the latter is valid if no Cython specific speed optimization code is added to the Python script).

Community
  • 1
  • 1
Claudio
  • 7,474
  • 3
  • 18
  • 48
  • 1
    `yield` has no C++ equivalent. I'd start by [implementing the iterator protocol manually](https://www.python.org/dev/peps/pep-0234/) in Python, to get out of the `yield` and `yield from` mindset. – user2357112 Apr 26 '17 at 01:16

2 Answers2

1

An iterator in Python is a special form of a generator and is implemented by a class containing methods __iter__ and next where __iter__ returns self and next returns each value in turn, raising StopIteration on end of iteration - see PEP.

To provide a C++ equivalent, the C++ code needs to implement those same python functions to conform to the protocol. The resulting extension type is an iterator.

In other words, the answer to this question is the same as asking 'How do I implement an iterator without using yield' but in a C++ extension instead of pure Python. There are several existing answers for this on stack overflow.

NB - next is __next__ on Python 3.

danny
  • 5,140
  • 1
  • 19
  • 31
  • The answer directly answers the question of how to implement an iterator in much more detail than the comment saying 'implement protocol described in PEP here'. Therefore I feel it is clearer and more appropriate. If your question is no longer applicable or you feel has been covered else where consider locking/updating it stating as such. – danny Apr 27 '17 at 16:24
  • See last line in answer. – danny Apr 27 '17 at 16:29
  • I agree with the conclusion of the reviewers. The answer is valid as it stands and your answer contains no additional relevant information that is not already present in the above answer. – danny May 02 '17 at 12:43
1

This is more a response to your question edit than a complete answer - I agree with the gist of Danny's answer, that you need to implement this in a class with a __next__/next method (depending on the version of Python). In your edit you assert that it must be possible because Cython can do it. I thought it was worth having a look at exactly how Cython does it.

Start with a basic example (picked because it has a few different yield statements and a loop):

def basic_iter(n):
    a = 0
    b = 5
    yield a
    a+=3
    yield b
    b+=2

    for i in range(n):
        yield a+b+n
        a = b+1
        b*=2
    yield 50

The first thing Cython does is define a __pyx_CoroutineObject C class with a __Pyx_Generator_Next method that implements __next__/next. A few relevant attributes of the __pyx_CoroutineObject:

  • body - a C function pointer that implements the logic you've defined.
  • resume_label - an integer used to remember how far you've got in the function defined by body
  • closure - a custom created C class that stores all the variables used within body.

In a slightly roundabout way, __Pyx_Generator_Next calls the body attribute, which is the translation of the Python code you've defined.

Let's then look at how the function assigned to body works - in the case of my example called __pyx_gb_5iters_2generator. The first thing it does is use resume_label to jump to the right yield statement:

switch (__pyx_generator->resume_label) {
    case 0: goto __pyx_L3_first_run;
    case 1: goto __pyx_L4_resume_from_yield;
    case 2: goto __pyx_L5_resume_from_yield;
    case 3: goto __pyx_L8_resume_from_yield;
    case 4: goto __pyx_L9_resume_from_yield;
    default: /* CPython raises the right error here */
    __Pyx_RefNannyFinishContext();
    return NULL;
  }

Any variable assignments are done through the closure structure (locally named to __pyx_cur_scope:

/*     a = 0             # <<<<<<<<<<<<<< */
__pyx_cur_scope->__pyx_v_a = __pyx_int_0

yield sets the resume_label and returns (with resume_label allowing you to jump straight back next time):

__pyx_generator->resume_label = 1;
return __pyx_r;

The loop is slightly more complicated but basically the same thing - it uses goto to jump into the C loop (which is legal).

Finally, once it's reached the end it raises a StopIteration error:

PyErr_SetNone(PyExc_StopIteration);

In summary, Cython does exactly what you've been advised to do: it defines a class with __next__ or next method and uses that class to keep track of the state. Because it's automated, it's quite good at keeping track of reference counting and thus avoids the Segmentation Fault errors you experienced. The use of goto to return to the previous point of execution is efficient, but requires care.

I can see why rewriting your generator function in C in terms of a single __next__/next function is unappealing, and Cython clearly offers a straightforward way of doing it without writing C yourself, but it doesn't use any special techniques to do the translation on top of what you've already been told.

DavidW
  • 29,336
  • 6
  • 55
  • 86
  • You write: `Cython clearly offers a straightforward way of doing it without writing C yourself, but it doesn't use any special techniques to do the translation on top of what you've already been told.` and at the same time (if I have understood your answer right), explain the special technique Cython uses to translate Python `yield` to an appropriate C-equivalent on top what I have been told: using a switch to `goto` statements where the state of the iteration is kept in a special `closure` structure. – Claudio May 30 '17 at 00:52
  • 1
    You do understand it right. I think my point was that it's nothing you couldn't write yourself. I suspect if you gave the problem of writing a function that can be called multiple times to work like a Python generator to a C programmer, they'd probably come up with something similar (maybe avoiding `goto`...). Anyway - I mostly thought it's worth looking at how Cython does it. It'll either show a useful mechanism to copy, or convince you that you'd rather not write it yourself and you should just use Cython (both of which are ok outcomes) – DavidW May 30 '17 at 06:34
  • In this context it would be sure interesting to know how the Python interpreter does it in it's own C-code. I wouldn't be surprised if Cython just copy how Python does it itself. – Claudio May 30 '17 at 11:20
  • 1
    The best answer I can see is https://stackoverflow.com/questions/24501406/how-do-yield-works-in-python-c-code-good-bad-part/24524417#24524417. I think that most of the information is held in the frame object and there isn't a huge difference between how functions and generators are handled (Python still needs to store the current line number and the local variables for either of them, so it isn't too hard to be able to resume from a point - this is also used in the debugger for example). – DavidW May 30 '17 at 11:45