4

I have python wrappers for a C library that performs encoding / decoding of binary data. One of the wrapper functions takes a python list as an argument and returns a processed python list.

Currently I'm doing it like this:

list pycf_cobs_decode(list data) {
    list decoded;
    uint32_t data_len = 0, worst_data_len = 0;
    uint8_t *cdata = NULL;
    uint8_t *cdata_dec = NULL;

    data_len = len( data );
    worst_data_len = COBS_WORST_LEN( data_len );

    // Clone the python list
    cdata = new uint8_t[ data_len ];
    for (int i=0; i<data_len; i++)
        cdata[ i ] = extract< uint8_t >( data[ i ] );

    // Decode it
    cdata_dec = new uint8_t[ worst_data_len ];
    cobs_decode( cdata, data_len, cdata_dec );

    for (int i=0; i<worst_data_len; i++)
        decoded.append( cdata_dec[ i ] );

    delete[] cdata_dec;
    delete[] cdata;
    return decoded;
}

However, by creating an empty list and then appending all the bytes one by one is far from being efficient (causes a lot of realloc calls).

  1. Is there a way to initialize boost::python::list to a specific number of elements?
  2. One idea was to create an std::vector and cast it to python list (std::vector to boost::python::list), but found that most people suggested manually appending all the elements anyway.
  3. Another idea was to copy the input list, overwrite the contents and only append additional bytes. But so far I haven't found any functions to do that, either.
  4. Perhaps if I would copy the input list, delete all the elements (hoping that it doesn't realloc when elements are removed) and append them back?

I have been trying to find something useful from these pages, but I don't really know what to look for:

http://www.boost.org/doc/libs/1_57_0/libs/python/doc/v2/reference.html

http://www.boost.org/doc/libs/1_57_0/libs/python/doc/v2/list.html

Thank you

Edit: Realized that std::string supports binary arrays also and boost::python automatically converts it to python string. With std::string, the code is now shorter and hopefully yields better performance as well. However, this does not answer the question as it only works for an array of characters.

std::string pycf_cobs_decode(const std::string &data) {
    uint32_t worst_len = COBS_WORST_LEN( data.size() );
    char *cdec = new char[ worst_len ];

    // Decode it
    cobs_decode( ( const uint8_t * ) data.c_str(), data.size(), 
        ( uint8_t * ) cdec );

    std::string decoded( cdec, worst_len );
    delete[] cdec;

    return decoded;
}
Community
  • 1
  • 1
Sussch
  • 1,117
  • 1
  • 9
  • 15
  • It would also be possible to compile the library into a .dll / .so, used ctypes.cdll.LoadLibrary() and access C arrays directly. Another possibility would be to implement a custom class with interface functions that behave as if it were a python list: https://wiki.python.org/moin/boost.python/StlContainers – Sussch Feb 08 '15 at 07:24

1 Answers1

4

There may not be as many internal allocations as one would expect when appending an item at a time. It is documented that for list.append(), the average and amortized worst case complexity is constant.

Boost.Python does a fairly good job at allowing Python-ish code to be written in C++. As such, one could use the Python idiom, [None] * n for allocating a list to a given size:

/// @brief Construct list with `n` elements.  each element is a copy 
///        of `value`.
/// @param n Iniitail container size.
/// @param item Item with which to fill the container.
boost::python::list make_list(
  const std::size_t n,
  boost::python::object item = boost::python::object() /* none */)
{
  namespace python = boost::python;
  // >>> [None] * n;
  python::list result;
  result.append(item);
  result *= n;
  return result;
}

I cannot find documentation that explicitly states the implementation behavior of such operation, but the idiom often out performs over append. Nevertheless, while Boost.Python does not expose all of the Python/C API capabilities, one could guarantee a single allocation by using PyList_New(len) to create the list, then manage and manipulate it via Boost.Python. As noted in the documentation, when len is greater than zero, all items must be set to real objects via PyList_SetItem() before exposing the list object to Python code, or in this case, a boost::python::list object:

If len is greater than zero, the returned list object’s items are set to NULL. Thus you cannot use abstract API functions such as PySequence_SetItem() or expose the object to Python code before setting all items to a real object with PyList_SetItem().

Auxiliary functions may help hide these details. For instance, one could use a factory function with fill-like behavior:

/// @brief Construct list with `n` elements.  each element is a copy 
///        of `value`.
/// @param n Iniitail container size.
/// @param item Item with which to fill the container.
boost::python::list make_list(
  const std::size_t n,
  boost::python::object item = boost::python::object() /* none */)
{
  namespace python = boost::python;
  python::handle<> list_handle(PyList_New(n));
  for (std::size_t i=0; i < n; ++i)
  {
    // PyList_SetItem will steal the item reference.  As Boost.Python is
    // managing the item, explicitly increment the item's reference count
    // so that the stolen reference remains alive when this Boost.Python
    // object's scope ends.
    Py_XINCREF(item.ptr());

    PyList_SetItem(list_handle.get(), i, item.ptr());
  }
  return python::list{list_handle};
}

Or a factory function that works on a range:

/// @brief Construct a list from a given range of [first,last).  The
///        copied range includes all elements between `first` to `last`,
///        including `first`.
/// @param first Input iterator to the initial item to copy.
/// @param last Input iterator to one after the final item to be copied.
template <typename Iterator>
boost::python::list make_list(Iterator first, Iterator last)
{
  namespace python = boost::python;
  const auto size = std::distance(first, last);
  python::handle<> list_handle{PyList_New(size)};
  for (auto i=0; i < size; ++i, ++first)
  {
    python::object item{*first};

    // PyList_SetItem will steal the item reference.  As Boost.Python is
    // managing the item, explicitly increment the item's reference count
    // so that the stolen reference remains alive when this Boost.Python
    // object's scope ends.
    Py_XINCREF(item.ptr());

    PyList_SetItem(list_handle.get(), i, item.ptr());
  }
  return boost::python::list{list_handle};
}

Here is a complete example demonstrating with mocked up COBS_WORST_LEN() and cobs_decode() functions that decode by accumulating pairs. As the decoded values are known when the list being returned is constructed, I have opted to use a range factory function to prevent having to iterate over the list and setting the values twice:

#include <boost/python.hpp>

#include <iostream>
#include <vector>

#include <boost/python/stl_iterator.hpp>

/// Mockup that halves the list, rounding up.
std::size_t COBS_WORST_LEN(const std::size_t len)
{
  return (len / 2) + (len % 2);
}

/// Mockup that just adds adjacent elements together.
void cobs_decode(
  const boost::uint8_t* input,
  const std::size_t size,
  boost::uint8_t* output)
{
  for (std::size_t i=0; i < size; ++i, ++input)
  {
    if (i % 2 == 0)
    {
      *output = *input;
    }
    else
    {
      *output += *input;
      ++output;
    }
  }
}

/// @brief Construct a list from a given range of [first,last).  The
///        copied range includes all elements between `first` to `last`,
///        including `first`.
/// @param first Input iterator to the initial value to copy.
/// @param last Input iterator to one after the final element to be copied.
template <typename Iterator>
boost::python::list make_list(Iterator first, Iterator last)
{
  namespace python = boost::python;
  const auto size = std::distance(first, last);
  python::handle<> list_handle{PyList_New(size)};
  for (auto i=0; i < size; ++i, ++first)
  {
    python::object item{*first};

    // PyList_SetItem will steal the item reference.  As Boost.Python is
    // managing the item, explicitly increment the item's reference count
    // so that the stolen reference remains alive when this Boost.Python
    // object's scope ends.
    Py_XINCREF(item.ptr());

    PyList_SetItem(list_handle.get(), i, item.ptr());
  }
  return boost::python::list{list_handle};
}

/// @brief Decode a list, returning the aggregation of paired items.
boost::python::list decode(boost::python::list py_data)
{
  namespace python = boost::python;

  // Clone the list.
  std::vector<boost::uint8_t> data(len(py_data));
  std::copy(
    python::stl_input_iterator<boost::uint8_t>{py_data},
    python::stl_input_iterator<boost::uint8_t>{},
    begin(data));

  // Decode the list.
  std::vector<boost::uint8_t> decoded(COBS_WORST_LEN(data.size()));
  cobs_decode(&data[0], data.size(), &decoded[0]);

  return make_list(begin(decoded), end(decoded));
}

BOOST_PYTHON_MODULE(example)
{
  namespace python = boost::python;
  python::def("decode", &decode);
}

Interactive usage:

>>> import example
>>> assert(example.decode([1,2,3,4]) == [3,7])

Also, as Boost.Python can throw exceptions, it may be worth considering using memory-managed containers, such as std::vector, or smart pointers, rather than the raw dynamic arrays.

Tanner Sansbury
  • 51,153
  • 9
  • 112
  • 169
  • 1
    I am getting a segfault when saving non-trivial objects to the list. I believe that `python::object(*first).ptr()` causes the object to be destroyed immediately. Fixing by `python::object val{*first}; Py_INCREF(val.ptr()); PyList_SetItem(handle.get(), i, val.ptr());` seems to work. – Floop Aug 31 '17 at 20:27
  • @Floop, great catch! The `PyList_SetItem()` function steals the item reference, requiring one to explicitly increment the reference as the Boost.Python object will also decrement the count. I have updated the above code and provided a link to a working example. – Tanner Sansbury Sep 01 '17 at 17:00