I am using Protobuf (v3.5.1) in a Python project I'm working on. My situation can be simplified to the following:

// Proto file

syntax = "proto3";

message Foo {
    Bar bar = 1;
}

message Bar {
    bytes lotta_bytes_here = 1;
}

# Python excerpt
def MakeFooUsingBar(bar):
    foo = Foo()
    # CopyFrom copies the contents of bar into the nested message foo.bar
    foo.bar.CopyFrom(bar)
    return foo

I am worried about the memory cost of .CopyFrom(): if I am correct, it copies the contents rather than the reference. In C++, I could use something like:

Foo foo;
Bar* bar = new Bar();
bar->set_lotta_bytes_here("abcd");
foo.set_allocated_bar(bar);

Judging by the generated source, this does not need to copy anything:

inline void Foo::set_allocated_bar(::Bar* bar) {
  ::google::protobuf::Arena* message_arena = GetArenaNoVirtual();
  if (message_arena == NULL) {
    delete bar_;
  }
  if (bar) {
    ::google::protobuf::Arena* submessage_arena = NULL;
    if (message_arena != submessage_arena) {
      bar = ::google::protobuf::internal::GetOwnedMessage(
          message_arena, bar, submessage_arena);
    }

  } else {

  }
  bar_ = bar;
  // @@protoc_insertion_point(field_set_allocated:Foo.bar)
}

Is there something similar available in Python? I have looked through the Python generated sources, but found nothing applicable.

Michał
  • You're just *worried* about the performance? Did you measure it to see whether it will be a problem for your application? – Greg Hewgill Feb 23 '18 at 23:03
  • @GregHewgill I am aware that while I am copying the resource, two instances exist in memory. My application uses large resources (tens or hundreds of megabytes), and I want to avoid the overhead, especially since my intention is not to copy the resource but simply to move it. I understand this can be seen as premature optimization, but if there is built-in functionality I could use, I don't see a reason not to use it. – Michał Feb 23 '18 at 23:09
  • Oh, well if you're copying hundreds of megabytes of stuff, then sure, this is worth investigating the performance aspects. :) – Greg Hewgill Feb 23 '18 at 23:40

1 Answer

When it comes to large string or bytes objects, Protobuf seems to handle the situation fairly well. The following test passes, which means that although a new Bar object is created, the bytes value is shared by reference rather than duplicated (Python bytes are immutable, so this makes sense):

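def test_copy_from_with_large_bytes_field(self):
    bar = Bar()
    bar.lotta_bytes_here = b'12345'
    foo = Foo()
    foo.bar.CopyFrom(bar)

    # The nested message is a distinct object...
    self.assertIsNot(bar, foo.bar)
    # ...but both messages refer to the same bytes object.
    self.assertIs(bar.lotta_bytes_here, foo.bar.lotta_bytes_here)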

This solves my issue with large bytes objects. However, if someone's problem lies in nested or repeated fields, this will not help: such fields are copied field by field. That makes sense, since whoever copies a message wants the two to be independent. If they were not, changes to the original message would modify the copy (and vice versa).
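
The distinction between sharing an immutable value and duplicating a mutable container can be illustrated without protobuf. The sketch below is only an analogy: Bar is a hypothetical plain-Python stand-in (not a generated message class), chunks is an invented attribute playing the role of a repeated field, and copy.deepcopy stands in for CopyFrom. Whether a real protobuf message shares the bytes object may depend on which protobuf backend is in use, so treat that as an assumption to verify in your own environment.

```python
import copy


class Bar:
    """Hypothetical plain-Python stand-in for the Bar message (not protobuf)."""

    def __init__(self, lotta_bytes_here=b"", chunks=None):
        # Immutable payload, analogous to the bytes field in the question.
        self.lotta_bytes_here = lotta_bytes_here
        # Mutable container, playing the role of a repeated field.
        self.chunks = chunks if chunks is not None else []


original = Bar(lotta_bytes_here=bytes(1024), chunks=[b"a", b"b"])
duplicate = copy.deepcopy(original)

# The container object itself is new.
print(duplicate is original)
# The immutable bytes payload is the very same object, so it was not duplicated.
print(duplicate.lotta_bytes_here is original.lotta_bytes_here)
# The mutable list is a separate object, so the two copies stay independent.
print(duplicate.chunks is original.chunks)

# Changing the copy's list leaves the original untouched.
duplicate.chunks.append(b"c")
print(len(original.chunks), len(duplicate.chunks))
```

Sharing is safe for the bytes payload precisely because neither holder can modify it in place, whereas the list has to be duplicated to keep the two objects independent.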

If there were anything akin to C++ move semantics (https://github.com/google/protobuf/issues/2791) or set_allocated_...() in Python protobuf, that would solve it; however, I am not aware of such a feature.

Michał