
Background

I am writing an encryption algorithm that will probably receive a very large string as an argument. I need to implement it as a block cipher.

Actual Question

To avoid copying the string or creating a similar object that takes a lot of memory, I want to read the string byte by byte (one byte, or even one bit, at a time) and encrypt the data block by block. How can I do that with minimal memory usage?

Approaches Tried

I have tried `memoryview`, but it only accepts bytes-like objects, not `str`.
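For data that already is bytes-like, `memoryview` does give zero-copy block access. A minimal sketch, assuming a hypothetical helper `iter_blocks` and a 16-byte block size chosen only for illustration:

```python
from typing import Iterator


def iter_blocks(data, block_size: int = 16) -> Iterator[memoryview]:
    """Yield zero-copy views of `data` in `block_size`-byte slices.

    `data` must support the buffer protocol (bytes, bytearray, mmap, ...).
    A `str` does not, which is why memoryview("...") raises TypeError.
    """
    view = memoryview(data).cast("B")  # flat view of the raw bytes
    for start in range(0, len(view), block_size):
        yield view[start:start + block_size]  # slicing a view copies nothing


if __name__ == "__main__":
    try:
        memoryview("text")
    except TypeError as exc:
        print("str rejected:", exc)
    for block in iter_blocks(bytes(range(40))):
        print(len(block), bytes(block))
```

Each yielded block is a view into the original buffer; call `bytes(block)` only when the cipher needs an independent copy.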

`str.encode` looks like it will create a new copy of the whole string as `bytes`.
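A middle ground between a full `str.encode` copy and reading the string's internals is to encode it in bounded chunks. A minimal sketch, assuming a hypothetical generator `iter_str_blocks`, UTF-8 as the default encoding, and that padding the final block is the cipher's job:

```python
import codecs
from typing import Iterator


def iter_str_blocks(text: str, block_size: int = 16, chunk_chars: int = 4096,
                    encoding: str = "utf-8") -> Iterator[bytes]:
    """Yield `text` encoded with `encoding`, split into `block_size`-byte blocks.

    Only `chunk_chars` characters are encoded at a time, so the extra memory is
    roughly one encoded chunk plus one partial block, not a full encoded copy.
    The last block may be shorter than `block_size`.
    """
    # An incremental encoder keeps state between chunks (e.g. it emits the
    # UTF-16 byte order mark only once), so the joined output equals text.encode().
    encoder = codecs.getincrementalencoder(encoding)()
    pending = bytearray()

    def drain() -> Iterator[bytes]:
        # Emit every complete block currently buffered.
        while len(pending) >= block_size:
            yield bytes(pending[:block_size])
            del pending[:block_size]

    for start in range(0, len(text), chunk_chars):
        # Slicing copies only this chunk of characters, not the whole string.
        pending += encoder.encode(text[start:start + chunk_chars])
        yield from drain()
    pending += encoder.encode("", final=True)
    yield from drain()
    if pending:
        yield bytes(pending)  # short final block, to be padded by the caller


if __name__ == "__main__":
    for block in iter_str_blocks("héllo wörld £" * 3, block_size=16, chunk_chars=8):
        print(block)
```

The block size of 16 bytes is only an example (it matches AES's 128-bit block). The choice of encoding stays with the caller, because the same `str` produces different bytes under different encodings.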

`mmap` seems useful, but I am not sure whether it creates a new object, or how to iterate over the result.
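`mmap` maps a file, not an in-memory `str`, so it mainly helps when the plaintext already lives on disk. A minimal sketch, assuming a hypothetical helper `iter_file_blocks`: the OS pages the file in on demand, and slicing the map returns a small `bytes` object per block (indexing a single position returns an `int`).

```python
import mmap
import os
import tempfile
from typing import Iterator


def iter_file_blocks(path: str, block_size: int = 16) -> Iterator[bytes]:
    """Yield `block_size`-byte blocks of the file at `path` via mmap.

    The whole file is never read into one Python object; each slice
    copies only the bytes of a single block.
    """
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return  # an empty file cannot be memory-mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for start in range(0, len(mm), block_size):
                yield mm[start:start + block_size]


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write("secret message £".encode("utf-8"))
    try:
        for block in iter_file_blocks(tmp.name, block_size=8):
            print(block)
    finally:
        os.remove(tmp.name)
```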

  • what is the source of this string? Why can't it be a `bytes` object? – juanpa.arrivillaga Apr 24 '23 at 04:45
  • This function might be used by a user to encrypt a string message. I do not think users will pass a byte string, just a plain `str`. If the user wants to encrypt a file, I would get a `bytes` object, but for an everyday string I don't think I can expect that. – Ozelot Vanilla Apr 24 '23 at 04:59
  • But your question doesn't make sense unless you are talking about a bytes object. Strings *don't have bytes*, you cannot read a string "a byte at a time". `str` objects are sequences of unicode code points. Any given unicode code point can correspond to various single- or multibyte encodings. Consider, for example, code point 163: `s = '£'` (i.e. `s = chr(163)`); look what happens if you do `s.encode('utf8')` vs `s.encode('latin')` – juanpa.arrivillaga Apr 24 '23 at 05:23
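The comment's point can be checked directly: one code point, several different byte sequences depending on the encoding. A short demonstration (UTF-16-LE is added here as a third encoding for comparison):

```python
s = "\u00a3"  # '£', code point 163 (U+00A3 POUND SIGN)

utf8 = s.encode("utf-8")       # two bytes: b'\xc2\xa3'
latin1 = s.encode("latin-1")   # one byte: b'\xa3' ('latin' is an alias for 'latin-1')
utf16 = s.encode("utf-16-le")  # two bytes: b'\xa3\x00'

print(ord(s), utf8, latin1, utf16)
```

There is no single "the bytes of this string" until an encoding is chosen.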
  • if your user wants to *encrypt something*, then it is their responsibility to provide the `bytes` object (or perhaps, `bytearray`) – juanpa.arrivillaga Apr 24 '23 at 05:25
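Python's own `hashlib` follows this convention: it refuses `str` and requires the caller to encode first. A minimal sketch of the same policy, assuming a hypothetical helper `require_bytes_like`:

```python
import hashlib


def require_bytes_like(data) -> memoryview:
    """Return a flat, zero-copy byte view of `data`, refusing `str`.

    Hypothetical helper: the caller, not the cipher, picks the encoding.
    """
    if isinstance(data, str):
        raise TypeError("expected a bytes-like object; encode the string first, "
                        "e.g. text.encode('utf-8')")
    return memoryview(data).cast("B")  # no copy for bytes, bytearray, mmap, ...


if __name__ == "__main__":
    try:
        hashlib.sha256("abc")  # the standard library rejects str as well
    except TypeError as exc:
        print("hashlib:", exc)
    view = require_bytes_like("abc".encode("utf-8"))
    print(len(view), hashlib.sha256(view).hexdigest())
```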
  • Thank you. I think I get your idea: "`str` does not store bytes internally; instead it stores Unicode code points", am I right? But I believe the API should also be convenient for the user and, if possible, use less memory when the user passes an unwanted type that has to be converted. So maybe the question becomes: is there an approach that lets us **get the internal representation of a string** (the buffer mentioned [here](https://stackoverflow.com/a/1838733/20445940))? Additionally, the decryption will also be done in Python. – Ozelot Vanilla Apr 24 '23 at 07:29
  • but you *shouldn't mess with the internal representation*, that is an **implementation detail** that *can change and has changed within major versions* – juanpa.arrivillaga Apr 24 '23 at 23:38
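One visible consequence in CPython: since Python 3.3 (PEP 393) the storage width per character depends on the widest code point in the string (1, 2 or 4 bytes), whereas earlier versions fixed it per build. A small illustration, assuming CPython (`sys.getsizeof` is not meaningful on every implementation):

```python
import sys

ascii_text = "a" * 1000            # fits in 1 byte per character
bmp_text = "\u20ac" * 1000         # '€' needs 2 bytes per character
astral_text = "\U0001F600" * 1000  # an emoji needs 4 bytes per character

for t in (ascii_text, bmp_text, astral_text):
    # Same len(), very different memory footprint.
    print(len(t), sys.getsizeof(t))
```

Any code that read this buffer directly would produce different "bytes" for equal-length strings, and the layout is not guaranteed to stay the same across versions.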
