I have a standard Python list of NumPy arrays, because the items don't all have the same dimensions. I convert the data to uint8 (it is a huge number of hex values), then perform operations to normalise the dimensions of each array so that I can convert the list into a multi-dimensional NumPy array.

To achieve this I need to use a few methods to make the code readable, but when I pass one numpy array from the list into a method I also have to pass the entire list and the iterator:

    def push_bytes(messages, message, start_byte, i):
        messages[i] = np.insert(messages[i], start_byte, 0, 0)

I call this method many times, so I would like to avoid passing the entire list and iterator and instead do something like this:

    def push_bytes(message, start_byte):
        message = np.insert(message, start_byte, 0, 0)

I believe this doesn't work because message = ... creates a new numpy array instead of pointing to the original one. Is there a way to point to the original one without having to pass the entire list and an iterator?

Sample data:

    messages = [
        [  5   1   0   0   0  47  69 222  10 221 242 132   0   0   0  79   0   0  ]
        [  5   1   0   0   0  27  68 222  10  86   7 133  95 126 220  38   0  ]
        [  5   1   0   0  45  48   0   0   7  10  86   7 133  95 126 220  30   0   0   0  79   0   0  ]
        [  5   1   0   0   0  47  69 222  10 129  10 133  95 126 220   5   0   0   0  75   0   0  ]
        [  5   1   0   0  17  39   0   0 112  66 222  10 129  10 133  ]
        [  5   1   0   0   7  69 222  10 138   0   0  55   0   0   0  79   0   0  ]
        [  5 222  10 138  10 133  95 126   0   0  24   0   0   0  79   0   0  ]
        [ 17  39   0   0 232  66 222  10 138  10 133   0   0   0   0   0  93   0   0 ]
    ]
Jack Clayton
  • Can you provide any sample data? Also, what's `count` for? – Mark Moretto Oct 31 '20 at 01:50
  • `.insert` creates a new array. See [numpy.insert](https://numpy.org/doc/stable/reference/generated/numpy.insert.html) – wwii Oct 31 '20 at 01:50
  • There's no reason for `push_bytes` to take responsibility for modifying the original list. Actually, there's probably no reason to modify the original list at all. Something like `new_list = [push_bytes(...) for ... in ...]` would make more sense, with `push_bytes` returning the new array. – user2357112 Oct 31 '20 at 01:55
  • @MarkMoretto I edited the question and removed count; the actual method I use is much more complicated and inserts a given number of bytes at a certain location, using a loop to do it. I'll add sample data, yes. – Jack Clayton Oct 31 '20 at 01:59
  • @user2357112supportsMonica I would do that but it's huge amounts of data that I'm processing on multiple cores, doubling up on the list uses too much memory unfortunately. – Jack Clayton Oct 31 '20 at 02:01
  • @JackClayton: In that case, it's still probably better to take the list modification out of `push_bytes`. You can do that on the caller's end. – user2357112 Oct 31 '20 at 02:03
  • (Unrelated: "iterator" in Python refers to an object that implements the [iterator protocol](https://docs.python.org/3/library/stdtypes.html#iterator-types), while you seem to be referring to a simple integer index.) – user2357112 Oct 31 '20 at 02:05
  • @user2357112supportsMonica The problem with this is that I call push_bytes about 30 times, it's a complicated process of inserting a byte based on output from another byte, then pulling another byte after the size is right. Rewriting the code 30 times makes it unreadable for someone who doesn't know what I'm doing. – Jack Clayton Oct 31 '20 at 02:06

1 Answer

A function like:

    def push_bytes(messages, i, start_byte):
        messages[i] = np.insert(messages[i], start_byte, 0, 0)

will modify (actually, replace) the i'th element of the messages list. What's being passed is a reference to the list, so the call itself involves no copying or other expensive work.

The np.insert function does create a new array, and a reference to that is placed in the messages list.
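A minimal sketch of both points. The two short arrays are made up, and `replace_element` is a hypothetical name used here only for illustration; it mirrors the function above but also returns the old array so it can be inspected:

```python
import numpy as np

def replace_element(messages, i, start_byte):
    # `messages` is the caller's list object itself, not a copy.
    old = messages[i]
    messages[i] = np.insert(old, start_byte, 0, 0)
    return old  # returned only so the caller can inspect it

messages = [np.array([5, 1, 47], dtype=np.uint8),
            np.array([5, 222], dtype=np.uint8)]
first = messages[0]

old = replace_element(messages, 0, 1)

same_object = old is first               # the function saw the very same array
was_replaced = messages[0] is not first  # the list slot now holds a new array
original_len = len(first)                # the old array itself was not changed
new_values = messages[0].tolist()
```

The list slot ends up referring to a different, longer array, while the array that was originally stored there keeps its old length.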

In:

    def push_bytes(message, start_byte):
        message = np.insert(message, start_byte, 0, 0)

the message = ... assignment binds the new array to the message variable, but that variable is local to the function, so nothing outside the function changes. You have to add a

    return message

and do

    messages[i] = push_bytes(messages[i], start_byte)

to modify the element of the messages list.
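Side by side, the two variants behave roughly as follows. This is a sketch with a made-up array; `push_bytes_no_return` is a name introduced here only to tell the two versions apart:

```python
import numpy as np

def push_bytes_no_return(message, start_byte):
    # Rebinds the local name only; the caller never sees the new array.
    message = np.insert(message, start_byte, 0, 0)

def push_bytes(message, start_byte):
    # Builds the new array and hands it back to the caller.
    return np.insert(message, start_byte, 0, 0)

messages = [np.array([17, 39, 232], dtype=np.uint8)]
i, start_byte = 0, 2

push_bytes_no_return(messages[i], start_byte)
len_after_no_return = len(messages[i])   # list element is untouched

messages[i] = push_bytes(messages[i], start_byte)
len_after_return = len(messages[i])      # list element is one byte longer
values = messages[i].tolist()
```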

I think the two functions will have similar execution times, since they just pass references, and do not require different calculations or copies. The second is, for most purposes, cleaner, since it doesn't assume anything about messages; it just does the new array creation (I assume there's more to this than a simple call to np.insert).
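Because the function knows nothing about the list, the caller decides where the result goes. For example, the list can be updated one slot at a time, so a second full list is never built. The sketch below assumes, purely for illustration, that every message gets a zero byte at the same start_byte; the three short arrays are made up:

```python
import numpy as np

def push_bytes(message, start_byte):
    # Returns a new array with a zero byte inserted at start_byte.
    return np.insert(message, start_byte, 0, 0)

messages = [np.array([5, 1, 0, 47], dtype=np.uint8),
            np.array([5, 1, 27], dtype=np.uint8),
            np.array([17, 39, 0, 0, 232], dtype=np.uint8)]
start_byte = 2  # assumed value, for illustration only

# Overwrite each slot as soon as its replacement exists; the old array
# can be reclaimed once nothing else refers to it.
for i, message in enumerate(messages):
    messages[i] = push_bytes(message, start_byte)

lengths = [len(m) for m in messages]
```

Only one extra array exists at any moment, which may matter when the list is large.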

hpaulj
  • Thanks so much for this, yes this is much cleaner. Good to hear that it is just passing references and not copying, I'm new to python and wasn't sure of this. I was hoping there was something inbuilt to python or numpy where you can pass the actual address and dereference like C++. I love python so far but I guess this type of thing is the downside of the language. – Jack Clayton Oct 31 '20 at 02:31
  • I found another question which shows a way to get around this: https://stackoverflow.com/questions/986006/how-do-i-pass-a-variable-by-reference but the conclusion is that it's too cumbersome and that returning the value is better. It also has a lot more information in different answers about what is happening under the hood. – Jack Clayton Oct 31 '20 at 02:51