I have a problem when trying to modify a list in Python. The list is a bit complex, but has the dimensions (sequences, length of sequence, one-hot encoding). In the example below, the list "lst" contains 3 sequences with different lengths of 4, 3 and 5. I want to modify the list so that all sequences have equal length. I simply have to append [0,0] to each shorter sequence until it reaches the max length (5 in this case).

Problem: The following function produces the correct result, but for some reason it also changes the original list (lst), so after running it, new_lst == lst. It happens in the "Add padding to all sequences" section, although I don't know why. Why is this? And how can I change it?

lst = [[[0,1],[0,1],[0,1],[0,1]],
        [[0,1],[0,1],[0,1]],
        [[0,1],[0,1],[0,1],[0,1],[0,1]]]

def pad_sequence(list_to_pad):
    padded_lst = list_to_pad.copy()

    # Define length of alphabet (len_alphabet: Integer)
    len_alphabet = len(padded_lst[0][0])

    # Find max length of a seq (max_len: Integer)
    max_len = 0 
    for seq in padded_lst:
        if len(seq) > max_len:
            max_len = len(seq)

    # Define vector to append (pad: list)
    pad = [0 for _ in range(len_alphabet)]

    # Add padding to all sequences
    for idx, seq in enumerate(padded_lst):
        length = max_len - len(seq)
        for _ in range(length):
            padded_lst[idx].append(pad)

    return padded_lst

new_lst = pad_sequence(lst) 

print(lst)
print(new_lst)

UPDATE: Apparently using deepcopy does the trick. The correct function is instead:

def pad_sequence(list_to_pad):
    from copy import deepcopy

    # Deepcopy list
    padded_lst = deepcopy(list_to_pad)

    # Define length of alphabet (len_alphabet: Integer)
    len_alphabet = len(padded_lst[0][0])

    # Find max length of a seq (max_len: Integer)
    max_len = 0 
    for seq in padded_lst:
        if len(seq) > max_len:
            max_len = len(seq)

    # Define vector to append (pad: list)
    pad = [0 for _ in range(len_alphabet)]

    # Add padding to all sequences
    for idx, seq in enumerate(padded_lst):
        length = max_len - len(seq)
        for _ in range(length):
            # Append a fresh copy so the padding entries do not all share one list object
            padded_lst[idx].append(pad.copy())

    return padded_lst
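
An alternative that avoids copying the whole structure is to build new inner lists instead of appending to existing ones. The following is only a sketch of that idea: the name pad_sequence_no_copy is hypothetical, and it assumes a non-empty input whose first sequence contains at least one one-hot vector, as in the example above.

```python
def pad_sequence_no_copy(list_to_pad):
    """Return a padded version of list_to_pad without mutating the input.

    Hypothetical helper: assumes a non-empty list whose first sequence
    has at least one one-hot vector.
    """
    len_alphabet = len(list_to_pad[0][0])
    max_len = max(len(seq) for seq in list_to_pad)

    padded = []
    for seq in list_to_pad:
        # Copy each one-hot vector so the result shares nothing with the input
        new_seq = [list(vec) for vec in seq]
        # Create a fresh zero vector for every padding position
        new_seq.extend([0] * len_alphabet for _ in range(max_len - len(seq)))
        padded.append(new_seq)
    return padded


lst = [[[0, 1], [0, 1], [0, 1], [0, 1]],
       [[0, 1], [0, 1], [0, 1]],
       [[0, 1], [0, 1], [0, 1], [0, 1], [0, 1]]]

new_lst = pad_sequence_no_copy(lst)
print([len(seq) for seq in lst])      # [4, 3, 5]: the original is unchanged
print([len(seq) for seq in new_lst])  # [5, 5, 5]: every sequence is padded
```

Because every inner list in the result is newly created, later changes to new_lst cannot leak back into lst, and the padding vectors are independent of each other.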
  • I think you need copy.deepcopy – quamrana Apr 19 '20 at 13:49
  • Mandatory link to [Ned Batchelder](https://nedbatchelder.com/text/names.html) names. – quamrana Apr 19 '20 at 13:51
  • For more info on the difference between shallow and deep copy: https://docs.python.org/2/library/copy.html – Leafar Apr 19 '20 at 13:51
  • Using deepcopy does the trick! Thank you :) Didn't know that was a thing, but I will look into it. Thank you! @quamrana – Mathias Byskov Apr 19 '20 at 13:57
  • Can someone please explain why? This doesn't make any sense. He's operating on a new variable, so why would the new variable change the original one? – Thaer A Apr 19 '20 at 14:00
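
On the "why" asked in the comments: list.copy() makes a shallow copy, so the outer list is new but its elements are still the same inner list objects. Appending to padded_lst[idx] therefore appends to the very list that lst[idx] refers to. A minimal sketch of the difference (the names shallow and deep are illustrative only):

```python
from copy import deepcopy

lst = [[[0, 1], [0, 1]],
       [[0, 1]]]

shallow = lst.copy()    # new outer list, same inner lists
deep = deepcopy(lst)    # new outer list and new inner lists, recursively

print(shallow is lst)        # False: the outer lists are distinct objects
print(shallow[1] is lst[1])  # True: the inner list is shared
print(deep[1] is lst[1])     # False: deepcopy duplicated the inner list

shallow[1].append([0, 0])    # mutates the shared inner list
print(len(lst[1]))           # 2: the original sees the appended element
print(len(deep[1]))          # 1: the deep copy is unaffected
```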
