I have a problem when trying to modify a list in Python. The list is a bit complex, but it has the dimensions (sequences, length of sequence, one-hot encoding). In the example below, the list "lst" contains 3 sequences with lengths 4, 3 and 5. I want to modify the list so that all sequences have equal length. I simply have to append [0,0] to each sequence until it reaches the maximum length (5 in this case).
Problem: The following function pads correctly. However, for some reason it also changes the original list (lst): after running it, lst is padded as well, so new_lst == lst. It happens in the "Add padding to all sequences" section, but I don't know why. Why is this, and how can I change it?
```python
lst = [[[0,1],[0,1],[0,1],[0,1]],
       [[0,1],[0,1],[0,1]],
       [[0,1],[0,1],[0,1],[0,1],[0,1]]]

def pad_sequence(list_to_pad):
    padded_lst = list_to_pad.copy()
    # Define length of alphabet (len_alphabet: Integer)
    len_alphabet = len(padded_lst[0][0])
    # Find max length of a seq (max_len: Integer)
    max_len = 0
    for seq in padded_lst:
        if len(seq) > max_len:
            max_len = len(seq)
    # Define vector to append (pad: list)
    pad = [0 for _ in range(len_alphabet)]
    # Add padding to all sequences
    for idx, seq in enumerate(padded_lst):
        length = max_len - len(seq)
        for _ in range(length):
            padded_lst[idx].append(pad)
    return padded_lst

new_lst = pad_sequence(lst)
print(lst)
print(new_lst)
```
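The cause is that list.copy() is a shallow copy: it creates a new outer list, but that new list still holds references to the same inner sequence lists as the original. Calling append on one of those inner lists therefore changes it in both places. The small sketch below uses a made-up two-sequence list to show the difference between a shallow copy and copy.deepcopy:

```python
from copy import deepcopy

lst = [[[0, 1]], [[0, 1], [0, 1]]]

shallow = lst.copy()   # new outer list, same inner sequence lists
deep = deepcopy(lst)   # new outer list and new inner lists, recursively

print(shallow is lst)        # False: the outer list is a new object
print(shallow[0] is lst[0])  # True: the inner sequence is shared
print(deep[0] is lst[0])     # False: deepcopy copied the inner sequence

shallow[0].append([0, 0])
print(lst[0])  # [[0, 1], [0, 0]]  (modified through the shallow copy)

deep[1].append([0, 0])
print(lst[1])  # [[0, 1], [0, 1]]  (unchanged)
```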
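UPDATE: Using copy.deepcopy instead of list.copy() fixes it. list.copy() only makes a shallow copy: the outer list is new, but the inner sequence lists are still shared with lst, so appending to them modifies lst as well. deepcopy also copies the inner lists. The corrected function is: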
```python
from copy import deepcopy

def pad_sequence(list_to_pad):
    # Deep copy, so the inner sequence lists are new objects as well
    padded_lst = deepcopy(list_to_pad)
    # Define length of alphabet (len_alphabet: Integer)
    len_alphabet = len(padded_lst[0][0])
    # Find max length of a seq (max_len: Integer)
    max_len = max(len(seq) for seq in padded_lst)
    # Add padding to all sequences. A fresh pad vector is built for every
    # position, so the padding rows are not all the same list object.
    for seq in padded_lst:
        for _ in range(max_len - len(seq)):
            seq.append([0] * len_alphabet)
    return padded_lst
```
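Another option, shown here as one possible sketch rather than the only fix, is to avoid mutating anything and build new lists instead. `seq + [...]` always creates a new list, so the original sequences are never appended to, and `[0] * len_alphabet` is evaluated once per padding row, so no two padding rows are the same object. The existing one-hot vectors are still shared with `lst`, which is fine as long as you only pad; use deepcopy if you later plan to modify individual vectors in place. Like the original, it assumes the first sequence is non-empty so the one-hot width can be read from `list_to_pad[0][0]`.

```python
def pad_sequence(list_to_pad):
    """Return a padded version of list_to_pad without modifying it."""
    # Width of one one-hot vector (assumes list_to_pad[0] is non-empty)
    len_alphabet = len(list_to_pad[0][0])
    max_len = max(len(seq) for seq in list_to_pad)
    # seq + [...] builds a new list for each sequence; each padding row is
    # a separate [0, 0, ...] list created by the inner comprehension
    return [
        seq + [[0] * len_alphabet for _ in range(max_len - len(seq))]
        for seq in list_to_pad
    ]


lst = [[[0, 1], [0, 1], [0, 1], [0, 1]],
       [[0, 1], [0, 1], [0, 1]],
       [[0, 1], [0, 1], [0, 1], [0, 1], [0, 1]]]

new_lst = pad_sequence(lst)
print([len(seq) for seq in lst])      # [4, 3, 5]  (original untouched)
print([len(seq) for seq in new_lst])  # [5, 5, 5]
```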