It seems that putting a list of strings into a NumPy array takes over 20 times more memory than the raw list. I could understand it taking 10% more memory due to some overhead, but I would like to know why it takes 2000% more.
import numpy as np
from sys import getsizeof
txt = ["adsfjwofj owejifowijefiwjfoi of wofjwoijfwoijfoiwej"]
print(getsizeof(txt))
txts = [txt for _ in range(10000)]
print(getsizeof(txts))
txts_np = np.array(txts)
print(getsizeof(txts_np))
The output:
72
87624
2040112
I thought there was something wrong with my installation, but I also tried it on another machine with a different NumPy version and got the same result.
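For what it's worth, inspecting the array's dtype and nbytes seems to account for most of the number (a sketch of the measurements I mean; the exact dtype depends on the string length):

```python
import numpy as np
from sys import getsizeof

txt = ["adsfjwofj owejifowijefiwjfoi of wofjwoijfwoijfoiwej"]
txts = [txt for _ in range(10000)]
txts_np = np.array(txts)

# The array uses a fixed-width Unicode dtype: every cell is padded to
# the longest string, at 4 bytes per character.
print(txts_np.dtype)     # <U51 (the string is 51 characters long)
print(txts_np.itemsize)  # 204 bytes per element (51 * 4)
print(txts_np.nbytes)    # 2040000 bytes of raw data (204 * 10000)

# By contrast, getsizeof(txts) only counts the list's 10000 pointers,
# which all reference the same single list/string object.
print(getsizeof(txts))
```

If this is right, the list looks small only because getsizeof does not follow the references, while the array makes 10000 padded copies.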