I am still new to NumPy and was messing around with NumPy's dtypes when I found that the dtype specific to strings, i.e. 'U', uses up more memory than the object dtype. The code below illustrates this:
import numpy as np

size = 100000
half_size = size // 2
ind1 = np.arange(half_size) * 2 + 1   # odd indices
ind2 = np.arange(half_size) * 2       # even indices

X = np.empty(size, dtype=object)
X[ind1] = 'smile'
X[ind2] = 'smile2'

W = np.empty(size, dtype='U6')
W[ind1] = 'smile'
W[ind2] = 'smile2'

print(X.nbytes)
print(W.nbytes)
The result is the following:
800000
2400000
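For reference (continuing from the snippet above), the per-element sizes reported by itemsize line up with those nbytes figures, at least on a standard 64-bit build:

print(X.itemsize)   # 8  bytes per element ->  8 * 100000 =  800000
print(W.itemsize)   # 24 bytes per element -> 24 * 100000 = 2400000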
My questions are the following:
1) Why does this happen? Why does dtype='U6' take up 3 times as much memory as dtype=object?
2) Is there a way to create a NumPy string array that takes up less memory than dtype=object?
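To frame question 2 a bit (just a rough comparison, continuing from the snippet above, not something I found recommended anywhere): the byte-string dtype 'S6' is narrower, but it stores bytes rather than str, so I am not sure it counts:

S = np.empty(size, dtype='S6')
S[ind1] = 'smile'
S[ind2] = 'smile2'
print(S.nbytes)   # 600000 -> 6 bytes per element
print(S[0])       # b'smile2' (a bytes object, not a Python str)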
Thank you in advance
EDIT: I'd like to point out that my post is not a duplicate of the other post, because my question is about memory usage, and the other post does not mention anything about memory usage regarding dtype='U' vs dtype=object.
EDIT2: Although I have already learnt something new from the other post, it unfortunately does not answer my question, which is specifically about the memory usage of dtype='U' vs dtype=object.