I have a function that calculates the average vector for each name, where a name is made up of several words. The function returns a numpy.ndarray with shape (100,). The resulting vector looks like this:

[ 0.00127441  0.0002633   0.00039622  0.00055501  0.00070984 -0.00089766
 -0.00073814 -0.00224919  0.00233035 -0.00037628  0.00125402 -0.00052623
  0.00114087 -0.00070441 -0.00419099  0.00031204 -0.0002703  -0.00290918
  ...(13 lines)
0.00260704 -0.00000406 -0.00160876  0.00134342]

After receiving the numpy array, I try to remove the line breaks as follows:

temp = ["%.8f" % number for number in name_avg_vector]
temp=re.sub('\s+', ' ', temp)
name_avg_vector= np.array(list(temp))

but I am getting the following error:

---> 79     temp=re.sub('\s+', ' ', name_avg_vector)
TypeError: cannot use a string pattern on a bytes-like object

I also tried changing the print options, but the line breaks still appear in the file that stores the numpy array values:

import sys
np.set_printoptions(threshold=sys.maxsize)
np.set_printoptions(threshold=np.inf)

After that, I tried array_repr to remove the line breaks:

name_avg_vector = np.array_repr(name_avg_vector).replace('\n', '')

but it saves as:

['array([-0.00849786,  0.00113221, -0.00643946,  0.00437448, -0.00740928,        0.00381133,  0.00178376, -0.00065115, -0.00050142,       -0.0001178 ,  0.00029183,  0.00015484, -0.00001569,  0.0006973 ,        0.00051486,  0.00006652, -0.00099618, -0.00049231,  0.0003479 ,        0.00135821,  0.00078396,  0.00038927,  0.00040825, -0.00093267,        0.00025755, -0.00012063, -0.00074733,  0.00120466,  0.00041425,       -0.00062592,  0.00098112,  0.00101578, -0.00048335,  0.00079251,       -0.00112981, 
...
-0.00050014,  0.00133685, -0.00020537, -0.00082505])']  

As Anoyz states here, converting the array to a list, e.g. name_avg_vector.tolist(), gets rid of the line breaks.

Thanks

John Barton
  • What line breaks are you removing? Where do you see these? Your numpy array doesn't actually contain any line breaks. Numpy only generates the line breaks when you display the array. – Code-Apprentice Oct 10 '19 at 21:36
  • For instance, the first array content posted includes `0.00127441 0.0002633 0.00039622 0.00055501 0.00070984 -0.00089766`, where after -0.00089766 there is a `\n` that splits the line. Every 6 float numbers, the array is broken onto the next line. I read that linewidth=75 by default. The shape of this array is (100,) – John Barton Oct 10 '19 at 21:41
  • "where after -0.00089766 there is a \n to split the line" So there are linebreaks when you **display** the array with something like `print(name_avg_vector)`. This isn't data stored in the array. – Code-Apprentice Oct 10 '19 at 21:42
  • I thought it was the data itself because it was stored with line breaks in the file. Later, when I applied `np.array_repr()`, the line breaks were gone, but the prefix `'array(..` was added. – John Barton Oct 10 '19 at 21:46
  • How are you 'receiving' and processing this 'array'? It sounds like you are trying to work with the string representation of the array rather than the array itself. It is hard to recreate an array from its print string, with those line breaks, spaces and ellipses. You should try to work with the array object itself. If you need to save it to a file, use `np.save`, and `np.load` to retrieve it. Or use `savetxt` if it is 2d and you want a text `csv` style file. – hpaulj Oct 11 '19 at 04:00
  • In all programming languages, it is crucial to understand the difference between a **value** and a **representation** of that value. Everything you have shown so far seems like you are manipulating the string representation of an array. Instead, you should work with the array directly. If you want the array to display in a certain way, there are other techniques to do so, including writing your own for loop to do it if necessary. – Code-Apprentice Oct 11 '19 at 15:26
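
The file-based suggestion in the comments above could look roughly like the sketch below. The array here is a made-up stand-in for name_avg_vector (the real values are not reproduced), and the file names are hypothetical. `np.save`/`np.load` round-trip the array object itself, while `np.savetxt` on the array reshaped to a single row writes all 100 values on one text line.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the (100,) average vector from the question.
name_avg_vector = np.linspace(-0.004, 0.004, 100)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Binary round trip: stores the array itself, not its printed form.
    npy_path = os.path.join(tmp_dir, "name_avg_vector.npy")  # hypothetical file name
    np.save(npy_path, name_avg_vector)
    loaded = np.load(npy_path)

    # Text output: a 1-D array would be written one value per line,
    # so reshape it to a single row to get every value on one line.
    txt_path = os.path.join(tmp_dir, "name_avg_vector.txt")  # hypothetical file name
    np.savetxt(txt_path, name_avg_vector.reshape(1, -1), fmt="%.8f", delimiter=" ")

    with open(txt_path) as handle:
        text_lines = handle.read().splitlines()

print(loaded.shape)
print(len(text_lines))
```

Which of the two fits depends on whether the file has to be human-readable; the binary format keeps full precision, while the text format keeps only the digits requested in `fmt`.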

1 Answer

Your numpy array appears to have dtype float, so it doesn't actually contain any newlines. I assume what you are seeing are the line breaks that numpy inserts when you do something like print(name_avg_vector). One way to solve the problem is to write your own loop to print the values in the format you want.
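
As a minimal sketch of that idea, the snippet below formats each value explicitly and joins the pieces into one line. The array is a made-up stand-in for name_avg_vector, and the `%.8f`-style precision is taken from the question's own attempt. The second variant uses `np.array2string` with a large `max_line_width`; the value 100_000 is an arbitrary choice, assumed to be wider than the output needs.

```python
import numpy as np

# Hypothetical stand-in for the (100,) average vector from the question.
name_avg_vector = np.linspace(-0.004, 0.004, 100)

# Format every float yourself and join with single spaces:
# the result is one string with no line breaks.
single_line = " ".join(f"{value:.8f}" for value in name_avg_vector)

# The string can be turned back into a float array if needed.
restored = np.array(single_line.split(), dtype=float)

# Alternative: let numpy build the string, but with a line width
# large enough that it never has to wrap.
wide = np.array2string(name_avg_vector, max_line_width=100_000, precision=8)

print(single_line[:40])
print(wide[:40])
```

Either string can be written to a file as-is; the array itself stays untouched, so no regular expression over its printed form is needed.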

Code-Apprentice