I tried to create an array from a text file. I had seen earlier that numpy has a loadtxt method, so I tried it, but it adds some junk characters to each row...

# my txt file

    .--``--.
.--`        `--.
|              |
|              |
`--.        .--`
    `--..--`

# my python v3.4 program

import numpy as np
f = open('tile', 'r')
a = np.loadtxt(f, dtype=str, delimiter='\n')
print(a)

# my print output

["b'    .--``--.    '"
 "b'.--`        `--.'"
 "b'|              |'"
 "b'|              |'"
 "b'`--.        .--`'"
 "b'    `--..--`    '"]

What are these 'b' and double quotes? And where do they come from? I tried some solutions found on the internet, such as opening the file with codecs, changing the dtype to 'S20' or 'S11', and a lot of other things, none of which work... What I expect is an array of unicode strings that looks like this:

[['    .--``--.    ']
 ['.--`        `--.']
 ['|              |']
 ['|              |']
 ['`--.        .--`']
 ['    `--..--`    ']]

Info: I'm using python 3.4 and numpy from the debian stable repository

hpaulj
krshk
  • I already checked this question before, and, OK, b is for byte, but why is it _inside_ a double-quoted string? I have to reuse this array later to replace some characters in another array, according to an index. So if I have a b and 2 extra single quotes, it will break the later program. – krshk Nov 11 '15 at 17:16
  • Why are you using `loadtxt` to load a file like that? `loadtxt` is designed for columns of data separated by commas or some other delimiter. You could just as easily read that file with pure python; e.g. something like `with open('tile') as f: a = [line.strip('\n') for line in f.readlines() if not line.startswith('#')]` – Warren Weckesser Nov 11 '15 at 17:56
  • Simple: I'm a total beginner in Python. I started using it a week ago. I'm more at home with web languages, so it all seems a bit confusing to me :) – krshk Nov 11 '15 at 18:18

5 Answers

np.loadtxt and np.genfromtxt operate in byte mode, which is the default string type in Python 2. But Python 3 uses unicode, and marks bytestrings with this b.
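
To see where the double quotes come from without involving numpy at all, here is a small sketch (the variable names are mine, not from the question). It assumes that the dtype=str path ends up calling str() on a bytes object; that matches the output in the question, but it is an inference on my part rather than something traced through numpy's source:

```python
# One row of the file as the byte-mode reader would see it
raw = b'    .--``--.    '

# str() on a bytes object returns its repr, prefix and quotes included
as_text = str(raw)

# decode() is what actually turns the bytes into text
decoded = raw.decode('utf-8')

print(repr(as_text))   # "b'    .--``--.    '"
print(repr(decoded))   # '    .--``--.    '
```

So the b and the inner single quotes are ordinary characters of the resulting string, and the outer double quotes are just how Python displays a string that itself contains single quotes.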

I tried some variations in a Python 3 IPython session:

In [508]: np.loadtxt('stack33655641.txt',dtype=bytes,delimiter='\n')[0]
Out[508]: b'    .--``--.'
In [509]: np.loadtxt('stack33655641.txt',dtype=str,delimiter='\n')[0]
Out[509]: "b'    .--``--.'"
...
In [511]: np.genfromtxt('stack33655641.txt',dtype=str,delimiter='\n')[0]
Out[511]: '.--``--.'
In [512]: np.genfromtxt('stack33655641.txt',dtype=None,delimiter='\n')[0]
Out[512]: b'.--``--.'
In [513]: np.genfromtxt('stack33655641.txt',dtype=bytes,delimiter='\n')[0]
Out[513]: b'.--``--.'

genfromtxt with dtype=str gives the cleanest display - except it strips blanks. I may have to use a converter to turn that off. These functions are meant to read csv data where (white)spaces are separators, not part of the data.

loadtxt and genfromtxt are overkill for simple text like this. A plain file read does nicely:

In [527]: with open('stack33655641.txt') as f:a=f.read()
In [528]: print(a)
    .--``--.
.--`        `--.
|              |
|              |
`--.        .--`
    `--..--`

In [530]: a=a.splitlines()
In [531]: a
Out[531]: 
['    .--``--.',
 '.--`        `--.',
 '|              |',
 '|              |',
 '`--.        .--`',
 '    `--..--`']

(my text editor is set to strip trailing blanks, hence the ragged lines).
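
If the goal is the one-column layout the question asks for, the plain read can be wrapped in a small helper. This is only a sketch: read_rows is a made-up name, and UTF-8 is an assumed encoding. Leading and trailing blanks survive because nothing in it strips them:

```python
def read_rows(path, encoding='utf-8'):
    """Return the file as a list of one-element rows, e.g. [['line 1'], ['line 2']]."""
    with open(path, encoding=encoding) as f:
        # splitlines() removes the line endings but leaves other whitespace alone
        return [[line] for line in f.read().splitlines()]
```

Passing the result to np.array(...) should then give an array of unicode strings with shape (n, 1), with no b prefix because the file was opened in text mode.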


@DSM's suggestion:

In [556]: a=np.loadtxt('stack33655641.txt',dtype=bytes,delimiter='\n').astype(str)
In [557]: a
Out[557]: 
array(['    .--``--.', '.--`        `--.', '|              |',
       '|              |', '`--.        .--`', '    `--..--`'], 
      dtype='<U16')
In [558]: a.tolist()
Out[558]: 
['    .--``--.',
 '.--`        `--.',
 '|              |',
 '|              |',
 '`--.        .--`',
 '    `--..--`']
hpaulj
  • I think `np.loadtxt("tile", dtype=bytes, delimiter="\n").astype(str)` might work, but I agree completely with the overkill point. – DSM Nov 11 '15 at 17:41

Python 3 works with Unicode. I had the same issue when using loadtxt with dtype='S'. But with dtype='U' (Unicode string) in either numpy.loadtxt or numpy.genfromtxt, the output comes out without the b:

import numpy

a = numpy.loadtxt('filename', dtype={'names': ('col1', 'col2', 'col3'), 'formats': ('U10', 'U10', 'i4')}, delimiter=',')

print(a)
Benkerroum Mohamed
Kamalesh

You can use np.genfromtxt('your-file', dtype='U').

MojiProg
  • Do not forget to specify a string length (`dtype='U10'`), as without the 10 you'll just get an empty string. – Hami Feb 14 '18 at 13:36

This is probably not the most 'pythonic' or best solution, but definitely gets the job done using numpy.loadtxt in python3. I am aware that it is a "dirty" solution, but it works for me.

import numpy as np

def loadstr(filename):
    # Load every field as a string; each entry is expected to look like "b'text'"
    dat = np.loadtxt(filename, dtype=str)
    rows, cols = dat.shape  # assumes the file yields a 2D array
    for i in range(rows):
        for j in range(cols):
            # Drop the leading "b'" and the trailing "'"
            dat[i, j] = dat[i, j][2:-1]
    return dat

data = loadstr("somefile.txt")

This will import a 2D array from a text file via numpy, strip off the "b'" and "'" from the beginning and end of each string, and return a stripped string array named "data".

Are there better ways? Probably.

Does this work? Yup. I use it enough that I've got this function in my own Python module.
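
One possible refinement of the stripping step above, as a sketch only: the slice [2:-1] also eats characters from strings that never had the b'...' wrapper, so a guard can check for it first. strip_bytes_repr is a hypothetical helper name, not part of numpy, and it does not undo escape sequences such as \x.. that a bytes repr may contain:

```python
def strip_bytes_repr(text):
    """Remove a leading b' (or b") and the matching closing quote, if present."""
    has_prefix = text.startswith(("b'", 'b"'))
    # Require at least b + two quotes, and that the closing quote matches the opening one
    if has_prefix and len(text) >= 3 and text[-1] == text[1]:
        return text[2:-1]
    # Anything else is returned unchanged
    return text
```

Inside the loop, dat[i, j] = strip_bytes_repr(dat[i, j]) would then leave already-clean strings untouched.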

ivanarnold

I had the same issue, and for me the simplest way turned out to be the csv library. You get your desired output with:

import csv

def loadFromCsv(filename):
    with open(filename, 'r') as f:
        # Each line of the file becomes a one-element row
        rows = [elem for elem in csv.reader(f, delimiter='\n')]
    return rows

a = loadFromCsv('tile')
print(a)
Markus Dutschke