
I am trying to extract data from a CSV file with Python 3.6. The data contain both numbers and text (URLs). A typical row looks like this:

 file_name = [-0.47,  39.63, http://example.com]

On multiple forums I found this kind of code:

data = numpy.genfromtxt(file_name, delimiter=',', skip_header=skiplines)

But this works for numbers only; the URLs are read as NaN.

If I add dtype=None:

data = numpy.genfromtxt(file_name, delimiter=',', skip_header=skiplines, dtype=None)

The URLs are read correctly, but they get a "b" prefix, such as:

 b'http://example.com'

How can I remove that? How can I just have the simple string of text?

I also found this option:

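import csv

coordList = []
# newline="" is what the csv module documentation recommends for files opened in Python 3
with open(file_path, "r", newline="") as file:
    csvReader = csv.reader(file)
    for row in csvReader:
        variable = row[i]  # i is the index of the column to extract
        coordList.append(variable)

but it seems to have some issues with Python 3.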
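For comparison, here is a minimal sketch of the csv approach under Python 3. It assumes a three-column layout (two numbers and a URL) with one header line. The sample text, the column names and the helper `read_rows` are hypothetical, made up for illustration; they are not taken from the real file.

```python
import csv
import io

# Hypothetical sample standing in for the real file (header + two rows).
SAMPLE = (
    "lon,lat,url\n"
    "-0.47,  39.63, http://example.com\n"
    "1.25,  41.10, http://example.org\n"
)


def read_rows(handle, skip_lines=1):
    """Return a list of (float, float, str) tuples from an open text handle."""
    reader = csv.reader(handle)
    for _ in range(skip_lines):
        next(reader)  # discard header lines
    rows = []
    for lon, lat, url in reader:
        # float() tolerates surrounding spaces; the URL needs an explicit strip().
        rows.append((float(lon), float(lat), url.strip()))
    return rows


# With a real file this would be: open(file_path, "r", newline="")
rows = read_rows(io.StringIO(SAMPLE))
```

In Python 3 the csv module expects a file opened in text mode, so every field arrives as an ordinary `str` and no `b` prefix appears.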

Carlo Bianchi
  • The answers on this one show a pure Python and a NumPy solution using `genfromtxt` for mixed data: https://stackoverflow.com/questions/18277183/python-numpy-read-a-text-file-with-mixed-format – Joe Jul 21 '17 at 05:20
  • "but it seems it has some issues with python3.": which are? Don't leave us guessing, show the problems. (Likely, just as with the prepended `b`, you're running into unicode issues with Python 3.) –  Jul 21 '17 at 05:23
  • You can convert a bytes object `b'http://example.com'` to a `str` by using the `.decode()` method: `b'http://example.com'.decode()`, for example. The reason `genfromtxt` is reading a bytes object is that it has *no* idea what kind of text it is (boring English, or Chinese, or ...?), hence it assumes it's just a bunch of bytes. It's up to you to convert it. Alternatively, you could try specifying the dtype in `genfromtxt`, e.g. `'U80'`, and things may work better. See also [this question](https://stackoverflow.com/questions/33001373/loading-utf-8-file-in-python-3-using-numpy-genfromtxt). –  Jul 21 '17 at 05:30
  • The `b` just means `genfromtxt` has loaded the string as a bytestring. The default string type in Py3 is `unicode`. In Py2 bytestring is the default. – hpaulj Jul 21 '17 at 05:30
  • Look at the `data.dtype`. The text field will have an `Sn` type. If it was unicode it would have `Un`. It's possible to specify a `dtype` (rather than the automatic `None`) with a `U` type if the `b` really bothers you. For many uses it doesn't matter whether the string is unicode or byte. – hpaulj Jul 21 '17 at 07:17
  • Thanks Evert, I got it! – Carlo Bianchi Jul 21 '17 at 23:37
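
The suggestions in the comments above can be sketched roughly as follows. This is only an illustration, not a definitive recipe: the sample text, the field names `lon`, `lat`, `url` and the width `U80` are assumptions made up for the example, and the `encoding` argument of `genfromtxt` needs NumPy 1.14 or later.

```python
import io

import numpy as np

# Hypothetical sample standing in for the real CSV file (header + two rows).
SAMPLE = (
    "lon,lat,url\n"
    "-0.47,39.63,http://example.com\n"
    "1.25,41.10,http://example.org\n"
)

# Option 1: decode a bytes value by hand, as suggested in the comments.
decoded = b"http://example.com".decode()

# Option 2: spell out a structured dtype with a unicode ('U') text field.
explicit = np.genfromtxt(
    io.StringIO(SAMPLE),
    delimiter=",",
    skip_header=1,
    dtype=[("lon", "f8"), ("lat", "f8"), ("url", "U80")],
    encoding="utf-8",
)

# Option 3: keep dtype=None but pass an encoding, so text columns are
# detected as unicode ('U') instead of bytes ('S').
auto = np.genfromtxt(
    io.StringIO(SAMPLE),
    delimiter=",",
    skip_header=1,
    dtype=None,
    encoding="utf-8",
)

print(decoded)
print(explicit["url"])
print(auto.dtype)
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by the file name. URLs containing `#` would additionally need `comments=None`, since `genfromtxt` treats `#` as the start of a comment by default.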
