Convert imported strings to float with numpy's loadtxt

Question

I'm attempting to import text from a flat file and to convert it to float values within a single line. I've seen this post which has the same error, but I haven't found which characters are invalid in my input file. Or do I have a syntax error?

Import as a string and print the result:

data = np.loadtxt(file, delimiter='\t', dtype=str)
print(data[0:2])
... 
[["b'Time'" "b'Percent'"]
 ["b'99'" "b'0.067'"]]

Attempt to import as float:

# Import data as floats and skip the first row: data_float
data_float = np.loadtxt(data, delimiter='\t', dtype=float, skiprows=1)

It throws the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    data_float = np.loadtxt(data, delimiter='\t', dtype=float, skiprows=1)
  File "<stdin>", line 848, in loadtxt
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "<stdin>", line 848, in <listcomp>
    items = [conv(val) for (conv, val) in zip(converters, vals)]
ValueError: could not convert string to float: b'["b\'99\'" "b\'0.067\'"]'

By the way, I've also seen this post which explains the b character, but I don't think that's the issue.

An additional troubleshooting step as suggested by the first answer:

data = np.loadtxt(file, delimiter="\tb'", dtype=str)

Returns:

array(["b'Time\\tPercent'", "b'99\\t0.067'", "b'99\\t0.133'",
       "b'99\\t0.067'", "b'99\\t0'", "b'99\\t0'", "b'0\\t0.5'",
       "b'0\\t0.467'", "b'0\\t0.857'", "b'0\\t0.5'", "b'0\\t0.357'",
       "b'0\\t0.533'", "b'5\\t0.467'", "b'5\\t0.467'", "b'5\\t0.125'",
       "b'5\\t0.4'", "b'5\\t0.214'", "b'5\\t0.4'", "b'10\\t0.067'",
       "b'10\\t0.067'", "b'10\\t0.333'", "b'10\\t0.333'", "b'10\\t0.133'",
       "b'10\\t0.133'", "b'15\\t0.267'", "b'15\\t0.286'", "b'15\\t0.333'",
       "b'15\\t0.214'", "b'15\\t0'", "b'15\\t0'", "b'20\\t0.267'",
       "b'20\\t0.2'", "b'20\\t0.267'", "b'20\\t0.437'", "b'20\\t0.077'",
       "b'20\\t0.067'", "b'25\\t0.133'", "b'25\\t0.267'", "b'25\\t0.412'",
       "b'25\\t0'", "b'25\\t0.067'", "b'25\\t0.133'", "b'30\\t0'",
       "b'30\\t0.071'", "b'30\\t0'", "b'30\\t0.067'", "b'30\\t0.067'",
       "b'30\\t0.133'"], 
      dtype='<U16')

score 2 · Accepted Answer · answered Aug 27 '16 at 21:10

Thanks to everyone who took a look at my question. After restarting IPython, I was able to run the code without any problems. Here's the code that worked. Note that it passes `file` to `loadtxt`, whereas the failing call above passed `data`, the string array that had already been loaded.

data_float = np.loadtxt(file, delimiter='\t', dtype=float, skiprows=1)

Result:

In [1]: data_float
Out[1]: 
array([[  9.90000000e+01,   6.70000000e-02],
       [  9.90000000e+01,   1.33000000e-01],
       [  9.90000000e+01,   6.70000000e-02],
       [  9.90000000e+01,   0.00000000e+00],
       [  9.90000000e+01,   0.00000000e+00],
       [  0.00000000e+00,   5.00000000e-01],
       [  0.00000000e+00,   4.67000000e-01],
       [  0.00000000e+00,   8.57000000e-01],
       [  0.00000000e+00,   5.00000000e-01],
       [  0.00000000e+00,   3.57000000e-01],
       [  0.00000000e+00,   5.33000000e-01],
       [  5.00000000e+00,   4.67000000e-01],
       [  5.00000000e+00,   4.67000000e-01],
       [  5.00000000e+00,   1.25000000e-01],
       [  5.00000000e+00,   4.00000000e-01],
       [  5.00000000e+00,   2.14000000e-01],
       [  5.00000000e+00,   4.00000000e-01],
       [  1.00000000e+01,   6.70000000e-02],
       [  1.00000000e+01,   6.70000000e-02],
       [  1.00000000e+01,   3.33000000e-01],
       [  1.00000000e+01,   3.33000000e-01],
       [  1.00000000e+01,   1.33000000e-01],
       [  1.00000000e+01,   1.33000000e-01],
       [  1.50000000e+01,   2.67000000e-01],
       [  1.50000000e+01,   2.86000000e-01],
       [  1.50000000e+01,   3.33000000e-01],
       [  1.50000000e+01,   2.14000000e-01],
       [  1.50000000e+01,   0.00000000e+00],
       [  1.50000000e+01,   0.00000000e+00],
       [  2.00000000e+01,   2.67000000e-01],
       [  2.00000000e+01,   2.00000000e-01],
       [  2.00000000e+01,   2.67000000e-01],
       [  2.00000000e+01,   4.37000000e-01],
       [  2.00000000e+01,   7.70000000e-02],
       [  2.00000000e+01,   6.70000000e-02],
       [  2.50000000e+01,   1.33000000e-01],
       [  2.50000000e+01,   2.67000000e-01],
       [  2.50000000e+01,   4.12000000e-01],
       [  2.50000000e+01,   0.00000000e+00],
       [  2.50000000e+01,   6.70000000e-02],
       [  2.50000000e+01,   1.33000000e-01],
       [  3.00000000e+01,   0.00000000e+00],
       [  3.00000000e+01,   7.10000000e-02],
       [  3.00000000e+01,   0.00000000e+00],
       [  3.00000000e+01,   6.70000000e-02],
       [  3.00000000e+01,   6.70000000e-02],
       [  3.00000000e+01,   1.33000000e-01]])
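
For readers without the original file, here is a minimal sketch of the same call on in-memory text. The two data rows are copied from the output in the question, and `StringIO` is only a stand-in for `file`; both are assumptions made for illustration.

```python
from io import StringIO

import numpy as np

# Tab-separated sample with a header row, mimicking the file in the question.
sample = "Time\tPercent\n99\t0.067\n99\t0.133\n"

# skiprows=1 drops the header so every remaining field parses as a float.
data_float = np.loadtxt(StringIO(sample), delimiter="\t", dtype=float, skiprows=1)

print(data_float.shape)  # (2, 2)
```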

That was my comment below. I'll be happy to delete it. But could you explain why first? – Bobby, Aug 27 '16 at 21:12
I resolved it now, see my posted answer. I could also delete the entire question since the answer was trivial, but maybe it will help someone. – Bobby, Aug 27 '16 at 21:13
Delete the question; it's off-topic. From the SO close pop-up: "This question was caused by a problem that can no longer be reproduced or a simple typographical error. While similar questions may be on-topic here, this one was resolved in a manner unlikely to help future readers. This can often be avoided by identifying and closely inspecting the shortest program necessary to reproduce the problem before posting." – Merlin, Aug 27 '16 at 21:15
It was the `skiprows=1` that solved the problem, not the restart. `genfromtxt` could have been used to treat the 1st line as a header. – hpaulj, Aug 27 '16 at 21:21
Sorry, I don't see what you mean about `skiprows=1`. I also had it before. Thanks for the tip about `genfromtxt`. – Bobby, Aug 27 '16 at 21:23
I investigated closing the question, but I get a warning saying I cannot close a question with answers. And now there's a positive vote and a few useful comments, so perhaps I should consider editing it instead. I'll wait a bit longer at least. – Bobby, Aug 27 '16 at 21:25
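
The `genfromtxt` suggestion in the comments above can be sketched roughly as follows. The sample text is made up to match the question's layout; `names=True` treats the first line as column names, while `skip_header=1` simply discards it.

```python
from io import StringIO

import numpy as np

# Hypothetical sample in the same layout as the question's file.
sample = "Time\tPercent\n99\t0.067\n0\t0.5\n"

# names=True reads the first line as field names and returns a structured array.
table = np.genfromtxt(StringIO(sample), delimiter="\t", names=True)
percent = table["Percent"]  # access a column by its header name

# skip_header=1 ignores the header and returns a plain 2-D float array.
plain = np.genfromtxt(StringIO(sample), delimiter="\t", skip_header=1)
```

The structured form is convenient when columns are referred to by name; the plain form matches what `loadtxt` with `skiprows=1` returns.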

score 2 · Answer 2 · answered Aug 27 '16 at 21:28

The problem is that your numbers are quoted. That is, the field is '99' rather than 99. There are two ways to handle this: provide converter functions that strip the quotes and return a float, or use the csv module to load the data and then pass it to numpy.

Using converter functions

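import numpy as np
from io import StringIO

data = """'x'\t'y'
'1'\t'2.5'"""

def unquote(field):
    # Depending on the NumPy version and the encoding argument, converters
    # receive either bytes or str, so normalise to str before stripping quotes.
    if isinstance(field, bytes):
        field = field.decode()
    return float(field.strip("'"))

# Apply the converter to both columns and skip the header row.
arr = np.loadtxt(StringIO(data), dtype=float, delimiter="\t", skiprows=1,
                 converters={0: unquote, 1: unquote})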

Using csv

import csv
import numpy as np
from io import StringIO

data = """'x'\t'y'
'1'\t'2.5'"""

reader = csv.reader(StringIO(data), quotechar="'", delimiter="\t")
next(reader) # skip headers
arr = np.array(list(reader), dtype=float)

In both examples I've used StringIO so you can easily see the contents of the "file". You can of course pass the filename or a file object to these functions instead.
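
A third variant, offered only as a sketch and not part of the original answer: `np.loadtxt` also accepts an iterable of lines, so the quotes can be removed on the fly before parsing. The sample data and the name `unquoted` are made up for illustration, and this assumes quote characters never appear inside a field.

```python
from io import StringIO

import numpy as np

# Quoted, tab-separated sample with a header row.
raw = StringIO("'x'\t'y'\n'1'\t'2.5'\n'3'\t'4.0'\n")

# Generator that drops every single quote from each line as it is read.
unquoted = (line.replace("'", "") for line in raw)

# loadtxt consumes the generator; skiprows=1 skips the (now unquoted) header.
arr = np.loadtxt(unquoted, delimiter="\t", dtype=float, skiprows=1)
```

With a real file, `raw` would be the object returned by `open(...)`.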

I get NaNs while reading. I am doing `abcd = json.dumps("6.3, 2.7, 4.9, 1.8"); np.genfromtxt(StringIO(abcd), delimiter=',', dtype=float, converters=dict.fromkeys([0, 1], (lambda s: float(s.strip(b'"'))))).reshape(1,4)`, which returns `array([[6.3, 2.7, 4.9, nan]])` – Naveen Reddy Marthala, Feb 20 '22 at 10:02
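
A likely explanation for the `nan` in the comment above: `json.dumps` wraps the whole string in double quotes, so the last field becomes ` 1.8"`, and the converters are registered only for columns 0 and 1, leaving that trailing quote in place. One possible fix, sketched under that assumption, is to undo the JSON quoting with `json.loads` before parsing:

```python
import json
from io import StringIO

import numpy as np

abcd = json.dumps("6.3, 2.7, 4.9, 1.8")  # the text is now '"6.3, 2.7, 4.9, 1.8"'

# Decode the JSON string first so no quote characters reach genfromtxt.
text = json.loads(abcd)

# A single line parses to a 1-D array of four floats; reshape to one row.
arr = np.genfromtxt(StringIO(text), delimiter=",", dtype=float).reshape(1, 4)
```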

score 1 · Answer 3 · answered Aug 27 '16 at 20:56

Could you try:

data = np.loadtxt(file, delimiter="\tb'", dtype=str)

To signify that the actual delimiter seems to include the characters "b'"?

Ben Quigley

This worked without an error, but doesn't look right. I'll add to the question now. – Bobby Aug 27 '16 at 20:59
`float(b'42')` is not an issue, but what you're dealing with, `float("b'42'")`, is not the same thing and errors out with `ValueError: could not convert string to float: "b'18'"` – Ben Quigley Aug 27 '16 at 20:59
Just updated question with the output of your suggestion. – Bobby Aug 27 '16 at 21:02
Oops, I forgot about the quotes. Go back to your previous delimiter, sorry – Ben Quigley Aug 27 '16 at 21:05
Try using `delimiter='\t'` and replace all of the following strings with empty strings: `newstring = value.replace('"','').replace("b'","").replace("\\'",'')`. I think that should tidy things up to the point that the values containing numbers can be converted to floats. – Ben Quigley Aug 27 '16 at 21:10
I resolved it now, see my posted answer. I could also delete the entire question since the answer was trivial, but maybe it will help someone. – Bobby Aug 27 '16 at 21:11
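
The distinction drawn in the comments above, between real bytes and the text of a bytes repr, can be illustrated in plain Python. The helper `clean_field` is hypothetical; it is a simplified take on the replace chain suggested in the comments, not code from the thread.

```python
# float() accepts a bytes object that contains a number.
from_bytes = float(b"42")

# str() of a bytes object keeps the b'...' wrapper as literal characters,
# which is how fields such as "b'99'" end up in a string array.
as_text = str(b"99")  # "b'99'"

def clean_field(value):
    # Hypothetical helper: drop the b' prefix and any remaining quotes.
    return float(value.replace("b'", "").replace("'", "").replace('"', ""))

try:
    float(as_text)  # the wrapper characters make this fail
except ValueError:
    recovered = clean_field(as_text)
```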
