Python - convert list of string to float - square braces and decimal point causing problems

Question

I have a text file that contains a smaller dataset (taken from a CSV file), like so:

2020-05-24T10:44:37.613168#[ 0.          0.         -0.06210425  0.        ]
2020-05-24T10:44:37.302214#[1. 1. 0. 0.]
2020-05-24T10:44:36.192222#[0. 0. 0. 0.]

I then read from it using

data = f.readlines()
for row in data:
    img_id, label = row.strip("\n").split("#")

where label is a string list which looks like

[ 0.          0.         -0.24604772  0.        ]
[ 0.          0.         -0.24604772  0.        ]
[1. 1. 0. 0.]

I'd like to convert each string element to float. However, the square braces [] and the decimal point . are preventing me from converting.

Tried so far -

Removing the [] with label = label[1:-1], although I would need them as an array later. Then running print([list(map(float, i.split())) for i in label]) resulted in the error ValueError: could not convert string to float: '.'
Using ast.literal_eval: label = ast.literal_eval(row.strip("\n").split("#")). This gives ValueError: malformed node or string: ['2020-05-24T10:57:52.882241 [0. 0. 0. 0.]']

Referred

Need to read string into a float array

Cannot convert list of strings to list of floats in python using float()

How do you convert a list of strings to a list of floats using Python?

Convert list of strings to numpy array of floats

When to use ast.literal_eval

So,

What else should I try in order to convert them to a float array which is iterable? Or what am I doing wrong? Do I have to remove the square braces?
To make things easier, how should I store the data in the txt file? Is CSV better than txt in this case?
I need to extend this logic to 110,000 entries. Will any of these steps cause problems then?

Thank you. Any help will be greatly appreciated. Please help.

Hello! 1. Why there are so many spaces in some lines? 2. Which float-values do you mean? -0.24604772 is a float, but "1." is not — CMinusMinus, Jun 07 '20 at 19:35
This the `str` display of an array. It's not designed for recreating an array. Since it's missing the commas it can't be parsed as a list. Use string methods to clean it up one or the other. `numpy` isn't going to help you. — hpaulj, Jun 07 '20 at 19:43
@ProgrammerJonas I mean if a float number is present, its precision width is taken. 1. row doesn't have a larger float number. So it is normal spaced. The thing is I stored it like that in the first place into txt file. — Deepak, Jun 07 '20 at 19:59
@hpaulj Thanks. I will consider this. After seeing the solutions, I think it is better not to store it as numpy array. — Deepak, Jun 07 '20 at 20:00
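
On the storage question, here is a minimal sketch of writing the values as plain CSV columns, so they parse back without any bracket handling. It assumes each record is an id plus a list of floats. The `records` list and the in-memory `buffer` (standing in for a real file) are illustrative and not from the original post.

```python
import csv
import io

# Hypothetical records: an id followed by its float values.
records = [
    ("2020-05-24T10:44:37.613168", [0.0, 0.0, -0.06210425, 0.0]),
    ("2020-05-24T10:44:37.302214", [1.0, 1.0, 0.0, 0.0]),
]

# Write one row per record: the id first, then each value in its own column.
buffer = io.StringIO()  # stands in for open("labels.csv", "w", newline="")
writer = csv.writer(buffer)
for img_id, values in records:
    writer.writerow([img_id, *values])

# Read the rows back; every column after the first parses directly as a float.
buffer.seek(0)
loaded = [(row[0], [float(v) for v in row[1:]]) for row in csv.reader(buffer)]
print(loaded == records)  # True
```

Both writing and reading handle one row at a time, so this approach should scale to 110,000 entries without special handling.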

Lewis · Accepted Answer · 2020-06-07T19:57:05.007

2

For each line, trim off the first and last char with line[1:-1], split by whitespace with .split(), and parse each float with float().

line = "[ 0.          0.         -0.24604772  0.        ]"
floats = [float(item) for item in line[1:-1].split()]

print(floats)
# Output: [0.0, 0.0, -0.24604772, 0.0]
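
Applied to the whole rows from the question, the same idea might look like the sketch below. The helper name `parse_row` and the `rows` list are made up for illustration; the rows stand in for what `f.readlines()` would return.

```python
def parse_row(row):
    """Split one 'timestamp#[v1 v2 ...]' line into (img_id, list of floats)."""
    img_id, label = row.strip().split("#")
    # Drop the surrounding brackets, then split on any run of whitespace.
    floats = [float(item) for item in label.strip()[1:-1].split()]
    return img_id, floats


# Sample rows copied from the question.
rows = [
    "2020-05-24T10:44:37.613168#[ 0.          0.         -0.06210425  0.        ]\n",
    "2020-05-24T10:44:37.302214#[1. 1. 0. 0.]\n",
]
parsed = [parse_row(row) for row in rows]
print(parsed[0])
# ('2020-05-24T10:44:37.613168', [0.0, 0.0, -0.06210425, 0.0])
```

Each row is handled independently, so the cost grows linearly with the number of lines.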

edited Jun 07 '20 at 19:57

answered Jun 07 '20 at 19:51


Thank you. What if I exclude the delimiter `#` while saving to the txt file and leave just whitespace between the timestamp and the list? How can I export the data into list variables? – Deepak Jun 07 '20 at 20:33
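
For the whitespace-delimited variant asked about here, one possible sketch is below. It assumes the timestamp itself never contains a space, so splitting once at the first run of whitespace separates it from the bracketed values. The function name `parse_row_whitespace` is hypothetical.

```python
def parse_row_whitespace(row):
    # Split only at the first run of whitespace: the timestamp has no spaces,
    # so everything after it is the bracketed label.
    img_id, label = row.strip().split(maxsplit=1)
    # Strip brackets and padding from both ends, then split the values.
    floats = [float(item) for item in label.strip("[] ").split()]
    return img_id, floats


img_id, floats = parse_row_whitespace("2020-05-24T10:57:52.882241 [0. 0. 0. 0.]\n")
print(img_id, floats)
# 2020-05-24T10:57:52.882241 [0.0, 0.0, 0.0, 0.0]
```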

score 1 · Answer 2 · answered Jun 07 '20 at 19:47

    for row in data:
        img_id, label = row.strip("\n").split("#")
        # >>>[ 0.          0.         -0.24604772  0.        ]

        label = label[1:-1] # Cuts the first and last letter
        # >>> 0.          0.         -0.24604772  0.   

        label = label.strip() # Remove all spaces before and after label
        # >>>0.          0.         -0.24604772  0.

        labelElements = label.split() # Cuts the string on every space(s)
        # >>>["0.", "0.", "-0.24604772", "0."]

        labelFloats = []
        for L in labelElements:
            labelFloats.append(float(L)) # for example: "1." -> 1.0

By the way:
The variable label does not hold a list of lines (you called it a "string list"); it's a single line:

# label = [ 0.          0.         -0.24604772  0.        ]

My bad. I didn't know the right term to call it. I learned something new now. Thank you. — Deepak, Jun 07 '20 at 20:04

revliscano · Answer 3 · 2020-06-07T20:58:42.857

Given your case, I think I would go with regular expressions to extract the desired numbers. I would do something like the following:

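import re

# Read all lines; the with-block closes the file automatically.
with open('your_file.txt') as f:
    lines = f.read().splitlines()

floats = []
for line in lines:
    img_id, label = line.split("#")
    # Raw string so that \d and \. reach the regex engine unchanged.
    floats.append([float(x) for x in re.findall(r'-?\d+\.?\d*', label)])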

Printing floats outputs:

[[0.0, 0.0, -0.06210425, 0.0], [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
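
One caveat with this pattern: if a value were ever written in scientific notation (an assumption, since the sample data contains none), a token such as 1.e-05 would be split into 1. and -05. A hedged sketch of a pattern that also accepts an optional exponent follows; the names `NUMBER` and `extract_floats` are illustrative.

```python
import re

# Optional sign, digits with optional fraction, optional exponent such as e-05.
NUMBER = re.compile(r"[-+]?\d+\.?\d*(?:[eE][-+]?\d+)?")


def extract_floats(label):
    """Return every number found in the label string as a float."""
    return [float(token) for token in NUMBER.findall(label)]


print(extract_floats("[ 0.          0.         -0.06210425  0.        ]"))
# [0.0, 0.0, -0.06210425, 0.0]
print(extract_floats("[1.e-05 2.5e+00 0.e+00]"))
# [1e-05, 2.5, 0.0]
```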
