
I have a text file that contains a small dataset (taken from a CSV file), like so:

2020-05-24T10:44:37.613168#[ 0.          0.         -0.06210425  0.        ]
2020-05-24T10:44:37.302214#[1. 1. 0. 0.]
2020-05-24T10:44:36.192222#[0. 0. 0. 0.]

I then read from it using:

data = f.readlines()
for row in data:
    img_id, label = row.strip("\n").split("#")

where label is a string list which looks like this:

[ 0.          0.         -0.24604772  0.        ]
[ 0.          0.         -0.24604772  0.        ]
[1. 1. 0. 0.]

I'd like to convert each string element to a float. However, the square brackets [] and the decimal point . are preventing the conversion.

Tried so far -

  1. Removing the [] with label = label[1:-1], although I would need them as an array later. Then print([list(map(float, i.split())) for i in label]) raised ValueError: could not convert string to float: '.'

  2. Using ast.literal_eval: label = ast.literal_eval(row.strip("\n").split("#")). This raised ValueError: malformed node or string: ['2020-05-24T10:57:52.882241 [0. 0. 0. 0.]']

I have referred to:

Need to read string into a float array

Cannot convert list of strings to list of floats in python using float()

How do you convert a list of strings to a list of floats using Python?

Convert list of strings to numpy array of floats

When to use ast.literal_eval

So,

  1. What else should I try in order to convert them to an iterable float array? What am I doing wrong? Do I have to remove the square brackets?
  2. If it makes things easier, how should I store the data in the txt file? Is CSV better than txt in this case?
  3. I need to extend this logic to 110,000 entries. Will any of these steps cause problems at that scale?

Thank you. Any help will be greatly appreciated. Please help.

Deepak
  • Hello! 1. Why are there so many spaces in some lines? 2. Which float values do you mean? -0.24604772 is a float, but "1." is not – CMinusMinus Jun 07 '20 at 19:35
  • This is the `str` display of an array. It's not designed for recreating an array. Since it's missing the commas it can't be parsed as a list. Use string methods to clean it up one way or another. `numpy` isn't going to help you. – hpaulj Jun 07 '20 at 19:43
  • @ProgrammerJonas I mean that if a longer float is present in a row, every element is padded to its width. The "1." row doesn't contain a longer float, so it is spaced normally. The thing is, I stored it like that in the txt file in the first place. – Deepak Jun 07 '20 at 19:59
  • @hpaulj Thanks. I will consider this. After seeing the solutions, I think it is better not to store it as numpy array. – Deepak Jun 07 '20 at 20:00
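
If the writing side can be changed, one option is to put every value in its own comma-separated column, so that the standard `csv` module can read it back without any bracket handling. The sketch below is only an illustration under that assumption: the column layout (timestamp first, then the values) is made up, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Two sample rows taken from the question, already as Python floats.
rows = [
    ("2020-05-24T10:44:37.613168", [0.0, 0.0, -0.06210425, 0.0]),
    ("2020-05-24T10:44:37.302214", [1.0, 1.0, 0.0, 0.0]),
]

# io.StringIO stands in for a real file here; with a file on disk you
# would use open(path, "w", newline="") instead (path is hypothetical).
buffer = io.StringIO()
writer = csv.writer(buffer)
for img_id, values in rows:
    writer.writerow([img_id, *values])  # timestamp, v0, v1, v2, v3

# Read it back: the first column is the id, the remaining ones are floats.
buffer.seek(0)
restored = [(row[0], [float(v) for v in row[1:]]) for row in csv.reader(buffer)]

print(restored == rows)
```

Written this way, each field is already a plain number string, so `float()` works on it directly.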

3 Answers


For each line, trim off the first and last char with line[1:-1], split by whitespace with .split(), and parse each float with float().

line = "[ 0.          0.         -0.24604772  0.        ]"
floats = [float(item) for item in line[1:-1].split()]

print(floats)
# [0.0, 0.0, -0.24604772, 0.0]
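
Applied to whole lines from the question's file, the same idea might look like the sketch below. The helper name `parse_line` and the in-memory `lines` list are made up for illustration; the `#` delimiter is the one used in the question.

```python
def parse_line(line):
    """Split one 'timestamp#[v1 v2 ...]' line into (timestamp, list of floats)."""
    # partition() splits at the first '#' only.
    img_id, _, label = line.strip().partition("#")
    # strip("[]") drops the surrounding brackets; split() with no argument
    # treats any run of spaces as a single separator.
    values = [float(item) for item in label.strip("[]").split()]
    return img_id, values


# Sample lines copied from the question; in practice, loop over the open file.
lines = [
    "2020-05-24T10:44:37.613168#[ 0.          0.         -0.06210425  0.        ]\n",
    "2020-05-24T10:44:37.302214#[1. 1. 0. 0.]\n",
]

parsed = [parse_line(line) for line in lines]
print(parsed[0])
# ('2020-05-24T10:44:37.613168', [0.0, 0.0, -0.06210425, 0.0])
```

For 110,000 entries, iterating over the file object line by line (rather than calling readlines() first) avoids holding the raw text in memory twice, though either should be manageable at that size.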
Lewis
  • Thank you. What if I exclude the delimiter `#` while saving to the txt file and leave only whitespace between the timestamp and the list? How can I export the data into list variables? – Deepak Jun 07 '20 at 20:33
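
For the whitespace-delimited variant asked about in the comment, a possible sketch is to split only once. It assumes the timestamp itself contains no whitespace (true for the ISO-style timestamps in the question); the sample `line` is adapted from the question with the `#` replaced by a space.

```python
line = "2020-05-24T10:44:37.613168 [ 0.          0.         -0.06210425  0.        ]"

# maxsplit=1 splits only at the first run of whitespace, so the bracketed
# part stays in one piece even though it contains spaces itself.
img_id, label = line.split(maxsplit=1)
floats = [float(item) for item in label.strip("[]").split()]

print(img_id)
print(floats)
```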
    for row in data:
        img_id, label = row.strip("\n").split("#")
        # >>>[ 0.          0.         -0.24604772  0.        ]

        label = label[1:-1] # Cuts the first and last letter
        # >>> 0.          0.         -0.24604772  0.   

        label = label.strip() # Remove all spaces before and after label
        # >>>0.          0.         -0.24604772  0.

        labelElements = label.split() # Cuts the string on every space(s)
        # >>>["0.", "0.", "-0.24604772", "0."]

        labelFloats = []
        for L in labelElements:
            labelFloats.append(float(L)) # for example: "1." -> 1.0

By the way:
The variable label does not hold a list of lines (you called it a "string list"); it is a single line:

# label = [ 0.          0.         -0.24604772  0.        ]
CMinusMinus

Given your case, I would use a regular expression to extract the numbers, something like this:

import re


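# The with-block closes the file even if reading fails
with open('your_file.txt') as f:
    lines = f.read().splitlines()
floats = []
for line in lines:
    img_id, label = line.split("#")
    # Raw string so the backslashes reach the regex engine unchanged
    floats.append([*map(float, re.findall(r'-?\d+\.?\d*', label))])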

Printing floats outputs:

[[0.0, 0.0, -0.06210425, 0.0], [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
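
One caveat: the pattern above assumes plain decimal notation. NumPy can print very small or very large values in scientific notation (for example `1.e-05`), and the pattern would then split one number into two matches. A sketch of a pattern that also accepts an optional exponent follows; the name `FLOAT_RE` and the sample label are invented for illustration and do not come from the question's data.

```python
import re

# Optional sign, digits with an optional fractional part, optional exponent.
FLOAT_RE = re.compile(r"[-+]?\d+\.?\d*(?:[eE][-+]?\d+)?")

label = "[ 0.e+00  1.e-05 -2.5e+03  3.]"  # invented sample, not from the question
floats = [float(tok) for tok in FLOAT_RE.findall(label)]

print(floats)
# [0.0, 1e-05, -2500.0, 3.0]
```

This still does not cover values such as nan or inf, which would need separate handling.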
revliscano