Importing data from csv where each list is split into many rows

Question

Hi, I'm a bit stuck on this problem. I have a csv file that looks something like this:

[12  34 45 22 3 5
 34 33 2 67 5 55
 2 90 88 12 34]
[245  4 13]
[33 90 50 22 90 1
 23 44 876  10 7] ...

And so on. In other words, the csv file consists of lists of numbers, with the numbers separated by either a single space or a double space. If a list exceeds a certain number of values (14 in my case), it continues on the next line until the list ends. The lists are not separated by commas, but each list begins and ends with square brackets.

I want to import the csv file into a list of lists, which would look like this:

[[12, 34, 45, 22, 3, 5, 34, 33, 2, 67, 5, 55, 2, 90, 88, 12, 34], 
[245, 4, 13], 
[33, 90, 50, 22, 90, 1, 23, 44, 876, 10, 7], 
[...]]

How could I achieve this? I've tried np.loadtxt and pandas, but both treat every line as its own observation.

Thanks in advance!

Edit: The numbers are actually separated either by a single space or double spaces.

IoaTzimas · Answer 1 · 2020-09-24T16:42:17.273

1

The following should work:

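with open('myfile.csv') as f:
    t = f.read()
# join the wrapped lines, collapse double spaces, then use commas as separators
t = t.replace('\n', '').replace('  ', ' ').replace(' ', ',')
# every list ends with ']'; the piece after the last ']' is empty, so drop it
l = t.split(']')
l.pop()
l = [i.replace('[', '') for i in l]
result = [[int(s) for s in k.split(',')] for k in l]
print(result)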

Output:

[[12, 34, 45, 22, 3, 5, 34, 33, 2, 67, 5, 55, 2, 90, 88, 12, 34], [245, 4, 13], [33, 90, 50, 22, 90, 1, 23, 44, 876, 10, 7]]

edited Sep 24 '20 at 16:42

answered Sep 24 '20 at 15:52

IoaTzimas

Let me know if you need any explanation of the steps in the above code. Cheers – IoaTzimas Sep 24 '20 at 15:55
Two best-practice notes worth making. First: [Why is using 'eval' a bad practice?](https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice) Second, don't forget to `close()` the file, or, even better, [use with open()](https://stackoverflow.com/questions/31334061/file-read-using-open-vs-with-open) as a context manager to handle file closing automatically – G. Anderson Sep 24 '20 at 15:58
Updated the code, after @G.Anderson recommendations – IoaTzimas Sep 24 '20 at 16:11
Thanks for the answer. I just had a closer look at my data and it seems that sometimes the numbers are separated by a single space and sometimes by a double space. As a result, I get a ValueError: invalid literal for int() with base 10: ''. Any workaround for this? – merimursu Sep 24 '20 at 16:37
I made a small change so that a double space is replaced with a single one. However, if the string is more complex (triple spaces, etc.), we will probably need something more advanced, like strip. – IoaTzimas Sep 24 '20 at 16:43
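
For input with arbitrary runs of whitespace (triple spaces, tabs, trailing spaces), one option is to pull out each bracketed group with a regular expression and let `str.split()` with no argument do the splitting. This is only a minimal sketch and not part of the original answer: the function name `parse_bracketed_lists` is made up for illustration, and it assumes the brackets are never nested and every value is an integer.

```python
import re

def parse_bracketed_lists(text):
    """Return one list of ints per [...] group found in text (hypothetical helper)."""
    # [^\]]* matches everything up to the next closing bracket, newlines included,
    # so a list that wraps over several lines is captured as a single group.
    groups = re.findall(r"\[([^\]]*)\]", text)
    # split() with no argument splits on any run of whitespace and drops empty strings.
    return [[int(tok) for tok in group.split()] for group in groups]

# Sample text mirroring the layout in the question (single and double spaces).
sample = (
    "[12  34 45 22 3 5\n 34 33 2 67 5 55\n 2 90 88 12 34]\n"
    "[245  4 13]\n"
    "[33 90 50 22 90 1\n 23 44 876  10 7]\n"
)
print(parse_bracketed_lists(sample))
```

To use it on a file, pass the result of `f.read()` inside a `with open(...)` block instead of `sample`.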

score 0 · Answer 2 · answered Sep 24 '20 at 16:06

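You can use the built-in csv library. Because csv.reader yields one row per physical line, collect the values until a row ends with the closing bracket:

import csv

with open('test.csv', 'r') as testCsvFile:
    testCsv = csv.reader(testCsvFile)
    listOfLists = []
    current = []  # values of the list being read
    for row in testCsv:
        if not row:  # skip blank lines
            continue
        line = row[0].strip()
        # drop the brackets; split() handles single and double spaces
        current.extend(int(val) for val in line.strip('[]').split())
        if line.endswith(']'):  # closing bracket ends the list
            listOfLists.append(current)
            current = []
    print(listOfLists)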


# Output
# [[12, 34, 45, 22, 3, 5, 34, 33, 2, 67, 5, 55, 2, 90, 88, 12, 34], [245, 4, 13], [33, 90, 50, 22, 90, 1, 23, 44, 876, 10, 7]]

Edit: Updated parsing to convert the values to ints

score 0 · Answer 3 · answered Sep 24 '20 at 16:08

Is this what you are looking for:

>>> with open("file.txt", "r") as f:
...     content = f.read().replace("\n", "")
... 
>>> content = [[int(i) for i in c.split()] for c in content[1:-1].split("][")]
>>> content
[[12, 34, 45, 22, 3, 5, 34, 33, 2, 67, 5, 55, 2, 90, 88, 12, 34], [245, 4, 13], [33, 90, 50, 22, 90, 1, 23, 44, 876, 10, 7]]

First, read the entire file as one string and remove the newline characters (\n), then strip the first and last characters ([ and ]). Next, split the result into chunks on ][. Finally, split each chunk on whitespace (split() with no argument also copes with the double spaces) and convert the pieces to integers.
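
The same steps can be followed one at a time on a small in-memory string to see what each produces. This is just an illustrative sketch: the string `raw` is made-up data (shorter than the file in the question), and it uses `split()` with no argument so that the double space is tolerated. It also assumes nothing sits between one list's `]` and the next list's `[` other than the newline.

```python
# Made-up sample: the first list wraps onto a second line and has a double space.
raw = "[1  2 3\n 4 5]\n[6 7]\n[8 9 10]\n"

# Step 1: drop the newlines, then the outermost "[" and "]".
flat = raw.replace("\n", "")
inner = flat[1:-1]

# Step 2: split on "][" to get one chunk of text per list.
chunks = inner.split("][")

# Step 3: split each chunk on whitespace and convert the pieces to int.
result = [[int(i) for i in c.split()] for c in chunks]

print(flat)    # [1  2 3 4 5][6 7][8 9 10]
print(chunks)  # ['1  2 3 4 5', '6 7', '8 9 10']
print(result)  # [[1, 2, 3, 4, 5], [6, 7], [8, 9, 10]]
```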
