Reading a line from a file and splitting the string

Question

I am new to Python and would appreciate your help. I want to open a file that contains the following lines. I would like to read each line and store every character of it as a string in a list.

A B 2

A E 2

A W 1

B D 5

B W 4

B C 2

B F 3

C F 7

C V 9

D E 1

D J 7

E K 3

F L 2

F M 7

F R 3

F Y 1

G K 8

G J 5

I want to store the information from each line like this: the lines A B 2 and A E 2 should become ['A','B','2'] and ['A','E','2'].

Possible duplicate of [How to split a string into a list?](https://stackoverflow.com/questions/743806/how-to-split-a-string-into-a-list) — Nullman, Feb 14 '19 at 10:30
Are you reading it in as a .txt file or a .csv? If so, you could declare your separator/delimiter as a space. You could then change the number column to str afterwards to get your desired output. — lamecicle, Feb 14 '19 at 10:33

Jan · Answer 1 · 2019-02-14T10:39:04.903

You can do the following:

with open('testfile.txt') as fp:
    content = [elem
               for line in fp.readlines()
               for elem in [line.split()]
               if elem]
    print(content)

This yields

[['A', 'B', '2'], ['A', 'E', '2'], ['A', 'W', '1'], ['B', 'D', '5'], ['B', 'W', '4'], ['B', 'C', '2'], ['B', 'F', '3'], ['C', 'F', '7'], ['C', 'V', '9'], ['D', 'E', '1'], ['D', 'J', '7'], ['E', 'K', '3'], ['F', 'L', '2'], ['F', 'M', '7'], ['F', 'R', '3'], ['F', 'Y', '1'], ['G', 'K', '8'], ['G', 'J', '5']]
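To see why the filter drops the blank lines between records, here is a minimal sketch of an equivalent formulation run over an in-memory file. The use of `io.StringIO` and the shortened sample text are assumptions for illustration; they stand in for `testfile.txt`.

```python
import io

# Shortened stand-in for testfile.txt, including the blank lines between records.
sample = "A B 2\n\nA E 2\n\nA W 1\n"

with io.StringIO(sample) as fp:
    # str.split() with no arguments splits on any run of whitespace and
    # returns [] for a blank line; an empty list is falsy, so `if fields`
    # skips it.
    content = [fields for fields in map(str.split, fp) if fields]

print(content)  # [['A', 'B', '2'], ['A', 'E', '2'], ['A', 'W', '1']]
```

Iterating over the file object directly, rather than calling `readlines()`, avoids building a list of all lines first.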

edited Feb 14 '19 at 10:39

answered Feb 14 '19 at 10:30

`line.split()` will automatically split on whitespace – Jim Wright Feb 14 '19 at 10:31
Hi! It prints elements like this: ['A B 2\n', 'A E 2\n', 'A W 1\n', 'B D 5\n', 'B w 4\n', 'B C 2\n', 'B F 3\n', 'C F 7\n', 'C V 9\n', 'D E 1\n', 'D J 7\n'] so not really as it should be – loukous Feb 14 '19 at 10:34
@loukous: Right. Changed, see the answer. – Jan Feb 14 '19 at 10:39
@MateenUlhaq: Better to use a nested comprehension than to iterate over the whole list again, imo. – Jan Feb 14 '19 at 10:40
I was thinking more `[x.split() for x in map(str.rstrip, fp) if x]`. That way, you're not pre-allocating the whole file into memory. – Mateen Ulhaq Feb 14 '19 at 10:48
@Jan I added speed tests in an answer – Ralf Feb 14 '19 at 11:57
@MateenUlhaq I added speed tests in an answer. The `map(str.strip, f)` construct does seem the fastest. – Ralf Feb 14 '19 at 11:57

score 2 · Accepted Answer · answered Feb 14 '19 at 10:37

Alternatively, as an explicit loop:

data = []

with open(filename) as f:
    for line in f:
        line = line.rstrip()
        if line == '':
            continue
        data.append(line.split())
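If the file is large and the rows are consumed one at a time, the same loop could be written as a generator. The sketch below is one possible variant, not part of the original answer: `iter_rows` is a hypothetical helper name, and the temporary file only exists so the example runs on its own.

```python
import os
import tempfile


def iter_rows(filename):
    """Yield each non-blank line as a list of fields (hypothetical helper)."""
    with open(filename) as f:
        for line in f:
            fields = line.split()
            if fields:  # skips empty and whitespace-only lines
                yield fields


# Write a small stand-in file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("A B 2\n\nB D 5\n")
    path = tmp.name

rows = list(iter_rows(path))
os.remove(path)

print(rows)  # [['A', 'B', '2'], ['B', 'D', '5']]
```

Calling `list()` on the generator gives the same nested list as the loop above; iterating over it directly keeps only one row in memory at a time.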


Mateen Ulhaq


Done any speed tests (+1) ? – Jan Feb 14 '19 at 10:41
I added speed tests in an answer – Ralf Feb 14 '19 at 11:56

Ralf · Answer 3 · 2019-02-14T12:27:10.693

I compared the proposals made here (three with a list comprehension and three with a for loop that appends to a list):

def f_jan(filename):
    with open(filename) as f:
        return [
            elem
            for line in f.readlines()
            for elem in [line.split()]
            if elem]

def f_mateen_ulhaq_1(filename):
    with open(filename) as f:
        return [
            elem.split()
            for elem in map(str.rstrip, f)
            if elem]

def f_ralf_1(filename):
    with open(filename) as f:
        return [
            line.split()
            for line in f
            if line != '\n']

def f_mateen_ulhaq_2(filename):
    data = []
    with open(filename) as f:
        for line in f:
            line = line.rstrip()
            if line == '':
                continue
            data.append(line.split())

    return data

def f_mateen_ulhaq_3(filename):
    data = []
    with open(filename) as f:
        for line in f:
            if line == '\n':
                continue
            data.append(line.split())

    return data

def f_ralf_2(filename):
    data = []
    with open(filename) as f:
        for line in f:
            if line != '\n':
                data.append(line.split())

    return data

I created 2 files, one with 100 lines of the sample input provided in the question, and another file with 100,000 lines of the same input.
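The answer does not show how the files were built, so the following is only a sketch of one way to do it. `write_test_file` and `SAMPLE_LINES` are hypothetical names, and it is an assumption that the files cycle through the 18 sample lines without blank lines in between.

```python
import itertools
import os
import tempfile

# The 18 sample lines from the question.
SAMPLE_LINES = [
    "A B 2", "A E 2", "A W 1", "B D 5", "B W 4", "B C 2",
    "B F 3", "C F 7", "C V 9", "D E 1", "D J 7", "E K 3",
    "F L 2", "F M 7", "F R 3", "F Y 1", "G K 8", "G J 5",
]


def write_test_file(path, n_lines):
    """Write n_lines lines to path, cycling through SAMPLE_LINES (hypothetical helper)."""
    with open(path, 'w') as f:
        for line in itertools.islice(itertools.cycle(SAMPLE_LINES), n_lines):
            f.write(line + '\n')


# Demonstrated in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmpdir:
    small = os.path.join(tmpdir, 'test_100_lines.txt')
    write_test_file(small, 100)
    with open(small) as f:
        line_count = sum(1 for _ in f)

print(line_count)  # 100
```

To reproduce the benchmark, the helper would be called with `'test_100_lines.txt'` and `'test_100000_lines.txt'` in the working directory instead.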

I tested that they all return the same data:

filename_1 = 'test_100_lines.txt'
assert (f_jan(filename_1)
        == f_mateen_ulhaq_1(filename_1)
        == f_ralf_1(filename_1)
        == f_mateen_ulhaq_2(filename_1)
        == f_mateen_ulhaq_3(filename_1)
        == f_ralf_2(filename_1))

Then, using timeit, I compared the speed (using a smaller number of repetitions for the large text file):

import timeit

for fn, number in [
    ('test_100_lines.txt', 10000),
    ('test_100000_lines.txt', 100),
]:
    for func in [
            f_jan,
            f_mateen_ulhaq_1,
            f_ralf_1,
            f_mateen_ulhaq_2,
            f_mateen_ulhaq_3,
            f_ralf_2,
    ]:
        t = timeit.timeit('func(fn)', 'from __main__ import fn, func', number=number)
        print('{:25s} {:20s} {:10.4f} seconds'.format(fn, func.__name__, t))

The fastest solution for both the small and the large input is f_ralf_1 (a list comprehension without .strip() that just compares each line against \n):

test_100_lines.txt        f_jan                    0.5019 seconds
test_100_lines.txt        f_mateen_ulhaq_1         0.4483 seconds
test_100_lines.txt        f_ralf_1                 0.3657 seconds
test_100_lines.txt        f_mateen_ulhaq_2         0.4523 seconds
test_100_lines.txt        f_mateen_ulhaq_3         0.3854 seconds
test_100_lines.txt        f_ralf_2                 0.3886 seconds

test_100000_lines.txt     f_jan                    3.1178 seconds
test_100000_lines.txt     f_mateen_ulhaq_1         2.6396 seconds
test_100000_lines.txt     f_ralf_1                 1.8084 seconds
test_100000_lines.txt     f_mateen_ulhaq_2         2.7143 seconds
test_100000_lines.txt     f_mateen_ulhaq_3         2.0398 seconds
test_100000_lines.txt     f_ralf_2                 2.0246 seconds
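One caveat when choosing between these variants: comparing against `'\n'` only matches the filters that strip or split when every blank line is exactly a newline, as in the test files. The sketch below uses a made-up input containing a whitespace-only line to show where the two kinds of filter diverge; `io.StringIO` stands in for a real file.

```python
import io

# Hypothetical input: the second line holds only spaces, the third is empty.
text = "A B 2\n   \n\nA E 2\n"

# Filter used by f_ralf_1: drops only lines that are exactly '\n'.
by_newline = [line.split() for line in io.StringIO(text) if line != '\n']

# Filter on the split result: drops every line that has no fields.
by_fields = [fields for fields in map(str.split, io.StringIO(text)) if fields]

print(by_newline)  # [['A', 'B', '2'], [], ['A', 'E', '2']]
print(by_fields)   # [['A', 'B', '2'], ['A', 'E', '2']]
```

For input that may contain stray whitespace, the slightly slower filters are the safer choice.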

I think your version would get slightly faster results with `rstrip` rather than `strip`. :) Actually, another idea is to avoid the `rstrip` call overhead altogether on empty lines, and compare via `line == '\n'`. — Mateen Ulhaq, Feb 14 '19 at 12:13
@MateenUlhaq In my test there was no obvious difference between using `.strip()` and `.rstrip()`, so I don't think that is the important part — Ralf, Feb 14 '19 at 12:16
@MateenUlhaq you are right about comparing against `\n` instead of `.strip()`. I updated my answer with new measurements. — Ralf, Feb 14 '19 at 12:27
