In Python, how do I scan a text file with one long row and separate the items into different columns?

Question

I have a text file that looks like this:

“Distance 1: Distance XY” 1 2 4 5 9  “Distance 2: Distance XY”  3 6 8 10 5  “Distance 3: Distance XY”  88 45 36 12 4

It is all on one big line like this. My question is how do I take this and separate the distance measurements so that the lines look something more like this:

“Distance 1: Distance XY” 1 2 4 5 9  
“Distance 2: Distance XY”  3 6 8 10 5  
“Distance 3: Distance XY”  88 45 36 12 4

I want to do this to make a dictionary for each distance measurement.

You have "smart quotes" in what you've copied and pasted, rather than straight ASCII double quotes. Is that also what is in your file, or did that happen when you copied and pasted here? — Two-Bit Alchemist, Aug 01 '16 at 18:32
To separate one long string into a list of strings while keeping the delimiter as part of the string: `s = ['"D' + e for e in text_file.split('"D') if e != ""]`. Got this from: http://stackoverflow.com/questions/3475251/split-a-string-by-a-delimiter-in-python — Random Davis, Aug 01 '16 at 18:35
Those are the quotes that appear when I use open() to open my file in python. — Chase Garfield, Aug 01 '16 at 18:36

Psidom · Answer 1 · 2016-08-01T19:10:30.387

5

You can use re.split to split the string with regular expressions:

import re
s = '\"Distance 1: Distance XY\" 1 2 4 5 9  \"Distance 2: Distance XY\"  3 6 8 10 5  \"Distance 3: Distance XY\"  88 45 36 12 4'

re.split(r'(?<=\d)\s+(?=\")', s)

# ['"Distance 1: Distance XY" 1 2 4 5 9',
#  '"Distance 2: Distance XY"  3 6 8 10 5',
#  '"Distance 3: Distance XY"  88 45 36 12 4']

(?<=\d)\s+(?=\") constrains the delimiter to be the space between a digit and a quote.

If the text file contains smart quotes, replace \" in the pattern with the left smart quote (Option + [ on a Mac; check here for Windows):

with open("test.txt", 'r') as f:
    for line in f:
        print(re.split(r'(?<=\d)\s+(?=“)', line.rstrip("\n")))

# ['“Distance 1: Distance XY” 1 2 4 5 9', '“Distance 2: Distance XY”  3 6 8 10 5', '“Distance 3: Distance XY”  88 45 36 12 4']

Or use the Unicode escape for the left smart quotation mark, \u201C:

with open("test.txt", 'r') as f:
    for line in f:
        print(re.split(r'(?<=\d)\s+(?=\u201C)', line.rstrip("\n")))

# ['“Distance 1: Distance XY” 1 2 4 5 9', '“Distance 2: Distance XY”  3 6 8 10 5', '“Distance 3: Distance XY”  88 45 36 12 4']
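
Since the stated goal is a dictionary for each distance measurement, the pieces could be parsed one step further. The following is only a sketch and not part of the original answer: the helper name `parse_distances`, the choice to key the dictionary by label, and storing the values as a list of ints are all assumptions. The character classes accept either straight or smart quotes, so the same pattern should cover both cases discussed above.

```python
import re

# Hypothetical helper (not from the original answer).
# Assumes each label is wrapped in straight quotes or in smart quotes
# (U+201C / U+201D) and that every value after it is a whole number.
RECORD = re.compile(r'["\u201C]([^"\u201D]*)["\u201D]\s*([\d\s]*)')


def parse_distances(text):
    """Map each quoted label to the list of integers that follows it."""
    result = {}
    for label, numbers in RECORD.findall(text):
        result[label] = [int(n) for n in numbers.split()]
    return result


s = '"Distance 1: Distance XY" 1 2 4 5 9  "Distance 2: Distance XY"  3 6 8 10 5'
print(parse_distances(s))
```

Using `findall` with two groups avoids a separate split step, at the cost of silently skipping anything that does not look like a quoted label followed by digits.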

edited Aug 01 '16 at 19:10

answered Aug 01 '16 at 18:35

Psidom


Thank you for the help! I must be entering the code wrong, because any time I run this the output ends up being one individual character on each line. So for example: 'D' 'i' 's' 't' 'a' 'n' 'c' 'e' ' ' etc... Any idea what I'm doing wrong? – Chase Garfield Aug 01 '16 at 19:14
You mean all the characters are split into single elements? That's weird. Actually, I don't have much of an idea; it should work if your file really contains just one line. You can try all three versions to see how it goes. It sounds like every character is on its own line in your file. – Psidom Aug 01 '16 at 19:19
Thank you, I'll keep tweaking it. This was very helpful and definitely a step in the right direction. – Chase Garfield Aug 01 '16 at 19:22
Hey Psidom, what if I actually have more than one line? I think there are spaces above and below the line of data. The system I receive the data from outputs it with these two spaces like a Start and Stop placeholder. – Chase Garfield Aug 01 '16 at 19:30
If you have empty lines above or below that line, then you will get lists containing only an empty string, which can be skipped by adding a check in the for loop such as `if line.strip():`. – Psidom Aug 01 '16 at 19:32
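
One way to write such a check, as a sketch with its assumptions flagged: a line read from a file still carries its trailing newline, so a visually empty line is really "\n", and testing the stripped line is more reliable than testing its length. The function name `split_records` and the idea of passing in any iterable of lines (so it can be tried without a file) are illustrative choices, not something from the thread.

```python
import re

# Straight-quote version of the delimiter pattern from the answer.
DELIMITER = re.compile(r'(?<=\d)\s+(?=")')


def split_records(lines):
    """Split every non-blank line into records; blank lines are skipped."""
    records = []
    for line in lines:
        stripped = line.strip()
        if not stripped:  # line was empty or whitespace only, e.g. "\n"
            continue
        records.extend(DELIMITER.split(stripped))
    return records


# Any iterable of lines works here, including an open file object.
sample = [
    '\n',
    '"Distance 1: Distance XY" 1 2 4 5 9  "Distance 2: Distance XY"  3 6 8 10 5\n',
    '\n',
]
print(split_records(sample))
```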

score 1 · Answer 2 · answered Aug 01 '16 at 18:42

1

A perhaps less elegant solution than Psidom's, assuming the lines have the same format every time:

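# Assumes every record is exactly 9 whitespace-separated tokens:
# 4 for the quoted label plus 5 numbers.
with open("input.txt", 'r') as infile, open("output.txt", 'w') as output:
    tokens = infile.read().split()
    count = 0
    for token in tokens:
        output.write(token)
        output.write(" ")
        count += 1
        if count == 9:
            # End of a record: start a new line.
            output.write("\n")
            count = 0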

answered Aug 01 '16 at 18:42

Andrew


I don't think this addresses the OP's concern, which is that s/he wants the long line broken into sub-groups, separated by a particular string. – Mike Williamson Aug 01 '16 at 20:11

joaquinlpereyra · Answer 3 · 2016-08-01T19:34:50.503

1

An attempt to improve on Andrew's fine answer.

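# Both files are managed by 'with', so they are closed automatically.
with open("input.txt", 'r') as infile, open("output.txt", 'w') as output:
    for line in infile:
        tokens = line.split()
        # Keep only the first 9 tokens of the line (label plus 5 numbers);
        # anything after the 9th token on that line is dropped.
        relevant_line = tokens[0:9]
        relevant_line_as_string = " ".join(relevant_line)
        output.write(relevant_line_as_string + '\n')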

You don't need to call close() on a file opened with 'with' :)

~ ❯❯❯ touch input
~ ❯❯❯ vim input
~ ❯❯❯ touch script.py
~ ❯❯❯ vim script.py # script.py has my answer copy pasted there
~ ❯❯❯ touch output
~ ❯❯❯ python script.py
~ ❯❯❯ cat output
“Distance 1: Distance XY” 1 2 4 5 9
# it works!
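
In the run above only the first record reaches the output, because the slice keeps just the first nine tokens of each line. If every record really is nine whitespace-separated tokens (an assumption carried over from the answers, and one that breaks as soon as a label has a different number of words), a sketch that keeps every record could step through the tokens in groups of nine. The name `chunk_tokens` and its `size` parameter are hypothetical.

```python
def chunk_tokens(text, size=9):
    """Group whitespace-separated tokens into strings of `size` tokens each."""
    tokens = text.split()
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]


line = ('"Distance 1: Distance XY" 1 2 4 5 9  '
        '"Distance 2: Distance XY"  3 6 8 10 5')
for record in chunk_tokens(line):
    print(record)
```

Because the tokens are rejoined with single spaces, any runs of extra whitespace in the input are normalized in the result.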

edited Aug 01 '16 at 19:34

answered Aug 01 '16 at 18:56

joaquinlpereyra


Thank you for this. I tried this and everything ran in python, but the output file that is created is blank. Any idea why that might be? – Chase Garfield Aug 01 '16 at 19:13
Yes. I've edited my answer. Silly of me. I was using 'with' with the input, but not with the output, so I do need to close the output. Or use two 'with' statements :) – joaquinlpereyra Aug 01 '16 at 19:20
Still a blank file :(. I have no idea what I am doing wrong. – Chase Garfield Aug 01 '16 at 19:27
I have tried this on my machine and it works. You should of course create both the input.txt and output.txt files and they should be in the same folder you're executing the script from. Do you get any output at all? Would you mind sharing the code and the folder structure? I have tried it even with your particular strings and it works: – joaquinlpereyra Aug 01 '16 at 19:29
