Reading a text file into lists, based on the spaces in the file

Question

So I have this txt file:

Haiku
5 *
7 *
5 *

Limerick
8 A
8 A
5 B
5 B
8 A

And I want to write a function that returns something like this:

[['Haiku', '5', '*', '7', '*', '5', '*'], ['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]

I've tried this:

small_pf = open('datasets/poetry_forms_small.txt')

lst = []

for line in small_pf:
    lst.append(line.strip())
    
small_pf.close()

print(lst)

At the end I end up with this:

['Haiku', '5 *', '7 *', '5 *', '', 'Limerick', '8 A', '8 A', '5 B', '5 B', '8 A']

My problem is that this is one big list, and the values on each line are stuck together in a single string, like '5 *' or '8 A'. I honestly don't know where to start, so I'd like some guidance on both problems: splitting each line into separate values, and grouping the values into one list per section. Any help would be greatly appreciated.

Partial duplicate (for splitting each line): [Split string on whitespace in Python](/q/8113782/4518341) — wjandrea, Apr 05 '22 at 18:32
Sidenote: Best practice for opening files is using `with`. It's covered in the official tutorial [here](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files). And FWIW, you can simplify using a list comprehension: `with open(...) as small_pf: lst = [line.strip() for line in small_pf]` — wjandrea, Apr 05 '22 at 18:43

azro · Answer 1 · 2022-04-05T18:42:20.223

When you reach an empty line, don't add it: save the temporary list you've been filling, start a new one, and continue.

lst = []
with open('test.txt') as small_pf:
    tmp_list = []
    for line in small_pf:
        line = line.rstrip("\n")
        if line == "":
            lst.append(tmp_list)
            tmp_list = []
        else:
            tmp_list.extend(line.split())

    if tmp_list:  # add last one
        lst.append(tmp_list)

print(lst)
# [['Haiku', '5', '*', '7', '*', '5', '*'],
#  ['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]
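Since the question asks for a function, the same loop could be wrapped up roughly as follows. This is only a sketch: the name `read_sections` and the temporary file used to try it out are assumptions, not part of the original answer. It also skips repeated blank lines instead of appending empty lists.

```python
import os
import tempfile


def read_sections(path):
    """Return one list of whitespace-separated tokens per blank-line-delimited section."""
    sections = []
    current = []
    with open(path) as f:
        for line in f:
            if line.strip() == "":
                # Blank line: close the section in progress, if there is one.
                if current:
                    sections.append(current)
                    current = []
            else:
                current.extend(line.split())
    if current:  # the file may not end with a blank line
        sections.append(current)
    return sections


# Try it on a temporary file holding the sample data from the question.
sample = "Haiku\n5 *\n7 *\n5 *\n\nLimerick\n8 A\n8 A\n5 B\n5 B\n8 A\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
sections = read_sections(tmp.name)
os.remove(tmp.name)
print(sections)
```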

score 1 · Answer 2 · answered Apr 05 '22 at 18:53

First split the file into sections on blank lines (\n\n), then split each section on any whitespace (newlines or spaces).

with open('datasets/poetry_forms_small.txt') as small_pf:
    lst = [section.split() for section in small_pf.read().split('\n\n')]

Result:

[['Haiku', '5', '*', '7', '*', '5', '*'],
 ['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]
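One thing to watch for with splitting on `'\n\n'`: a single trailing newline is harmless, but a trailing blank line leaves an empty list at the end. A small sketch of that case, using an in-memory string instead of the file (the variable names `naive` and `cleaned` are made up for illustration):

```python
# Sample text that ends with a blank line ("\n\n" at the very end).
text = "Haiku\n5 *\n7 *\n5 *\n\nLimerick\n8 A\n8 A\n5 B\n5 B\n8 A\n\n"

# Plain version: the empty string after the final "\n\n" becomes an empty list.
naive = [section.split() for section in text.split("\n\n")]

# Filtered version: keep only sections that contain at least one token.
cleaned = [tokens for tokens in naive if tokens]

print(naive[-1])  # []
print(cleaned)
```

If the input is known to be tidy, the filter is unnecessary; it only matters when the file may contain stray blank lines at the end or several in a row.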

TheFaultInOurStars · Accepted Answer · 2022-04-05T19:01:27.977

Solution without using extra modules

# small_pf is the open file object from the question
lines = small_pf.readlines()
result = []
temp_list = []
for line in lines:
    if line.strip() == "":
        # A blank line ends the current section
        if temp_list:
            result.append(temp_list)
            temp_list = []
    else:
        temp_list.extend(line.split())
if temp_list:  # add the last section, including its final line
    result.append(temp_list)
print(result)

Solution with module

You can use regex to solve your problem:

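import re
text = small_pf.read()
# \s already matches newlines; \s+ and strip() avoid empty strings in the result
result = [re.split(r"\s+", x.strip()) for x in re.split(r"\n{2,}", text.strip())]
print(result)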

Output

[['Haiku', '5', '*', '7', '*', '5', '*'],
 ['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]
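A note on the regex patterns, as a rough illustration: `\s` already matches a newline, and splitting on a single whitespace character behaves differently from splitting on runs of whitespace when the text has a trailing newline or doubled spaces. The sample string below is made up for the demonstration.

```python
import re

# Made-up section with a double space and a trailing newline.
section = "Limerick\n8 A\n5  B\n"

# \s matches "\n" on its own, so an extra "|\n" alternative adds nothing.
newline_is_whitespace = re.fullmatch(r"\s", "\n") is not None

# One split per whitespace character: empty strings appear.
single = re.split(r"\s", section)

# One split per run of whitespace, after trimming the edges.
collapsed = re.split(r"\s+", section.strip())

print(single)     # ['Limerick', '8', 'A', '5', '', 'B', '']
print(collapsed)  # ['Limerick', '8', 'A', '5', 'B']
```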

edited Apr 05 '22 at 19:01

answered Apr 05 '22 at 18:32


This could use an explanation. Nothing fancy, just a sentence or two would work. I'm thinking, "First split into sections on blank lines (`\n\n`), then split on lines and spaces together." – wjandrea Apr 05 '22 at 18:41
@AmirhosseinKiani Sorry, I totally forgot to include this, but we cannot import other tools for our assignment. But again, thanks for the answer. – Apr 05 '22 at 18:41
@wjandrea Thanks for the comment and the edit. I will try to find a new solution based on the OP's new requirement and then add a few explanations to the regex solution. – TheFaultInOurStars Apr 05 '22 at 18:44
Turns out it's possible to do the exact same thing as `re.split()` using `str.split()`, so I posted [my own answer](/a/71757026/4518341). Thanks for the inspiration :) – wjandrea Apr 05 '22 at 18:55
@wjandrea So I guess I might need to remove your explanation in my answer since it's already there in yours:). – TheFaultInOurStars Apr 05 '22 at 18:59
@Amirhossein You don't need to; you've given [proper attribution](/help/referencing) :) But if you'd prefer, you could rewrite it in your own words. Either way doesn't matter to me. – wjandrea Apr 05 '22 at 19:03

OTheDev · Answer 4 · 2022-04-05T19:08:32.393

This approach assumes that every non-empty line starts with either a decimal digit or some other character. A line that starts with a non-digit begins a new list, with the whole line (as a string, stripped of trailing whitespace) as its first element. Each subsequent line that starts with a digit is stripped of trailing whitespace and split on spaces, and the resulting parts are added to the most recently created list.

lst = []
with open("blankpaper.txt") as f:
    for line in f:
        # ignore empty lines 
        if line.rstrip() == '':
            continue
        if not line[0].isdecimal():
            new_list = [line.rstrip()]
            lst.append(new_list)
            continue
        new_list.extend(line.rstrip().split(" "))

print(lst)

Output

[['Haiku', '5', '*', '7', '*', '5', '*'], ['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]

I hope this helps. If there are any questions, please let me know.
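For what it's worth, here is a sketch of the same idea as a function that takes any iterable of lines. The name `group_by_title` is hypothetical, and the handling of a digit line that appears before any title is an assumption on my part, since the question's file never has that case.

```python
def group_by_title(lines):
    """Start a new list at each line whose first character is not a decimal digit."""
    groups = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if not stripped[0].isdecimal():
            # Title line: kept whole as the first element of a new list.
            groups.append([stripped])
        elif groups:
            groups[-1].extend(stripped.split())
        else:
            # Digit line before any title: start an untitled list (an assumption).
            groups.append(stripped.split())
    return groups


lines = ["Haiku\n", "5 *\n", "7 *\n", "5 *\n", "\n", "Limerick\n", "8  A\n"]
result = group_by_title(lines)
print(result)
```

One consequence of keeping the title line whole: a title containing a space (say, a made-up `Blank Verse`) stays a single element, whereas approaches that split every line on whitespace would break it into two.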
