
So I have this txt file:

Haiku
5 *
7 *
5 *

Limerick
8 A
8 A
5 B
5 B
8 A

And I want to write a function that returns something like this:

[['Haiku', '5', '*', '7', '*', '5', '*'], ['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]

I've tried this:

small_pf = open('datasets/poetry_forms_small.txt')

lst = []

for line in small_pf:
    lst.append(line.strip())
    
small_pf.close()

print(lst)

At the end I end up with this:

['Haiku', '5 *', '7 *', '5 *', '', 'Limerick', '8 A', '8 A', '5 B', '5 B', '8 A']

My problem is that this is one big list, and the elements are still joined together, like '5 *' or '8 A'. I honestly don't know where to start, and that's why I need some guidance on how to solve those two problems. Any help would be greatly appreciated.

  • Partial duplicate (for splitting each line): [Split string on whitespace in Python](/q/8113782/4518341) – wjandrea Apr 05 '22 at 18:32
  • Sidenote: Best practice for opening files is using `with`. It's covered in the official tutorial [here](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files). And FWIW, you can simplify using a list comprehension: `with open(...) as small_pf: lst = [line.strip() for line in small_pf]` – wjandrea Apr 05 '22 at 18:43

4 Answers

When you see an empty line, don't add it. Instead, save the temporary list you've been filling, start a fresh one, and continue:

lst = []
with open('test.txt') as small_pf:
    tmp_list = []
    for line in small_pf:
        line = line.rstrip("\n")
        if line == "":
            lst.append(tmp_list)
            tmp_list = []
        else:
            tmp_list.extend(line.split())

    if tmp_list:  # add last one
        lst.append(tmp_list)

print(lst)
# [['Haiku', '5', '*', '7', '*', '5', '*'],
#  ['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]
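
Since the question asks for a function, the same loop can be wrapped into one. This is only a minimal sketch: the name `read_poetry_forms` and its `path` parameter are made up for illustration, and it assumes blocks are separated by blank lines as in the sample file.

```python
def read_poetry_forms(path):
    """Return one list of tokens per blank-line-separated block in the file."""
    forms = []
    current = []
    with open(path) as f:
        for line in f:
            tokens = line.split()  # splits on any whitespace, drops the newline
            if not tokens:         # blank line: close the current block
                if current:
                    forms.append(current)
                    current = []
            else:
                current.extend(tokens)
    if current:  # the last block has no blank line after it
        forms.append(current)
    return forms
```

Calling `read_poetry_forms('datasets/poetry_forms_small.txt')` on the file from the question should give the nested list shown above. Checking `if current` before appending means several blank lines in a row don't produce empty sublists.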
azro

First split the file into sections on blank lines (\n\n), then split each section on any whitespace (newlines or spaces).

with open('datasets/poetry_forms_small.txt') as small_pf:
    lst = [section.split() for section in small_pf.read().split('\n\n')]

Result:

[['Haiku', '5', '*', '7', '*', '5', '*'],
 ['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]
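
To see the two steps separately without touching a file, here is the same idea applied to an in-memory string. The `text` variable is a stand-in for `small_pf.read()`, and the `if section.strip()` filter is an addition of this sketch, not part of the one-liner: it only matters if the file ends with extra blank lines.

```python
# Stand-in for the file contents returned by small_pf.read()
text = "Haiku\n5 *\n7 *\n5 *\n\nLimerick\n8 A\n8 A\n5 B\n5 B\n8 A\n"

# Step 1: a blank line is two consecutive newlines, so this yields one string per form
sections = text.split('\n\n')
print(sections[0])  # the four Haiku lines, still joined by '\n'

# Step 2: split() with no argument splits on spaces and newlines alike
lst = [section.split() for section in sections if section.strip()]
print(lst)
```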
wjandrea

Solution without using extra modules

with open('datasets/poetry_forms_small.txt') as small_pf:
    lines = small_pf.readlines()

result = []
temp_list = []
for line in lines:
    if line.strip() == "":
        # A blank line ends the current section.
        if temp_list:
            result.append(temp_list)
            temp_list = []
    else:
        temp_list.extend(line.split())
if temp_list:  # the last section has no blank line after it
    result.append(temp_list)
print(result)

Solution with a module

You can use regex to solve your problem:

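import re
with open('datasets/poetry_forms_small.txt') as small_pf:
    text = small_pf.read()
# Split into sections on blank lines, then split each section on whitespace.
[re.split(r"\s+", x) for x in re.split(r"\n\n", text.strip())]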

Output

[['Haiku', '5', '*', '7', '*', '5', '*'],
 ['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]
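
As a side note on the pattern, `\s` already matches newlines, so the `|\n` alternative is redundant. A small check on one section of the sample data (the `section` string below is typed in by hand rather than read from the file):

```python
import re

section = "Haiku\n5 *\n7 *\n5 *"

with_alternation = re.split(r"\s|\n", section)
whitespace_only = re.split(r"\s", section)
print(with_alternation == whitespace_only)  # True

# For a section with no leading or trailing whitespace, plain str.split()
# gives the same tokens, which is what makes a solution without imports possible.
print(whitespace_only == section.split())  # True
```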
TheFaultInOurStars
  • This could use an explanation. Nothing fancy, just a sentence or two would work. I'm thinking, "First split into sections on blank lines (`\n\n`), then split on lines and spaces together." – wjandrea Apr 05 '22 at 18:41
  • @AmirhosseinKiani Sorry, I totally forgot to include this, but we cannot import other tools for our assignment. But again, thanks for the answer. –  Apr 05 '22 at 18:41
  • @wjandrea Thanks for the comment and the edit. I will try to find a new solution based on the OP's new requirement and then add a few explanations to the regex solution. – TheFaultInOurStars Apr 05 '22 at 18:44
  • Turns out it's possible to do the exact same thing as `re.split()` using `str.split()`, so I posted [my own answer](/a/71757026/4518341). Thanks for the inspiration :) – wjandrea Apr 05 '22 at 18:55
  • @wjandrea So I guess I might need to remove your explanation in my answer since it's already there in yours:). – TheFaultInOurStars Apr 05 '22 at 18:59
  • @Amirhossein You don't need to; you've given [proper attribution](/help/referencing) :) But if you'd prefer, you could rewrite it in your own words. Either way doesn't matter to me. – wjandrea Apr 05 '22 at 19:03

This approach assumes that every non-empty line starts either with a decimal digit or with some other character. A line that starts with a non-digit begins a new list, with the whole line (stripped of trailing whitespace) as its first element. A line that starts with a digit is stripped of trailing whitespace and split on spaces, and its parts are appended to the most recently created list.

lst = []
with open("blankpaper.txt") as f:
    for line in f:
        # ignore empty lines 
        if line.rstrip() == '':
            continue
        if not line[0].isdecimal():
            new_list = [line.rstrip()]
            lst.append(new_list)
            continue
        new_list.extend(line.rstrip().split(" "))

print(lst)

Output

[['Haiku', '5', '*', '7', '*', '5', '*'], ['Limerick', '8', 'A', '8', 'A', '5', 'B', '5', 'B', '8', 'A']]
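
One practical difference from splitting everything on whitespace: because the header line is stored whole, a form name containing a space stays a single element. Here is a small sketch of that, using `io.StringIO` in place of a real file. The function name `parse_forms` and the "Blank Verse" entry are invented for illustration and are not taken from the question's data.

```python
import io

def parse_forms(lines):
    """Group lines into lists: a non-digit line starts a new list."""
    forms = []
    for line in lines:
        line = line.rstrip()
        if not line:  # ignore empty lines
            continue
        if not line[0].isdecimal():
            forms.append([line])  # header kept as one string
        else:
            # Assumes a header has already been seen; otherwise forms[-1] raises IndexError
            forms[-1].extend(line.split())
    return forms

sample = io.StringIO("Haiku\n5 *\n\nBlank Verse\n10 *\n")
print(parse_forms(sample))
# [['Haiku', '5', '*'], ['Blank Verse', '10', '*']]
```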

I hope this helps. If there are any questions, please let me know.

OTheDev