Reading a file until a specific character in python

Question

I am currently working on an application which requires reading all the input from a file until a certain character is encountered.

By using the code:

with open("Questions.txt", 'r') as file:
    c = file.readlines()
c = [x.strip() for x in c]

readlines() splits the input at every \n, so every line becomes a separate string in list c; strip() then removes the trailing \n (and any other surrounding whitespace) from each string.
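A minimal sketch of that behaviour, using io.StringIO as an in-memory stand-in for Questions.txt (the contents are assumed to be the example file shown in this question):

```python
import io

# In-memory stand-in for Questions.txt (assumed contents: the example file from this question)
file = io.StringIO('1.Hai\n2.Bye\\-1\n3.Hello\n4.OAPd\\-1\n')

c = file.readlines()        # one string per line, each still ending in '\n'
c = [x.strip() for x in c]  # strip() removes the trailing newline from each line

print(c)  # ['1.Hai', '2.Bye\\-1', '3.Hello', '4.OAPd\\-1']
```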

This means every line ends up as a separate element of c. Instead, I want each element of the list to contain everything up to the next occurrence of a special character, like this:

If the input file contains:

1.Hai
2.Bye\-1
3.Hello
4.OAPd\-1

then I want to get the list c = ['1.Hai\n2.Bye', '3.Hello\n4.OAPd'].

How can I do this?

Related: https://stackoverflow.com/questions/51980776/python-readline-with-custom-delimiter | https://stackoverflow.com/questions/3893885/cheap-way-to-search-a-large-text-file-for-a-string — Ciro Santilli OurBigBook.com, Sep 27 '21 at 21:42

score 21 · Accepted Answer · edited Sep 29 '19 at 13:57

The easiest way is to read the whole file into a single string and then split it on your separator:

separator = '\\-1\n'  # use your "\-1" marker (plus the newline after it) here
with open('myFileName') as myFile:
    text = myFile.read()
result = text.split(separator)
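A minimal sketch of this approach on the question's example data, assuming the separator is the literal text `\-1` followed by a newline and using io.StringIO in place of a real file. When the file ends with the separator, split() returns a trailing empty string, which you may want to drop:

```python
import io

separator = '\\-1\n'  # assumed separator: the literal "\-1" plus the newline after it

# Hypothetical file contents taken from the question, held in memory for the demo
myFile = io.StringIO('1.Hai\n2.Bye\\-1\n3.Hello\n4.OAPd\\-1\n')
text = myFile.read()

result = text.split(separator)
print(result)  # ['1.Hai\n2.Bye', '3.Hello\n4.OAPd', '']

# Drop empty strings, such as the one produced by a trailing separator
result = [part for part in result if part]
print(result)  # ['1.Hai\n2.Bye', '3.Hello\n4.OAPd']
```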

If your file is very large, holding its complete contents in memory as a single string just to call .split() may not be desirable (nor is holding the complete contents a second time in the list the split produces). In that case you can read it in chunks:

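def each_chunk(stream, separator, chunk_size=4096):
    """Yield the parts of `stream` between occurrences of `separator`."""
    buffer = ''
    while True:  # until EOF
        chunk = stream.read(chunk_size)  # 4096 characters or so is a reasonable size
        if not chunk:  # EOF: emit whatever is left after the last separator
            yield buffer
            break
        buffer += chunk
        while True:  # until no separator is left in the buffer
            try:
                part, buffer = buffer.split(separator, 1)
            except ValueError:  # separator not found yet; read more data
                break
            else:
                yield part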

with open('myFileName') as myFile:
    for chunk in each_chunk(myFile, separator='\\-1\n'):
        print(chunk)  # the whole file is never held in memory; parts are printed one by one

Using [`partition`](https://docs.python.org/3/library/stdtypes.html#str.partition) instead of [`split`](https://docs.python.org/3/library/stdtypes.html#str.split) might be faster. — Cristian Ciupitu, Sep 29 '19 at 14:04
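Following that suggestion, a variant of each_chunk built on str.partition might look like the sketch below (the name each_chunk_partition and the io.StringIO test input are illustrative assumptions). partition() always returns a 3-tuple and signals "not found" with an empty middle element instead of raising ValueError, so no try/except is needed; whether it is actually faster depends on your data:

```python
import io

def each_chunk_partition(stream, separator, chunk_size=4096):
    """Yield the parts of `stream` between occurrences of `separator`,
    using str.partition instead of str.split."""
    buffer = ''
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # EOF: emit what follows the last separator
            yield buffer
            return
        buffer += chunk
        while True:
            part, sep, rest = buffer.partition(separator)
            if not sep:  # separator not in the buffer yet; read more data
                break
            yield part
            buffer = rest

# A tiny chunk_size forces separators to straddle chunk boundaries
stream = io.StringIO('1.Hai\n2.Bye\\-1\n3.Hello\n4.OAPd\\-1\n')
print(list(each_chunk_partition(stream, '\\-1\n', chunk_size=3)))
# ['1.Hai\n2.Bye', '3.Hello\n4.OAPd', '']
```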

score -3 · Answer 2 · answered Dec 21 '17 at 14:45


I used "*" instead of "-1"; I'll let you make the appropriate changes.

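s = '1.Hai\n2.Bye*3.Hello\n4.OAPd*'
temp = ''
results = []

for char in s:
    if char == '*':  # compare strings with ==, not `is`
        results.append(temp)
        temp = ''  # start the next part as an empty string, not a list
    else:
        temp += char

if temp:  # keep any text after the last delimiter
    results.append(temp)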
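The same loop can be applied to a file object directly, reading one character at a time so that only the current part is held in memory. This is a sketch: split_on_char is a hypothetical helper, it only handles a single-character delimiter such as "*", and reading one character per call is slow for large files:

```python
import io

def split_on_char(stream, delimiter):
    """Yield the text between occurrences of a single-character delimiter,
    reading the stream one character at a time."""
    temp = ''
    while True:
        char = stream.read(1)
        if not char:  # EOF
            break
        if char == delimiter:
            yield temp
            temp = ''
        else:
            temp += char
    if temp:  # text after the last delimiter, if any
        yield temp

with io.StringIO('1.Hai\n2.Bye*3.Hello\n4.OAPd*') as f:
    print(list(split_on_char(f, '*')))  # ['1.Hai\n2.Bye', '3.Hello\n4.OAPd']
```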


Stolson


-1 because it doesn't work. I got `['1.Hai\n2.Bye', ['3', '.', 'H', 'e', 'l', 'l', 'o', '\n', '4', '.', 'O', 'A', 'P', 'd']]`. `temp = []` should be replaced with `temp = ''`. Also, `char is '*'`?! Since when are strings compared with `is` instead of `==`? – Cristian Ciupitu Sep 29 '19 at 01:33
