12

I am currently working on an application which requires reading all the input from a file until a certain character is encountered.

I am using this code:

file=open("Questions.txt",'r')
c=file.readlines()
c=[x.strip() for x in c]

readlines() splits the input at every \n, and strip() then removes that \n, so each line becomes a separate string in the list c.

Instead, I want to start a new list element only when a special marker is encountered, like this:

If the input file has these contents:

1.Hai
2.Bye\-1
3.Hello
4.OAPd\-1

then I want to get the list c=['1.Hai\n2.Bye','3.Hello\n4.OAPd']

Please help me in doing this.

Yannis
Tarun
  • Related: https://stackoverflow.com/questions/51980776/python-readline-with-custom-delimiter | https://stackoverflow.com/questions/3893885/cheap-way-to-search-a-large-text-file-for-a-string – Ciro Santilli OurBigBook.com Sep 27 '21 at 21:42

2 Answers

21

The easiest way would be to read the file in as a single string and then split it on your separator:

separator = '\\-1\n'  # the question's \-1 marker plus the newline that follows it

with open('myFileName') as myFile:
  text = myFile.read()
result = text.split(separator)
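As a minimal sketch of this approach applied to the sample from the question: it assumes the marker is the literal three characters backslash, minus, one, and it uses io.StringIO as a stand-in for the real file. Both are assumptions for illustration, not something the question confirms.

```python
import io

# Stand-in for open('Questions.txt'); the contents mirror the question's sample.
sample = io.StringIO('1.Hai\n2.Bye\\-1\n3.Hello\n4.OAPd\\-1\n')

separator = '\\-1'  # assumed marker: literal backslash, minus, one

text = sample.read()
# Split on the marker and strip the newline left over after each marker.
result = [part.strip('\n') for part in text.split(separator)]
# Drop the empty piece that follows the final marker.
result = [part for part in result if part]

print(result)  # ['1.Hai\n2.Bye', '3.Hello\n4.OAPd']
```

Splitting on the bare marker and stripping newlines afterwards avoids depending on whether the last line of the file ends with a newline.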

If your file is very large, holding its complete contents in memory as a single string just to call .split() may not be desirable (and holding the complete contents in the resulting list is probably not desirable either). In that case you can read it in chunks:

def each_chunk(stream, separator, chunk_size=4096):
  buffer = ''
  while True:  # until EOF
    chunk = stream.read(chunk_size)  # 4096 characters or so per read
    if not chunk:  # EOF?
      yield buffer
      break
    buffer += chunk
    while True:  # until no separator is found
      try:
        part, buffer = buffer.split(separator, 1)
      except ValueError:
        break
      else:
        yield part

with open('myFileName') as myFile:
  for chunk in each_chunk(myFile, separator='\\-1\n'):
    print(chunk)  # not holding in memory, but printing chunk by chunk
Cristian Ciupitu
Alfe
  • 2
    Using [`partition`](https://docs.python.org/3/library/stdtypes.html#str.partition) instead of [`split`](https://docs.python.org/3/library/stdtypes.html#str.split) might be faster. – Cristian Ciupitu Sep 29 '19 at 14:04
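A rough sketch of what the partition-based variant suggested in this comment could look like. The chunk_size parameter and the io.StringIO test stream are assumptions added for illustration, and whether this is actually faster than split has not been measured here.

```python
import io

def each_chunk(stream, separator, chunk_size=4096):
    """Yield the pieces of stream that lie between occurrences of separator."""
    buffer = ''
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # EOF: whatever is left is the final piece
            yield buffer
            return
        buffer += chunk
        while True:
            # partition returns (head, sep, tail); sep is '' when not found
            part, found, rest = buffer.partition(separator)
            if not found:
                break
            yield part
            buffer = rest

# Tiny chunk size so the separator straddles chunk boundaries in this demo.
stream = io.StringIO('1.Hai\n2.Bye\\-1\n3.Hello\n4.OAPd\\-1\n')
pieces = list(each_chunk(stream, '\\-1\n', chunk_size=5))
print(pieces)
```

Because the input ends with the separator, the last piece yielded is an empty string, just as with str.split; filter it out if it is not wanted.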
-3

I used "*" instead of "-1"; I'll let you make the appropriate changes.

s = '1.Hai\n2.Bye*3.Hello\n4.OAPd*'
temp = ''
results = []

for char in s:
    if char == '*':  # compare by value with ==; `is` tests identity
        results.append(temp)
        temp = ''  # reset to an empty string, not a list
    else:
        temp += char

if len(temp) > 0:
    results.append(temp)
Stolson
  • -1 because it doesn't work. I got `['1.Hai\n2.Bye', ['3', '.', 'H', 'e', 'l', 'l', 'o', '\n', '4', '.', 'O', 'A', 'P', 'd']]`. `temp = []` should be replaced with `temp = ''`. Also, `char is '*'`?! Since when are strings compared with `is` instead of `==`? – Cristian Ciupitu Sep 29 '19 at 01:33
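For comparison, the character-by-character loop in this answer can probably be replaced by str.split. This is a minimal sketch using the same "*" stand-in separator and sample string as the answer, not the answerer's own method.

```python
s = '1.Hai\n2.Bye*3.Hello\n4.OAPd*'

# str.split does the scanning for us; the comprehension drops the
# empty piece that follows the trailing '*'.
results = [part for part in s.split('*') if part]

print(results)  # ['1.Hai\n2.Bye', '3.Hello\n4.OAPd']
```

Note that the filter also drops empty pieces between two adjacent separators, which the loop would keep; remove the `if part` condition if those matter.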