I'm trying to split text into chunks to send to Google's text-to-speech engine, which accepts at most 5000 characters per query. I want to split longer texts at a whitespace character so that each chunk is at most 5000 characters long. My current code (using a chunk size of 15 instead of 5000):

    def split_text(text) -> list:
        start = 0
        chunk_size = 15
        chunk = ''
        chunks = []
        chunks_remaining = True
        while chunks_remaining:
            end = start + chunk_size
            if end >= len(text):
                chunks_remaining = False
            chunk = text[start:end]
            end = chunk.rfind(' ') + start
            chunks.append(text[start:end] + "...")
            start = end+1
        return chunks

    def main():
        text = "This is just a text string for demonstrative purposes."
        chunks = split_text(text)
        print(chunks)

Is there a way to replace chunk.rfind(' ') with something that accepts any whitespace character?
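
To make the goal concrete, here is a minimal sketch of the behaviour I'm after, not a definitive solution. The helper name rfind_whitespace and the chunk_size parameter are hypothetical names made up for illustration. It assumes that "any whitespace" means whatever str.isspace() treats as whitespace, that the whitespace character at the split point can be dropped, and that a run longer than chunk_size with no whitespace at all may be cut mid-word.

```python
def rfind_whitespace(chunk: str) -> int:
    """Like chunk.rfind(' '), but matches any character for which
    str.isspace() is true (space, tab, newline, ...). Returns -1 if none."""
    for i in range(len(chunk) - 1, -1, -1):
        if chunk[i].isspace():
            return i
    return -1


def split_text(text: str, chunk_size: int = 15) -> list:
    """Split text into chunks of at most chunk_size characters,
    breaking at the last whitespace character that fits."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        if end >= len(text):
            # Whatever is left fits into one chunk.
            chunks.append(text[start:])
            break
        # Look one character past the limit, so that a chunk of exactly
        # chunk_size characters followed by whitespace is still allowed.
        cut = rfind_whitespace(text[start:end + 1])
        if cut <= 0:
            # Assumption: no usable whitespace, so split mid-word.
            chunks.append(text[start:end])
            start = end
        else:
            chunks.append(text[start:start + cut])
            start += cut + 1  # skip the whitespace character itself
    return chunks
```

With the example string from above and a chunk size of 15, this should give the chunks "This is just a", "text string for", "demonstrative" and "purposes.". Unlike my current code, it does not append "..." to each chunk.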