1

Can someone point me in the right direction on this..

I have a string that contains sentences of words e.g. 'He was trying to learn a pythonic, or regex, way of solving his problem'

The string in question is quite large and I need to break it up into multiple lines, where each line can not exceed 64 characters. BUT I cant just insert a line break every 64 characters. I need to ensure the break occurs at the closest character (from a set of characters) before the 64th character, to ensure the line does not exceed 64 characters. e.g. I can only insert a line break after a space, comma or full stop

I also need the solution to be quite efficient as it is an action that will occur many, many times.

Using textwrap

I'm not sure textwrap is the way to go for my problem because I need to preserve the original line breaks in the input string. Example:

long_str = """
123456789 123456789 123456789 123456789 123456789 123456789
Line 1: Artificial intelligence (AI), sometimes called machine intelligence, 
Line 2: is intelligence demonstrated by machines, 
Line 3: in contrast to the natural intelligence displayed by humans and  other animals. 
Line 4: In computer science AI research is defined as
"""
lines = textwrap.wrap(long_str, 60, break_long_words=False)
print('\n'.join(lines))

What I want is this:

123456789 123456789 123456789 123456789 123456789 123456789
Line 1: Artificial intelligence (AI), sometimes called 
machine intelligence, 
Line 2: is intelligence demonstrated by machines, 
Line 3: in contrast to the natural intelligence displayed 
by humans and other animals. 
Line 4: In computer science AI research is defined as

But textwrap gives me this:

 123456789 123456789 123456789 123456789 123456789 123456789
Line 1: Artificial intelligence (AI), sometimes called
machine intelligence,  Line 2: is intelligence demonstrated
by machines,  Line 3: in contrast to the natural
intelligence displayed by humans and other animals.  Line 4:
In computer science AI research is defined as

I suspect that Regex is probably the answer but I'm out of my depth trying to solve this with regex.

The Welsh Dragon
  • 519
  • 1
  • 4
  • 19
  • It's in the standard library: https://docs.python.org/3.1/library/textwrap.html – mypetlion Nov 06 '18 at 00:11
  • I looked at this and it is almost what I am after but it doesnt seem preserve existing line breaks. I ONLY need to wrap lines/sentences where the line exceeds 64 characters and i need to preserve the format of the rest of the string – The Welsh Dragon Nov 06 '18 at 00:39
  • For an easy solution, split it into lines and call textwrap on them. – DonQuiKong Nov 06 '18 at 14:25

3 Answers3

1
import textwrap

def f1(foo): 
    return iter(foo.splitlines())

long_str = """
123456789 123456789 123456789 123456789 123456789 123456789
Line 1: Artificial intelligence (AI), sometimes called machine intelligence, 
Line 2: is intelligence demonstrated by machines, 
Line 3: in contrast to the natural intelligence displayed by humans and  other animals. 
Line 4: In computer science AI research is defined as
"""
[print('\n'.join(textwrap.wrap(l, 64, break_long_words=False))) for l in f1(long_str)]

with the iterating over the lines of a string per this

DonQuiKong
  • 413
  • 4
  • 15
1

Split the long string into separate lines on the newline. Wrap each separate line "as usual", then concatenate everything again to a single string.

import textwrap

long_str = """
123456789 123456789 123456789 123456789 123456789 123456789
Line 1: Artificial intelligence (AI), sometimes called machine intelligence, 
Line 2: is intelligence demonstrated by machines, 
Line 3: in contrast to the natural intelligence displayed by humans and  other animals. 
Line 4: In computer science AI research is defined as
"""

lines = []
for line in long_str.split('\n'):
    lines += textwrap.wrap(line, 60, break_long_words=False)
print('\n'.join(lines))

As textwrap returns a list of strings, you don't need to do anything else than keep on pasting them together, and join them at the end.

Jongware
  • 22,200
  • 8
  • 54
  • 100
  • Thanks. I had considered doing that but I was convinced there had to be a better way. If I add drop_whitespace=False then it gives me what I need – The Welsh Dragon Nov 06 '18 at 16:26
  • 1
    Im going to accept this as the answer because to me, a Python novice, the readability of this solution is preferable when compared to the solution proposed by @DonQuiKong – The Welsh Dragon Nov 06 '18 at 18:27
  • @CHaste do that, this answer does explain a little more too. Just as a side note, list comprehensions are “more pythonic“ and faster. – DonQuiKong Nov 06 '18 at 22:39
0

It might help us answer your question if you could provide any code you have already attempted. That being said, I believe the following example code will preserve existing line breaks, wrap lines exceeding 64 characters and preserve the format of the rest of the string.

import textwrap

long_str = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, " \
       "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. " \
       "Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris" \
       "nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in" \
       "reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. " \
       "Excepteur sint occaecat cupidatat non proident, sunt in culpa qui" \
       "officia deserunt mollit anim id est laborum."

lines = textwrap.wrap(long_str, 64, break_long_words=False)

print('\n'.join(lines))

The output from Python is:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco
laborisnisi ut aliquip ex ea commodo consequat. Duis aute irure
dolor inreprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa quiofficia deserunt mollit anim id est
laborum.
Andy
  • 789
  • 8
  • 19