Simple testing for length:
#!python3
#coding=utf-8
data = """06/01/2016, 10:40 pm - abcde
07/01/2016, 12:04 pm - abcde
07/01/2016, 12:05 pm - abcde
07/01/2016, 12:05 pm - abcde
07/01/2016, 6:14 pm - abcde
fghe
07/01/2016, 6:20 pm - abcde
07/01/2016, 7:58 pm - abcde
fghe
ijkl
07/01/2016, 7:58 pm - abcde"""
lines = data.split("\n")
out = []
for l in lines:
c = l.strip()
if c:
if len(c) < 10:
out[-1] += c
else:
out.append(c)
#skip empty
for o in out:
print(o)
results in:
06/01/2016, 10:40 pm - abcde
07/01/2016, 12:04 pm - abcde
07/01/2016, 12:05 pm - abcde
07/01/2016, 12:05 pm - abcde
07/01/2016, 6:14 pm - abcdefghe
07/01/2016, 6:20 pm - abcde
07/01/2016, 7:58 pm - abcdefgheijkl
07/01/2016, 7:58 pm - abcde
Does not contain the line breaks in the data!
But this one liner regular expression should do it (split on linebreak followed by digit), at least for the sample data (breaks when data contains linebreak followed by digit):
#!python3
#coding=utf-8
text_file = """06/01/2016, 10:40 pm - abcde
07/01/2016, 12:04 pm - abcde
07/01/2016, 12:05 pm - abcde
07/01/2016, 12:05 pm - abcde
07/01/2016, 6:14 pm - abcde
fghe
07/01/2016, 6:20 pm - abcde
07/01/2016, 7:58 pm - abcde
fghe
ijkl
07/01/2016, 7:58 pm - abcde"""
import re
data = re.split("\n(?=\d)", text_file)
print(data)
for d in data:
print(d)
Output:
['06/01/2016, 10:40 pm - abcde', '07/01/2016, 12:04 pm - abcde', '07/01/2016, 12:05 pm - abcde', '07/01/2016, 12:05 pm - abcde', '07/01/2016, 6:14 pm - abcde\n\
nfghe', '07/01/2016, 6:20 pm - abcde', '07/01/2016, 7:58 pm - abcde\n\nfghe\n\nijkl', '07/01/2016, 7:58 pm - abcde']
06/01/2016, 10:40 pm - abcde
07/01/2016, 12:04 pm - abcde
07/01/2016, 12:05 pm - abcde
07/01/2016, 12:05 pm - abcde
07/01/2016, 6:14 pm - abcde
fghe
07/01/2016, 6:20 pm - abcde
07/01/2016, 7:58 pm - abcde
fghe
ijkl
07/01/2016, 7:58 pm - abcde
(fixed with lookahead)