python3 trying to split string on \x0c

Question

I'm extracting text from a PDF into a string, text:

text = "● A justification of your prediction, including the following information that helped form\n\no Angle of the sun relative to the surface on September 22, 2021\no Materials of the surface (include three materials) and heat absorption\n\ncharacteristics\n\no Length of exposure of the surface to the sun (i.e., the amount of time the surface\n\nhas had to warm on that day), including slopes of the stadium and a consideration\nof the angles of the seats\n\n1 Yes, I know that’s a Wednesday but just go with it…\n\n\x0c● Sources: Be sure to include in-text citations as appropriate as well as provide a list of\n\nsources that were used for your report, use MLA or APA citation style\n\n● Your report can assume any format you chose, and should be between 300-400 words in\n\nlength\n\nResources:\n\n"

I want to split this text on "\x0c". I tried re.split(r'[\x0c]+', text), but that seems to simply remove the "\x0c" rather than split the string. Likewise, text.splitlines() didn't do the trick.

What am I missing?

Can you provide a [mcve], including the output it produces and the output you'd expect? — Ulrich Eckhardt, Oct 18 '21 at 16:15
see this: https://stackoverflow.com/questions/26184100/how-does-v-differ-from-x0b-or-x0c — Dani Mesejo, Oct 18 '21 at 16:17

score 1 · Answer 1 · answered Oct 18 '21 at 16:17

What's wrong with plain old

text.split("\x0c")

? That gives me a list of two elements, which looks like what you want here.

You can further split by line if you need to:

sections = [x.split("\n") for x in text.split("\x0c")]
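A minimal sketch of both steps, using a shortened stand-in for the string in the question (the sample text and the names pages, pages_re and lines are illustrative assumptions, not from the original PDF). It also shows that re.split and str.splitlines should break on the form feed too, so if the result looked unsplit it may be worth inspecting the returned list rather than the original string:

```python
import re

# Shortened stand-in for the extracted PDF text (hypothetical sample data).
text = "page one, line 1\n\npage one, line 2\n\n\x0cpage two, line 1\n\nResources:\n\n"

# Plain str.split on the form feed character yields one string per page.
pages = text.split("\x0c")
print(len(pages))  # 2

# re.split with the pattern from the question returns a new list as well;
# the original string is left unchanged, so the result must be assigned.
pages_re = re.split(r"[\x0c]+", text)
print(pages == pages_re)  # True

# Split each page into its lines.
sections = [page.split("\n") for page in pages]

# str.splitlines also treats \x0c as a line boundary.
lines = text.splitlines()
```

Note that neither call modifies text in place, since Python strings are immutable; printing text afterwards will still show the "\x0c".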

2e0byo

5,305
1
6
26

vexem · Answer 2 · 2021-10-18T16:35:42.260

0

There's probably a cleaner way, but this would be my method:

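# Split on the form feed; the delimiter itself is dropped from the pieces.
splittext = text.split('\x0c')
# Re-append the form feed to the first piece so it is not lost.
splittext[0] += '\x0c'

# Assumes text contains at least one form feed; otherwise splittext[1] raises IndexError.
string1 = splittext[0]
string2 = splittext[1]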
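The snippet above handles a single form feed. If the text may contain several, the same idea could be generalized roughly as follows. This is only a sketch: split_keep_formfeed is a hypothetical helper name, and the sample string is made up for illustration.

```python
def split_keep_formfeed(text, sep="\x0c"):
    """Split text on sep, leaving sep at the end of every piece but the last."""
    parts = text.split(sep)
    # Re-attach the separator to every piece that was followed by one.
    return [part + sep for part in parts[:-1]] + [parts[-1]]


sample = "first page\n\x0csecond page\n\x0cthird page\n"
pages = split_keep_formfeed(sample)
print(pages)
# ['first page\n\x0c', 'second page\n\x0c', 'third page\n']
```

Because every separator is re-attached, joining the pieces with "".join(pages) gives back the original string.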

edited Oct 18 '21 at 16:35

answered Oct 18 '21 at 16:22

