I am trying to parse a PDF and categorize its contents based on text formatting/decoration. How do you suggest I do that?
For example, I have a PDF in which the following structure is repeated:
S.No. BOLD+UNDERLINED TITLE para
How do I turn this into an array of objects like the one below, using the text decoration to tell the fields apart?
[
{ sno: "", title: "", desc: "" },
...
]
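
To make the target concrete, here is a rough sketch of the grouping step I have in mind. It assumes some PDF library has already given me the text as a list of spans in reading order, each with `bold` and `underline` flags. That span shape, the `groupRecords` name, and the serial-number pattern are all my own assumptions, not any library's API, and how to get those flags out of the PDF is exactly the part I don't know.

```javascript
// Hypothetical span shape: { text: string, bold: boolean, underline: boolean }.
// Assumption: spans arrive in reading order from whatever PDF library is used.

// Assumption: a serial number looks like "1" or "1." on its own.
const SNO_PATTERN = /^\d+\.?$/;

// A span counts as part of a title only if it is both bold and underlined.
const isTitle = (span) => span.bold && span.underline;

// Append a piece of text to an existing field, separated by a single space.
const join = (existing, text) => (existing ? existing + " " + text : text);

function groupRecords(spans) {
  // Trim whitespace and drop empty spans so the lookahead below is reliable.
  const cleaned = spans
    .map((s) => ({ ...s, text: s.text.trim() }))
    .filter((s) => s.text !== "");

  const records = [];
  let current = null;

  cleaned.forEach((span, i) => {
    const next = cleaned[i + 1];
    const startsRecord =
      SNO_PATTERN.test(span.text) && !isTitle(span) && next !== undefined && isTitle(next);

    if (startsRecord) {
      // A plain number directly followed by a title span opens a new record.
      current = { sno: span.text.replace(/\.$/, ""), title: "", desc: "" };
      records.push(current);
    } else if (current !== null && isTitle(span)) {
      current.title = join(current.title, span.text);
    } else if (current !== null) {
      // Everything else after the serial number is treated as paragraph text.
      current.desc = join(current.desc, span.text);
    }
    // Text before the first serial number is ignored.
  });

  return records;
}

// Made-up sample data, only to show the intended input and output shapes.
const sample = [
  { text: "1.", bold: false, underline: false },
  { text: "First", bold: true, underline: true },
  { text: "Title", bold: true, underline: true },
  { text: "Some paragraph text", bold: false, underline: false },
  { text: "with 2 lines.", bold: false, underline: false },
  { text: "2.", bold: false, underline: false },
  { text: "Second Title", bold: true, underline: true },
  { text: "Another paragraph.", bold: false, underline: false },
];

const result = groupRecords(sample);
console.log(JSON.stringify(result, null, 2));
```

The lookahead (a number only starts a record when a bold and underlined span follows it) is there so that a stray number inside a paragraph does not split a record. Bold or underlined words inside a paragraph would still be misread as title text, so this only works if the decoration is used for titles alone.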