I would like to create one file per tree from this PDF (http://mica.lif.univ-mrs.fr/d6.clean2-backup.pdf). Each file would be named after the tree number shown to the left of the tree (t0, t1, etc.).
I have tried using Python to extract the trees and their labels, but I'm having trouble. Specifically, when I tried extracting the trees as images (following https://nedbatchelder.com/blog/200712/extracting_jpgs_from_pdfs.html), none of the trees showed up, presumably because they are not stored in the PDF as JPEG images. When I tried extracting everything as text instead (following https://www.geeksforgeeks.org/working-with-pdf-files-in-python/), the trees lost all their formatting and, I think, some of their information. How could I get the files I want from this PDF? Can it be done in Python, or is there an easier way?
Alternatively, the website I obtained the PDF from (http://mica.lif.univ-mrs.fr/) provides the same trees in a textual form (e.g. t27 S##1#l# NP#0#2#l#s NP#0#2#r#s VP##3#l# V##4#l#h V##4#r#h NP#1#5#l#s NP#1#5#r#s VP##3#r# S##1#r#). Is there a good way to convert this notation into a visual tree?
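To make this second approach concrete, here is a minimal sketch of how I imagine parsing that notation. It rests on my own guess about the format, not on any documentation: I am assuming each token is five `#`-separated fields (label, an optional argument index, a node id, `l` or `r` for the opening or closing visit of that node, and an optional type flag such as `s` or `h`). The function names `parse_tree` and `to_brackets` are just names I made up.

```python
def parse_tree(line):
    """Parse one line such as 't27 S##1#l# NP#0#2#l#s ...' into (name, root).

    Assumed token layout (a guess): label#arg#node_id#side#kind,
    where side 'l' opens a node and side 'r' closes it.
    """
    name, *tokens = line.split()
    root = None
    stack = []  # nodes that have been opened but not yet closed
    for token in tokens:
        label, arg, node_id, side, kind = token.split("#")
        if side == "l":
            node = {"label": label, "arg": arg, "id": node_id,
                    "kind": kind, "children": []}
            if stack:
                stack[-1]["children"].append(node)
            else:
                root = node
            stack.append(node)
        elif side == "r":
            opened = stack.pop()
            if opened["id"] != node_id:
                raise ValueError(f"mismatched close for node {node_id} in {name}")
        else:
            raise ValueError(f"unexpected side {side!r} in token {token!r}")
    if stack:
        raise ValueError(f"unclosed nodes in {name}")
    return name, root


def to_brackets(node):
    """Render a parsed node as a bracketed string, e.g. '(S (NP_0:s) ...)'."""
    text = node["label"]
    if node["arg"]:
        text += "_" + node["arg"]
    if node["kind"]:
        text += ":" + node["kind"]
    children = " ".join(to_brackets(child) for child in node["children"])
    return f"({text} {children})" if children else f"({text})"


line = ("t27 S##1#l# NP#0#2#l#s NP#0#2#r#s VP##3#l# V##4#l#h V##4#r#h "
        "NP#1#5#l#s NP#1#5#r#s VP##3#r# S##1#r#")
name, root = parse_tree(line)
print(name, to_brackets(root))
```

If my reading of the format is right, each bracketed string could then be written to a file named after its tree (t27.txt and so on) and passed to whatever tool draws bracketed trees. What I don't know is whether the field meanings I assumed are correct, or which tool would be best for the drawing step.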
Any help with either of these approaches (or with others, if people have ideas) would be much appreciated. Thanks!