I need to parse a 4-page PDF file that contains train timetables.
The problem with PDFBox: empty table cells are simply dropped from the extracted text! :-(
Is there any way to make PDFBox treat an empty table cell as a special character or sequence?
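One possible approach (a sketch, not a built-in PDFBox option) is to work with character positions instead of the plain text. A subclass of `PDFTextStripper` can override `writeString(String text, List<TextPosition> textPositions)` and read each chunk's x coordinate with `TextPosition.getXDirAdj()`. The code below covers only the second half, assuming you already know the left edge of each timetable column (for example, measured once from the page layout). It assigns each chunk to a column and writes `(empty)` into every column that received nothing. `EmptyCellFiller`, `Chunk` and `fillRow` are hypothetical names, and plain floats stand in for PDFBox positions so the code runs on its own.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of PDFBox): maps the text chunks of one
// timetable line onto fixed columns and fills columns that received no text
// with a placeholder. In a real app the x values would come from PDFBox
// TextPosition objects; here they are plain floats so the sketch is self-contained.
final class EmptyCellFiller {

    static final String EMPTY = "(empty)";

    /** A piece of text (e.g. "06.01") and the x coordinate where it starts. */
    static final class Chunk {
        final float x;
        final String text;

        Chunk(float x, String text) {
            this.x = x;
            this.text = text;
        }
    }

    /**
     * @param chunks       text chunks found on one timetable line
     * @param columnStarts assumed left edge of each column, in ascending order
     * @return one cell per column; columns without text contain EMPTY
     */
    static String[] fillRow(List<Chunk> chunks, float[] columnStarts) {
        String[] row = new String[columnStarts.length];
        Arrays.fill(row, EMPTY);
        for (Chunk c : chunks) {
            // Pick the right-most column whose left edge is at or before the chunk.
            int col = -1;
            for (int i = 0; i < columnStarts.length; i++) {
                if (c.x >= columnStarts[i]) {
                    col = i;
                }
            }
            // Chunks left of the first column are skipped.
            if (col >= 0) {
                row[col] = c.text;
            }
        }
        return row;
    }

    /** Joins the cells with '|', like the expected output in the question. */
    static String join(String[] row) {
        return String.join("|", row);
    }
}
```

For example, chunks at x = 5, 55, 105, 155 and 255 with columns starting at 0, 50, 100, 150, 200 and 250 give `06.01|06.30|06.21|07.01|(empty)|07.30`. If the column edges are fixed for the whole document, `PDFTextStripperByArea` with one region per column is another option worth trying.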
Let's take an example:
-> station "Thann (A)"
-> I only want to keep the times for which "Thann (D)" is not empty... so I wouldn't keep 07.01!
-> How could I do this?
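Once both lines come out aligned with placeholders, the rule above reduces to a column-by-column comparison. A minimal sketch, assuming the first sample row further down is the "Thann (A)" line and the second is the "Thann (D)" line; `TimetableFilter` and `arrivalsWithDeparture` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: given the aligned "Thann (A)" row and "Thann (D)" row
// (both using "(empty)" for blank cells), keep only the "Thann (A)" times
// whose column also has a "Thann (D)" time.
final class TimetableFilter {

    static final String EMPTY = "(empty)";

    static List<String> arrivalsWithDeparture(String[] arrivals, String[] departures) {
        // The comparison only makes sense if both rows have one cell per train.
        if (arrivals.length != departures.length) {
            throw new IllegalArgumentException("rows are not aligned");
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < arrivals.length; i++) {
            boolean hasArrival = !EMPTY.equals(arrivals[i]);
            boolean hasDeparture = !EMPTY.equals(departures[i]);
            if (hasArrival && hasDeparture) {
                kept.add(arrivals[i]);
            }
        }
        return kept;
    }
}
```

With the two sample rows, this keeps 06.01, 06.30, 06.21 and 07.30 and drops 07.01, because its "Thann (D)" cell is empty.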
For now my app works: I read the 4 pages of the PDF and analyze the extracted text buffer with a custom Java class to get the data I need.
(I do it this way because on Android the app crashes with a memory error when I read the PDF twice or more, even though the same code works well in a standard Java project!)
But this way, I get a few times that I don't need, because the cell for the next station is empty.
For "Thann (A)", I would like to get:
06.01|06.30|06.21|07.01|(empty)|07.30
06.02|06.32|06.22|(empty)|07.03|07.33
AND NOT:
06.01|06.30|06.21|07.01|07.30
06.02|06.32|06.22|07.03|07.33