
I have been trying for a long time to get the IIHF PDFs (example here: http://stats.iihf.com/Hydra/349/IHM349131_74_3_0.pdf) into a parseable form.

I've finally managed it, because Google's cache stores an HTML version of the file (http://webcache.googleusercontent.com/search?q=cache:http://stats.iihf.com/Hydra/349/IHM349131_74_3_0.pdf), and that version can be parsed easily.
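To make the approach concrete, here is a minimal sketch of what I am doing, assuming the cache keeps serving pages under the URL pattern shown above. The function names `cache_url` and `fetch_cached_html` are only illustrative, and the User-Agent header is an assumption about what the server will accept.

```python
import urllib.error
import urllib.request

# Prefix taken from the cache link above; assumed to stay stable.
CACHE_PREFIX = "http://webcache.googleusercontent.com/search?q=cache:"


def cache_url(pdf_url):
    """Build the cache URL for a PDF by prepending the cache prefix."""
    return CACHE_PREFIX + pdf_url


def fetch_cached_html(pdf_url, timeout=10):
    """Return the cached HTML as text, or None if no cached copy is served.

    Hypothetical helper: it only covers the case where the cache already
    holds a copy of the PDF.
    """
    request = urllib.request.Request(
        cache_url(pdf_url),
        headers={"User-Agent": "Mozilla/5.0"},  # assumption, not a requirement I have verified
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError:
        # An HTTP error status (for example 404) is treated as "not cached".
        return None
```

Calling `fetch_cached_html("http://stats.iihf.com/Hydra/349/IHM349131_74_3_0.pdf")` gives me the HTML only when a cached copy exists.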

The only problem is that Google doesn't cache every one of these PDFs, and even when it does cache a file, it can take days for the copy to appear there.

Is there a way to get those HTML versions through an API, or even manually?

Edit: I forgot to mention that these PDFs have somehow corrupted character maps, so normal PDF-to-HTML converters can't convert them.

Nelson
Miika Arponen

0 Answers