// I've added a new take on this; please see "Cheating PHP integers". Any help will be much appreciated. My idea is to hack the storage of the arrays by packing the integers into unsigned bytes (I only need 8- or 16-bit integers, so this would reduce the memory considerably).
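A minimal sketch of the packing idea, assuming every value fits in 0..255 (8-bit) or 0..65535 (16-bit). Instead of a PHP array, the values live in one binary string built with `pack()`, and they are read back with `ord()` or `unpack()`. The helper names `packInts` and `packedGet` are hypothetical. The little-endian `'v'` format for 16-bit values is an arbitrary choice. The `unpack()` offset argument requires PHP 7.1 or later.

```php
<?php
// Sketch: store small non-negative integers in a binary string instead of a
// PHP array, using one byte ('C') or two bytes ('v') per value.
// Assumption: all values fit in 0..255 (1 byte) or 0..65535 (2 bytes).

// Hypothetical helper: pack a list of ints, $bytes (1 or 2) per value.
function packInts(array $values, int $bytes): string
{
    $format = $bytes === 1 ? 'C*' : 'v*'; // 'v' = unsigned 16-bit, little-endian
    return pack($format, ...$values);
}

// Hypothetical helper: read the value at $index without unpacking everything.
function packedGet(string $packed, int $index, int $bytes): int
{
    if ($bytes === 1) {
        return ord($packed[$index]); // one byte per value
    }
    // unpack() with an unnamed format returns [1 => value]; offset is in bytes
    return unpack('v', $packed, $index * 2)[1];
}

$packed8 = packInts([3, 200, 17, 255, 0], 1);
echo strlen($packed8), "\n";          // 5 (one byte per value)
echo packedGet($packed8, 1, 1), "\n"; // 200

$packed16 = packInts([1000, 65535, 42], 2);
echo strlen($packed16), "\n";          // 6 (two bytes per value)
echo packedGet($packed16, 1, 2), "\n"; // 65535
```

The trade-off is speed: every lookup goes through `ord()` or `unpack()` rather than a direct array index. In exchange, each value costs one or two bytes instead of a full PHP array element.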
Hi
I'm currently working on custom charset detection libraries. I have created a port of Mozilla's charset detection algorithm, using chardet (the Python port) as a helping hand. However, the port is extremely memory-intensive in PHP: it uses around 30 MB of memory even if I load only the Western language detection. I've optimised all I can short of rewriting it from scratch to load each piece separately, which would reduce memory but make it a lot slower.
My question is: do you know of any LGPL PHP libraries that do charset detection? This would be purely for research, to give me a slight guiding hand in the right direction.
I already know of mb_detect_encoding, but it's far too limited and produces far too many false positives with the text files I have (yet Python's chardet detects them perfectly).