I have a curious conundrum. I have a list of data that (in this case) might look something like this:
- Company XXXXXXX
- YYYYYY Incorporated
- Comp ZZ Inc.
- Com AAA BB
- StackOverflow
- Stack Overflow
- (etc...)
I would like to append a shortened version of each name to the end of the names of a series of worksheets in Excel, something like this (remember that an Excel sheet tab name can be at most 31 characters):
- Planning Setup Sheet XXXXXXX
- Planning Setup Sheet YYYYYY
- Planning Setup Sheet ZZ
- Planning Setup Sheet AAA BB
- Planning Setup Sheet StackOver
- Planning Setup Sheet Stack Ove
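For concreteness: the fixed prefix `Planning Setup Sheet ` is 21 characters including the trailing space, which leaves 10 characters for the company name under the 31-character limit. A minimal Python sketch of the naive approach (plain truncation) shows why it falls short. The names `MAX_SHEET_NAME`, `PREFIX` and `naive_sheet_name` are illustrative only, not from any library.

```python
# Excel caps a sheet (tab) name at 31 characters.
MAX_SHEET_NAME = 31
PREFIX = "Planning Setup Sheet "  # 21 characters, including the trailing space


def naive_sheet_name(company: str, prefix: str = PREFIX) -> str:
    """Append the company name to the prefix and cut the result at 31 characters."""
    return (prefix + company)[:MAX_SHEET_NAME]


budget = MAX_SHEET_NAME - len(PREFIX)
print(budget)  # 10 characters left for the company name

for name in ["Company XXXXXXX", "YYYYYY Incorporated", "StackOverflow"]:
    print(naive_sheet_name(name))
# Planning Setup Sheet Company XX
# Planning Setup Sheet YYYYYY Inc
# Planning Setup Sheet StackOverf
```

Plain truncation keeps filler such as "Company" and "Inc" and throws away most of what actually identifies the company.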
This is easy to do by hand, but very difficult to automate. How does one recognize what is important in a string versus what can be removed? Clearly, different characters and groups of characters carry different amounts of significance, but how can one define that distribution and specify a cutoff?
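One rule-based answer to "what can be removed" is a hand-maintained list of low-information words (legal-entity suffixes and similar) that are dropped before truncating. The sketch below assumes such a list: `FILLER_WORDS` is a hypothetical starting point, not a complete or standard list, and `shorten` is an illustrative helper.

```python
import re

MAX_SHEET_NAME = 31  # Excel's limit for a sheet (tab) name
PREFIX = "Planning Setup Sheet "

# Hypothetical, hand-maintained list of low-information words (matched case-insensitively).
FILLER_WORDS = {
    "company", "comp", "com", "co",
    "inc", "incorporated", "corp", "corporation",
    "ltd", "limited", "llc",
}


def shorten(company: str, budget: int) -> str:
    """Drop filler words, then truncate what is left to `budget` characters."""
    # Runs of letters, digits, '_' or '&'; anything else (spaces, '.') separates words.
    words = re.findall(r"[\w&]+", company)
    kept = [w for w in words if w.lower() not in FILLER_WORDS]
    if not kept:  # every word was filler: fall back to the original words
        kept = words
    return " ".join(kept)[:budget].rstrip()


budget = MAX_SHEET_NAME - len(PREFIX)
for name in ["Company XXXXXXX", "YYYYYY Incorporated", "Comp ZZ Inc.", "Com AAA BB", "Stack Overflow"]:
    print(PREFIX + shorten(name, budget))
```

On the sample names this gives `XXXXXXX`, `YYYYYY`, `ZZ`, `AAA BB` and `Stack Over` for the name part. The weak point is that the word list is a judgment call: an entry like `com` also strips any legitimate word that happens to match it.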
This almost strikes me as a good candidate for an NLP-type algorithm, or maybe for generating a very large list and training a neural network to shorten strings, but I'd rather not overdo it; I hope there is an easier way to go about this...
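A lighter-weight alternative to NLP or a neural network, for the "distribution and cutoff" part, is to let the list itself decide: a word that appears in many of the names (say `Inc` or `Company`) does little to tell them apart, which is the intuition behind inverse document frequency (IDF). The sketch below counts how many names contain each word and drops words above a cutoff share. The `max_share=0.3` default, the helper names and the sample list are assumptions for illustration only.

```python
import re
from collections import Counter


def tokenize(name: str) -> list[str]:
    """Split a name into words; anything other than letters, digits, '_' or '&' separates them."""
    return re.findall(r"[\w&]+", name)


def distinctive_words(names: list[str], max_share: float = 0.3) -> dict[str, list[str]]:
    """For each name, keep only words that occur in at most `max_share` of all names.

    A word shared by many names (e.g. "Inc") does little to tell them apart,
    the same intuition as inverse document frequency (IDF).
    """
    doc_freq = Counter()
    for name in names:
        # Count each word at most once per name (document frequency, not raw frequency).
        doc_freq.update({w.lower() for w in tokenize(name)})

    # A word that occurs in only one name is always kept, even for very short lists.
    cutoff = max(1.0, max_share * len(names))
    result = {}
    for name in names:
        words = tokenize(name)
        kept = [w for w in words if doc_freq[w.lower()] <= cutoff]
        result[name] = kept or words  # never reduce a name to nothing
    return result


# Hypothetical sample list, purely for illustration.
names = ["Alpha Inc", "Beta Inc", "Gamma Inc", "Delta Company Inc", "Epsilon Company", "Zeta Holdings"]
for name, words in distinctive_words(names).items():
    print(name, "->", " ".join(words))
```

Here `Inc` (4 of 6 names) and `Company` (2 of 6) fall above the cutoff and are dropped, while `Holdings` survives because only one name uses it. On a real list the cutoff would need tuning, and the result would still need truncating to the character budget.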
I feel as though this is a classic problem, but I don't see any mention of it when googling around; typical solutions to similar problems usually revolve around just generating some random string. Maybe I'm searching with the wrong terms...
Does anyone have an easy way to minimize a string without losing its meaning?
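If a name is still too long once filler words are gone, one cheap trick before resorting to truncation is vowel-dropping abbreviation: remove lowercase vowels that are not the first letter of a word, which often leaves the word recognizable. This is a heuristic assumed to suit names like these, not an established algorithm for the problem, and `squeeze` is a hypothetical helper.

```python
import re


def squeeze(name: str, budget: int) -> str:
    """Fit `name` into `budget` characters, dropping vowels before truncating.

    If the name is too long, remove every lowercase vowel that is preceded by a
    word character in the original name (so the first letter of each word is
    kept): "Overflow" -> "Ovrflw". If that is still too long, truncate.
    """
    if len(name) <= budget:
        return name
    squeezed = re.sub(r"(?<=\w)[aeiou]", "", name)
    return squeezed[:budget].rstrip()


budget = 10  # 31-character sheet-name limit minus "Planning Setup Sheet " (21 characters)
for name in ["AAA BB", "StackOverflow", "Stack Overflow"]:
    print(squeeze(name, budget))
# AAA BB
# StckOvrflw
# Stck Ovrfl
```

`StackOverflow` fits the 10-character budget exactly after dropping vowels, while `Stack Overflow` keeps its space, comes out one character too long, and still has to be cut.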