I have messy raw data in which each entry contains one or more names, written in different formats and with different lengths. Something like:
from math import nan  # assumed source of nan so the snippet runs; numpy.nan behaves the same here

data = [
    'apple',
    'apple;apple(big)',
    'apple(apple),apple',
    'banana(banana)',
    'banana',
    nan,  # yes, some entries are NaN
    'cookie;cookie(cookie)',
    'cookie(choco)']
The desired output is the shortest valid name for each item. For the demo data above: output = ['apple', 'banana', 'cookie']
My idea is to declare output = [] and iterate through data: if an element is not yet in output, append it; if it is, check whether the two are similar and keep the shorter one. But this seems very inefficient, and I don't know how to compare the entries and pick the shortest valid value.
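Here is a rough sketch of that loop, under assumptions that may not hold for the real data: names are separated only by ';' or ',', and anything inside parentheses is an annotation rather than a valid name. The helper names `names_outside_parens` and `shortest_names` are made up for illustration. It avoids regex and skips NaN entries by checking the type.

```python
from math import nan


def names_outside_parens(entry):
    """Split one raw entry on ';' and ',' and drop parenthesised text.

    Assumption: text inside (...) is never the valid name.
    """
    names, current, depth = [], [], 0
    for ch in entry:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth = max(depth - 1, 0)
        elif depth == 0 and ch in ';,':
            # A separator outside parentheses ends the current name.
            names.append(''.join(current))
            current = []
        elif depth == 0:
            current.append(ch)
    names.append(''.join(current))
    return [n.strip() for n in names if n.strip()]


def shortest_names(data):
    """Return the shortest candidate of each entry, de-duplicated in order."""
    output, seen = [], set()
    for entry in data:
        if not isinstance(entry, str):  # skips NaN and other non-strings
            continue
        candidates = names_outside_parens(entry)
        if not candidates:
            continue
        name = min(candidates, key=len)
        if name not in seen:  # set lookup avoids rescanning output
            seen.add(name)
            output.append(name)
    return output


data = ['apple', 'apple;apple(big)', 'apple(apple),apple', 'banana(banana)',
        'banana', nan, 'cookie;cookie(cookie)', 'cookie(choco)']
print(shortest_names(data))  # ['apple', 'banana', 'cookie']
```

The `seen` set keeps the membership test cheap, so the whole thing is a single pass over data. If a valid name can also appear inside parentheses, the parenthesis rule would need to change.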
I also tried regex, but it failed because the valid name can appear at any position within an entry. How can I do this?