I need to remove the duplicates from a string and return the unique value using a regex pattern.
Value = "['234.78','234.78']"
Expected result:
Value = "['234.78']"
Any help would be appreciated!
It looks like you're working with a serialized Python list. If the string is a valid Python literal, you'd be better off parsing it with ast.literal_eval() than using regular expressions. From there, you can use the list -> set -> list round trip to remove duplicates, then call repr() to get the string representation back in the form you want:
import ast
value = "['234.78','234.78']"
value = list(set(ast.literal_eval(value)))
value = repr(value) # "['234.78']"
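One caveat with the set round trip is that a set does not keep the original order, which matters once the list holds more than one distinct value. Here is a minimal sketch of an order-preserving variant, assuming Python 3.7+ (where dict keeps insertion order). The helper name dedupe_preserving_order is made up for illustration:

```python
import ast


def dedupe_preserving_order(serialized: str) -> str:
    """Parse a serialized Python list, drop duplicates, keep first-seen order."""
    items = ast.literal_eval(serialized)
    # dict.fromkeys keeps the first occurrence of each key, in insertion order.
    unique = list(dict.fromkeys(items))
    return repr(unique)


print(dedupe_preserving_order("['234.78','234.78']"))  # ['234.78']
```

Note that repr() puts a space after each comma, so a multi-value result is formatted as "['2.5', '1.5']" rather than "['2.5','1.5']".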
Since the data is almost valid JSON, you could replace the single quotes with double quotes and parse it with json.loads():
import json
def dedupe_serialized_list(serialized_list: str):
    """
    Dedupe a serialized list of str values.

    :param str serialized_list: A serialized list of str values
    :return: a deduped list (re-serialized)
    :rtype: str
    """
    return str(list(set(json.loads(serialized_list.replace("'", '"')))))


if __name__ == '__main__':
    print(dedupe_serialized_list("['234.78','234.78']"))  # ['234.78']
As a lambda:
dedupe = lambda value: str(list(set(json.loads(value.replace("'", '"')))))
Esqew's answer is a sensible approach.
If you really need to do it with a regex, the code below works for this input:
import re
Value = "['234.78','234.78']"
Value = re.sub(r"('\d+\.\d+'),\1", r"\1", Value)
Value  # "['234.78']"
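The pattern matches a quoted decimal number (a single quote, one or more digits, a decimal point, one or more digits, and a closing quote) and captures it as group 1. It then matches a comma followed by the same text again via the backreference \1. The replacement keeps only the captured group.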
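That pattern collapses one adjacent pair per match, so a run of three identical values would still leave two behind. A minimal sketch of a variant that collapses a whole run in one pass, assuming the values are always quoted decimals separated by a comma and optional whitespace. The function name collapse_adjacent_duplicates is hypothetical:

```python
import re


def collapse_adjacent_duplicates(value: str) -> str:
    """Collapse runs of adjacent identical quoted decimals into one."""
    # ('\d+\.\d+')  captures a quoted decimal as group 1
    # (?:,\s*\1)+   matches one or more immediate repeats of that same text
    return re.sub(r"('\d+\.\d+')(?:,\s*\1)+", r"\1", value)


print(collapse_adjacent_duplicates("['234.78','234.78','234.78']"))  # ['234.78']
```

This still only removes duplicates that sit next to each other. For duplicates separated by other values, parsing the list as in the other answers is the more reliable route.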