Regex pattern to remove duplicate value in a str

Question

I need to remove duplicate values from a string and return only the unique values, using a regex pattern.

Value = "['234.78','234.78']"

Expected result:

Value = "['234.78']"

Any help would be appreciated!

score 1 · Answer 1 · answered May 08 '23 at 18:20

Looks like you're working with serialized data. If it's a Python-compatible literal, you'd be better off parsing it with ast.literal_eval() rather than using regular expressions. From there, you can use the list -> set -> list approach to remove duplicates, then repr() to turn the result back into a string in the format you described:

import ast
value = "['234.78','234.78']"
value = list(set(ast.literal_eval(value)))
value = repr(value) # "['234.78']"
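Converting through set() discards the original order, which starts to matter once the list holds more than one distinct value. Below is a minimal sketch of an order-preserving variant, assuming the input is always a Python list literal of strings; `dedupe_preserving_order` is a hypothetical helper name. It relies on dict keys keeping insertion order, which Python guarantees since 3.7:

```python
import ast


def dedupe_preserving_order(serialized: str) -> str:
    """Hypothetical helper: parse a Python list literal, drop repeats, keep first-seen order."""
    items = ast.literal_eval(serialized)  # parse the literal safely (no arbitrary code execution)
    unique = list(dict.fromkeys(items))   # dict keys are unique and remember insertion order
    return repr(unique)                   # back to the "['...']" string form


print(dedupe_preserving_order("['234.78','234.78']"))  # ['234.78']
print(dedupe_preserving_order("['2.0','1.5','2.0']"))  # ['2.0', '1.5']
```

Note that repr() puts a space after each comma, so with several values the output is not byte-for-byte in the same layout as the input.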

Mr. Polywhirl · Answer 2 · 2023-05-08T18:52:14.457

Since the data is almost in JSON format, you could replace the single quotes with double quotes and parse it with json.loads():

import json

def dedupe_serialized_list(serialized_list: str):
    """
    Dedupe a serialized list of str values.

    :param str serialized_list: A serialized list of str values
    :return: a deduped list (re-serialized)
    :rtype: str
    """
    return str(list(set(json.loads(serialized_list.replace("'", '"')))))

if __name__ == '__main__':
    print(dedupe_serialized_list("['234.78','234.78']")) # ['234.78']

As a lambda:

dedupe = lambda value: str(list(set(json.loads(value.replace("'", '"')))))
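The quote swap assumes that no value contains a quote character of its own. A small sketch of where that assumption breaks, using a hypothetical input whose values contain an apostrophe; ast.literal_eval parses the same text without any substitution:

```python
import ast
import json

# Hypothetical input whose values contain an apostrophe
serialized = "[\"it's\", \"it's\"]"

swapped = serialized.replace("'", '"')  # -> ["it"s", "it"s"], which is no longer valid JSON
try:
    json.loads(swapped)
    json_parsed = True
except json.JSONDecodeError:
    json_parsed = False

# literal_eval understands Python quoting directly, so no swap is needed
deduped = repr(list(set(ast.literal_eval(serialized))))
print(json_parsed, deduped)  # False ["it's"]
```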

score -2 · Answer 3 · answered May 08 '23 at 19:27

Esqew's answer is a sensible approach.

If you're desperate to do it with a regex, the code below works:

import re
Value = "['234.78','234.78']"
# group 1 captures a quoted number; \1 requires the exact same text after the comma
Value = re.sub(r"('\d+\.\d+'),\1", r'\1', Value)
print(Value)  # ['234.78']

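The pattern matches a quote, one or more digits, a decimal point, one or more digits and a closing quote (captured as group 1), then a comma followed by the same quoted number again via the backreference \1. The whole match is replaced with a single copy of group 1.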
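Because re.sub resumes scanning after the end of each match, the pattern above collapses only one adjacent pair at a time, so three consecutive copies come out as two. Below is a sketch of a variant, assuming duplicates always sit next to each other, that repeats the backreference with (?:,\1)+ so a whole run collapses in one pass. Non-adjacent duplicates are still left untouched; for those, the parsing approaches in the other answers are a better fit.

```python
import re

PAIR = r"('\d+\.\d+'),\1"      # pattern from the answer: a quoted number plus one copy
RUN = r"('\d+\.\d+')(?:,\1)+"  # variant: a quoted number plus one or more consecutive copies

three = "['234.78','234.78','234.78']"
print(re.sub(PAIR, r"\1", three))  # ['234.78','234.78']  (one pair collapsed, one copy left)
print(re.sub(RUN, r"\1", three))   # ['234.78']

mixed = "['1.5','1.5','2.0','1.5']"
print(re.sub(RUN, r"\1", mixed))   # ['1.5','2.0','1.5']  (the non-adjacent repeat survives)
```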
