-4

I need to remove the duplicate value from a string and return the unique value using a regex pattern.

Value = "['234.78','234.78']"

Expected result:

Value = "['234.78']"

Any help would be appreciated!

Wiktor Stribiżew
riku_zac

3 Answers

1

It looks like you're working with serialized string data. If the string is a valid Python literal, you'd be better off parsing it with ast.literal_eval() than using regular expressions. From there, convert list -> set -> list to remove the duplicates, then call repr() to get the string representation back in the form you describe:

import ast
value = "['234.78','234.78']"
value = list(set(ast.literal_eval(value)))
value = repr(value) # "['234.78']"
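
One caveat with the list -> set -> list step: a set does not keep the original order, which only matters once the list holds more than one distinct value. A minimal sketch of an order-preserving variant, assuming the input is always a valid Python list literal of strings (the helper name dedupe_keep_order is hypothetical, not part of the answer above):

```python
import ast


def dedupe_keep_order(serialized: str) -> str:
    """Parse a serialized list, drop duplicates, keep first-seen order."""
    # literal_eval only evaluates Python literals, so it is safe on untrusted text
    items = ast.literal_eval(serialized)
    # dict keys are unique and (since Python 3.7) keep insertion order
    unique = list(dict.fromkeys(items))
    return repr(unique)


print(dedupe_keep_order("['234.78','234.78']"))  # ['234.78']
```

Note that repr() puts a space after each comma, so a multi-element result will not match the original spacing exactly.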
esqew
0

Since the data is almost valid JSON, you could replace the single quotes with double quotes and parse it with json.loads():

import json

def dedupe_serialized_list(serialized_list: str):
    """
    Dedupe a serialized list of str values.

    :param str serialized_list: A serialized list of str values
    :return: a deduped list (re-serialized)
    :rtype: str
    """
    return str(list(set(json.loads(serialized_list.replace("'", '"')))))

if __name__ == '__main__':
    print(dedupe_serialized_list("['234.78','234.78']")) # ['234.78']

As a lambda:

dedupe = lambda value: str(list(set(json.loads(value.replace("'", '"')))))
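
The quote swap relies on the values never containing an apostrophe. A small sketch of where it breaks, using a made-up input (the variable names tricky and swap_ok are illustrative only), with ast.literal_eval() shown as a fallback that still parses it:

```python
import ast
import json

# Hypothetical input whose values contain an apostrophe
tricky = "[\"it's\", \"it's\"]"

# Swapping quotes turns "it's" into "it"s", which is no longer valid JSON
try:
    json.loads(tricky.replace("'", '"'))
    swap_ok = True
except json.JSONDecodeError:
    swap_ok = False

# Parsing the text as a Python literal needs no quote rewriting
deduped = list(set(ast.literal_eval(tricky)))

print(swap_ok)  # False
print(deduped)
```

For plain numeric strings like the one in the question, the quote swap is fine.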
Mr. Polywhirl
-2

Esqew's answer is a sensible approach.

If you really need to do it with regex, the code below works for this input:

import re
Value = "['234.78','234.78']"
# Replace a quoted decimal followed by a comma and an identical copy with one copy
Value = re.sub(r"('\d+\.\d+'),\1", r"\1", Value)
Value  # "['234.78']"

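The pattern matches a quote, one or more digits, a decimal point, one or more digits, and a closing quote, capturing all of that as group 1. It then matches a comma followed by the same quoted number again (the \1 backreference). The replacement keeps only group 1.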
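
The pattern above removes one adjacent repeat per match, so a value that appears three times in a row would still be left with two copies. A hedged sketch of a variant that collapses a whole run of adjacent repeats, assuming the same quoted-decimal format with no spaces after the commas (the name collapse_adjacent is hypothetical):

```python
import re


def collapse_adjacent(serialized: str) -> str:
    """Collapse runs of identical, adjacent quoted decimals into one."""
    # (?:,\1)+ matches one or more ",<same value>" repeats after the first copy
    return re.sub(r"('\d+\.\d+')(?:,\1)+", r"\1", serialized)


print(collapse_adjacent("['234.78','234.78','234.78']"))  # ['234.78']
```

This still only handles duplicates that sit next to each other; an input such as "['1.0','2.0','1.0']" comes back unchanged, which is one reason parsing the list is the more robust route.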

bc1155