
I'm using this function to clean up my column names. However, it is also deleting digits, which I don't want. For example, when applied to the string below, I get: "standard_access_requested_application_rolegroup_ld_e"

Any help would be great. Thanks.

import re

def text_replacement(x):
  """
  This function formats the field names so that they are more SQL friendly
  """
  
  for key, value in custom_fields_dict.items():
    pattern = re.compile(key, re.IGNORECASE)
    x = pattern.sub(value, x).lower().replace('fields.','').replace(' ','_').replace('™','')
    x = re.sub(r"[()\[\]&^%$#@!-:'\/]",'',x)
  return x

text_replacement("standard_access_requested_application:_'role/group': ]ld_10706(e)™")

The application of the function:

#Replace the columns in the dataframe
new_columns = []
for i in df.columns:
  new_columns.append(text_replacement(i))

df.columns = new_columns
Nikita Shabankin
  • Your question is really about regex, since that is where the error is; maybe add that tag. When I paste your pattern and your test string into https://regex101.com/, it highlights the digits (among other characters), which means your function will remove those digits (and the other highlighted characters). I am not great at regex, but I will look at it and see if I can figure it out. – Shmack Sep 15 '22 at 05:12
  • I am uncertain, but try: `r"[()\[\]&^%$#@!-:'\/]\D"`... I don't know if it will exclude all or some digits, or what should be its equivalent `r"[()\[\]&^%$#@!-:'\/][^0-9]"`. – Shmack Sep 15 '22 at 05:16
  • One more, maybe `r"[()\[\]&^%$#@!-:'\/](?![0-9])"`. – Shmack Sep 15 '22 at 05:23

1 Answer


Inside a character class, the `!-:` part of your pattern is a range: it matches any character from `!` (0x21) to `:` (0x3A), and that range includes the digits 0-9 (0x30-0x39).
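A quick way to check this, as a small sketch using only the standard library, is to list every character the range covers and compare the unescaped and escaped hyphen on a short test string:

```python
import re

# Every character from '!' (0x21) through ':' (0x3A),
# i.e. what `!-:` matches inside a character class.
covered = "".join(chr(c) for c in range(ord("!"), ord(":") + 1))
print(covered)  # !"#$%&'()*+,-./0123456789:

# Unescaped hyphen: a range, so the digits are removed.
range_result = re.sub(r"[!-:]", "", "ld_10706")
print(range_result)  # ld_

# Escaped hyphen: three literal characters, so the digits survive.
literal_result = re.sub(r"[!\-:]", "", "ld_10706")
print(literal_result)  # ld_10706
```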

(Screenshot from regex101.com: characters coloured blue are a match.)

Only the hyphen actually needs escaping, but putting `\` before each of `!`, `-` and `:` works too:

x = re.sub(r"[()\[\]&^%$#@\!\-\:'\/]", '', x)
[1]: 'standard_access_requested_application_rolegroup ld_10706e'

I only tested the regex pattern, not the rest of your code, so your own result may differ, but the digits will be preserved.
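For completeness, here is a hedged sketch of how the whole function might look with the fix applied. The contents of `custom_fields_dict` are hypothetical, since the real mapping is not shown in the question. The sketch also makes two changes that go beyond the regex fix: it assumes the keys are literal text (hence `re.escape`), and it runs the clean-up once after the loop instead of on every iteration.

```python
import re

# Hypothetical mapping; the real custom_fields_dict is not shown in the question.
custom_fields_dict = {"customfield_10080": "story_points"}

# The hyphen is placed last in the class, so it is a literal and not a range operator.
UNWANTED = re.compile(r"[()\[\]&^%$#@!:'/-]")

def text_replacement(x):
    """Format a field name so that it is more SQL friendly."""
    for key, value in custom_fields_dict.items():
        # Assumption: keys are plain text, so escape them before matching.
        x = re.sub(re.escape(key), value, x, flags=re.IGNORECASE)
    # Normalise once, after all replacements have been made.
    x = x.lower().replace("fields.", "").replace(" ", "_").replace("™", "")
    return UNWANTED.sub("", x)

print(text_replacement("standard_access_requested_application:_'role/group': ]ld_10706(e)™"))
# standard_access_requested_application_rolegroup_ld_10706e
```

Running the clean-up outside the loop also means names are still cleaned when the dictionary is empty, which the original loop would not do.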

Nikita Shabankin
  • It partly worked. However customfield_10080 is now customfield_10 – elia_Werner Sep 15 '22 at 05:41
  • @elia_Werner `re.sub(r"[()\[\]&^%$#@\!\-\:'\/]",'', "customfield_10080")` worked for me, nothing was removed. Could there be something with the length of the string, something that trims it? Did you put '\' before each of '!', '-' and ':'? – Nikita Shabankin Sep 15 '22 at 05:49
  • I did. I'm writing `x = re.sub(r"[()\[\]&^%$#@\!\-\:'\/]", '', x)` – elia_Werner Sep 15 '22 at 06:00