I would like to remove all the unnecessary characters (in bold) at the start of the first entry in a Python list. I am trying to use a regex to do this. Could you please review my code?

Edit: I would like to remove all characters before and including the word safe.

['xian/gps_201610010000644016240301624032416162641013323634045015307 0ustar bigdata_safebigdata_safea01b8439e1e42ffcd286241b04d9b1b5,f11440a64a0f084fe346a398c62aa9ad,1475277482,108.92466,34.27657', 'a01b8439e1e42ffcd286241b04d9b1b5,f11440a64a0f084fe346a398c62aa9ad,1475277488,108.92527,34.27658', 'a01b8439e1e42ffcd286241b04d9b1b5,f11440a64a0f084fe346a398c62aa9ad,1475277506,108.9276,34.27659', 'a01b8439e1e42ffcd286241b04d9b1b5,f11440a64a0f084fe346a398c62aa9ad,1475277476,108.92399,34.27655', 'a01b8439e1e42ffcd286241b04d9b1b5,f11440a64a0f084fe346a398c62aa9ad,1475277515,108.9291,34.2766']

def removePunctuation(text):
    text = re.sub(r"\x00+",'',text) 
    test = re.sub(r'.*a01', '',text)
    return text
Matthew Loh
  • And what is the rule for the bold text? See https://regex101.com/r/rBLzxD/1 for the actual example. – Jan Feb 13 '19 at 07:35
  • `return re.sub(r'^.*?(a01)', r'\1', text)`? This will work if all the junk chars appear at the start of the string and the real data starts with `a01`. – Wiktor Stribiżew Feb 13 '19 at 07:39
  • Your problem is that `.*` is greedy and will match everything up to the last occurrence of `a01`. If you make it non-greedy with `.*?`, as in Jan's demo, it should work. – gaw Feb 13 '19 at 07:39
  • There is no rule for the bold text. I just want to remove those unnecessary characters. What would the resulting code look like? – Matthew Loh Feb 13 '19 at 07:44
  • You cannot solve a programming task without requirements. If you have no rules for that, just leave as is, it will be better than corrupting it further. – Wiktor Stribiżew Feb 13 '19 at 07:49
  • Sorry, last question. Instead of being as specific as what @Wiktor mentioned, can I just remove all characters before and including the word safe? – Matthew Loh Feb 13 '19 at 07:57
  • You can. Have you tried that already? Please add the requirement to the *question* – Wiktor Stribiżew Feb 13 '19 at 07:58
  • `return re.sub(r'^.*?(a01)', r'\1', text)` ... I want to do it in terms of regex – Matthew Loh Feb 13 '19 at 07:58
  • It is my code suggestion and it does not "remove all characters before and including the word safe" – Wiktor Stribiżew Feb 13 '19 at 07:59
  • Duplicate of [Use Regex re.sub to remove everything before and including a specified word](https://stackoverflow.com/questions/25045373/use-regex-re-sub-to-remove-everything-before-and-including-a-specified-word). – Wiktor Stribiżew Feb 13 '19 at 08:00
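
To illustrate the greedy versus non-greedy point raised in the comments, here is a minimal sketch. The sample string is a shortened, made-up stand-in for the first list entry, and `strip_through_safe` is a hypothetical helper name that does not come from the thread. Because `safe` appears twice in the junk prefix, it is the greedy pattern that removes everything up to and including the last `safe`.

```python
import re

# Shortened, made-up stand-in for the first entry in the question.
sample = "xian/gps_2016 0ustar bigdata_safebigdata_safea01b84,f114,1475277482"

# Greedy: `.*` runs as far as it can, so the match ends at the LAST "safe".
greedy = re.sub(r"^.*safe", "", sample)

# Non-greedy: `.*?` stops as early as it can, at the FIRST "safe".
lazy = re.sub(r"^.*?safe", "", sample)


def strip_through_safe(line):
    """Remove everything up to and including the last 'safe' (hypothetical helper)."""
    return re.sub(r"^.*safe", "", line)


print(greedy)
print(lazy)
print(strip_through_safe("a01b84,f114,1475277488"))
```

A line that does not contain `safe` is returned unchanged, since the pattern simply fails to match and `re.sub` leaves the input as it is.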

1 Answer

The OP says there is no rule for the bold text. Assuming the corrupt text always ends with safe:

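def removePunctuation(text):
    # `text` is a list of strings; print each one without its junk prefix.
    for elem in text:
        if elem.startswith('a01'):
            # Already clean: the record starts with the expected prefix.
            print(elem)
        else:
            # Keep only what follows the last occurrence of 'safe'.
            elem = elem.rpartition('safe')[2]
            print(elem)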

test_list = ['xian/gps_201610010000644016240301624032416162641013323634045015307 0ustar bigdata_safebigdata_safea01b8439e1e42ffcd286241b04d9b1b5,f11440a64a0f084fe346a398c62aa9ad,1475277482,108.92466,34.27657',
             'a01b8439e1e42ffcd286241b04d9b1b5,f11440a64a0f084fe346a398c62aa9ad,1475277488,108.92527,34.27658',
             'a01b8439e1e42ffcd286241b04d9b1b5,f11440a64a0f084fe346a398c62aa9ad,1475277506,108.9276,34.27659',
             'a01b8439e1e42ffcd286241b04d9b1b5,f11440a64a0f084fe346a398c62aa9ad,1475277476,108.92399,34.27655',
             'a01b8439e1e42ffcd286241b04d9b1b5,f11440a64a0f084fe346a398c62aa9ad,1475277515,108.9291,34.2766']



removePunctuation(test_list)

OUTPUT:

a01b8439e1e42ffcd286241b04d9b1b5,f11440a64a0f084fe346a398c62aa9ad,1475277482,108.92466,34.27657
a01b8439e1e42ffcd286241b04d9b1b5,f11440a64a0f084fe346a398c62aa9ad,1475277488,108.92527,34.27658
a01b8439e1e42ffcd286241b04d9b1b5,f11440a64a0f084fe346a398c62aa9ad,1475277506,108.9276,34.27659
a01b8439e1e42ffcd286241b04d9b1b5,f11440a64a0f084fe346a398c62aa9ad,1475277476,108.92399,34.27655
a01b8439e1e42ffcd286241b04d9b1b5,f11440a64a0f084fe346a398c62aa9ad,1475277515,108.9291,34.2766
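
A variant that returns the cleaned list instead of printing it might look like the sketch below, under the same assumption that the junk always ends with `safe`. The name `strip_junk_prefix` and the shortened rows are illustrative only. Since `str.rpartition` returns the whole string as its last element when the separator is absent, the `startswith` check is not needed here.

```python
def strip_junk_prefix(rows, marker="safe"):
    """Return a new list with everything up to the last `marker` removed.

    Hypothetical helper: rows without `marker` are returned unchanged,
    because rpartition yields ('', '', row) when the separator is missing.
    """
    return [row.rpartition(marker)[2] for row in rows]


# Shortened, made-up rows standing in for the list in the question.
rows = [
    "xian/gps_2016 0ustar bigdata_safebigdata_safea01b84,f114,1475277482",
    "a01b84,f114,1475277488",
]

cleaned = strip_junk_prefix(rows)
print(cleaned)
```

Returning a list keeps the cleaned rows available for further processing, for example splitting each row on commas.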
DirtyBit