I have a string like:

'hi', 'what', 'are', 'are', 'what', 'hi'

I want to remove a specific repeated word. For example:

'hi', 'what', 'are', 'are', 'what'

Here, I am removing only the repeated 'hi' and keeping the other repeated words.

How can I do this using regex?

  • You don't need regex for that (Unless it is mandatory) –  Aug 18 '21 at 05:55
  • Do you need to preserve the order? – Kota Mori Aug 18 '21 at 05:55
  • Can you give me any solution? and it is not mandatory to use regex from my side. – Shamim Mahbub Aug 18 '21 at 05:56
  • @KotaMori yes, i have to maintain the order – Shamim Mahbub Aug 18 '21 at 05:58
  • @Selcuk I believe this question is completely different from the one you closed it as a duplicate of! – imxitiz Aug 18 '21 at 06:41
  • @ShamimMahbub I believe `"'hi', 'what', 'are', 'are', 'what', 'hi'"` is what you want to write ? – imxitiz Aug 18 '21 at 06:53
  • @Xitiz No, please follow the question. – Shamim Mahbub Aug 18 '21 at 06:59
  • @ShamimMahbub Copy/paste what you have tried. Don't edit anything, just copy/paste from your IDE exactly! – imxitiz Aug 18 '21 at 07:00
  • 'mode', 'name', 'phase', 'round', 'team_ct', 'score', 'name', 'mode'.... this is the actual string I am working on. It is the output from a variable, and the type of the variable is a string. I just want to keep the 1st 'mode'. – Shamim Mahbub Aug 18 '21 at 07:04
  • Sadly your question is closed, but I will try to answer here; if anything is confusing, then ask. The formatting will be very bad, but what I comment will work by just copy/pasting. – imxitiz Aug 18 '21 at 07:06
  • `arrayOfWords ='mode', 'name', 'phase', 'round', 'team_ct', 'score', 'name', 'mode';arrayOfWords=list(arrayOfWords);specificword="mode";[arrayOfWords.remove(specificword) for x in arrayOfWords if arrayOfWords.count(specificword)>1];print(arrayOfWords)` – imxitiz Aug 18 '21 at 07:07
  • OR THIS `arrayOfWords ="'mode', 'name', 'phase', 'round', 'team_ct', 'score', 'name', 'mode'";import ast;arrayOfWords=list(ast.literal_eval(arrayOfWords));specificword="mode";[arrayOfWords.remove(specificword) for x in arrayOfWords if arrayOfWords.count(specificword)>1];print(arrayOfWords)` – imxitiz Aug 18 '21 at 07:10
  • @ShamimMahbub Answered! Check it. :) – imxitiz Aug 18 '21 at 07:12
  • @Xitiz, 1st one is not working, and 2nd one deleting all mode – Shamim Mahbub Aug 18 '21 at 07:16
  • which `arrayOfWords` is correct? 1st one or 2nd one? – imxitiz Aug 18 '21 at 07:18
  • @Xitiz 2nd one is correct – Shamim Mahbub Aug 18 '21 at 07:20
  • Okay! I am confused now about why it is deleting all "mode"; it is working perfectly for me. Can you provide the expected output for `arrayOfWords ="'mode', 'name', 'phase', 'round', 'team_ct', 'score', 'name', 'mode'"`? – imxitiz Aug 18 '21 at 07:23
  • @Xitiz, I really loved your efforts for my problem. I am getting the expected result. Thank you so much. – Shamim Mahbub Aug 18 '21 at 07:27
  • @ShamimMahbub UPVOTE the answer which is working for you; by upvoting, that comment will go to the top and may help future people. – imxitiz Aug 18 '21 at 07:31
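
One detail about the one-liners in the comments above: `list.remove` deletes the first matching item, so with the 'mode' string they leave the last 'mode' in place rather than the first. A minimal sketch of the difference, assuming the input is the comma-separated quoted string from the comments (the variable names `raw`, `removed_first` and `kept_first` are only illustrative):

```python
import ast

# The string quoted in the comments; literal_eval parses it as a tuple of str.
raw = "'mode', 'name', 'phase', 'round', 'team_ct', 'score', 'name', 'mode'"
words = list(ast.literal_eval(raw))

# list.remove drops the FIRST match, so the surviving 'mode' is the last one.
removed_first = words.copy()
removed_first.remove("mode")

# To keep the first match instead, delete the LAST occurrence
# until only one occurrence is left.
kept_first = words.copy()
while kept_first.count("mode") > 1:
    last = len(kept_first) - 1 - kept_first[::-1].index("mode")
    del kept_first[last]

print(removed_first)
print(kept_first)
```

The second result preserves the original order and keeps 'mode' at the front, which is what the question asks for.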

3 Answers

Regex is meant for searching text. Your data is structured (a sequence of words), so regex is unnecessary here.

def remove_all_but_first(iterable, removeword='hi'):
    # becomes True once the first occurrence of removeword has been yielded
    remove = False
    for word in iterable:
        if word == removeword:
            if remove:
                continue  # skip every later occurrence
            remove = True
        yield word

Note that this returns a generator, not a list. Pass the result to list() if you need a list.
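
As a usage sketch, assuming the input arrives as a string of quoted, comma-separated words (as the asker describes in the comments), it can be parsed with `ast.literal_eval` first and then passed through a generator of this kind. The name `keep_first_only` is hypothetical and the function is redefined here so the snippet runs on its own:

```python
import ast


def keep_first_only(iterable, target):
    """Yield every item, skipping `target` after its first appearance."""
    seen = False
    for item in iterable:
        if item == target:
            if seen:
                continue  # a later occurrence: drop it
            seen = True
        yield item


# Assumed input format: quoted words separated by commas.
text = "'hi', 'what', 'are', 'are', 'what', 'hi'"
words = ast.literal_eval(text)  # parsed as a tuple of strings

# Materialise the generator into a list.
result = list(keep_first_only(words, "hi"))
print(result)  # ['hi', 'what', 'are', 'are', 'what']
```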

Adam Smith
  • as far as i know, regex also can subtract repeated words. – Shamim Mahbub Aug 18 '21 at 06:08
  • @ShamimMahbub you're incorrect. Regex is a shortening of Regular Expressions, which are a way to do pattern matching in a [Regular Language](https://en.wikipedia.org/wiki/Regular_language). It does not solve the generalized form of "subtract repeated words." You could certainly craft a regex that will do what you want for some subsets of input, but since lists are not regular languages -- they are structured data -- regex is not the tool for this job. – Adam Smith Aug 18 '21 at 06:10
  • I do not have a list, I have a text file, which is string @ – Shamim Mahbub Aug 18 '21 at 06:47

You can do this:

import re
s = "['hi', 'what', 'are', 'are', 'what', 'hi']"
# convert string to list: strip the brackets, then remove quotes and spaces
s = s[1:-1].replace("'", '').replace(' ', '').split(',')
remove = 'hi'
# store the index of the first occurrence so that we can re-insert it after removing all occurrences
firstIndex = s.index(remove)
# regex to remove every whole item equal to the word; the word is escaped and must
# be bounded by '|' or the string ends, so 'hi' inside a longer word is left alone
regex = re.compile(r'(?:(?<=\|)|^)' + re.escape(remove) + r'(?=\||$)')
op = regex.sub("", '|'.join(s)).split('|')
# clean up the list by removing empty items
op = [item for item in op if item]
# re-insert the removed word at the index of its first occurrence
op.insert(firstIndex, remove)
print(op)
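
If the input really is a plain string and the result should stay a string, a regex can also work on the text directly, without converting to a list. The following is only a sketch under stated assumptions: every word is wrapped in single quotes, words contain no quotes themselves, and items are separated by commas. The helper name `drop_repeats_in_text` is hypothetical.

```python
import re


def drop_repeats_in_text(text, word):
    """Remove every quoted `word` after the first, plus its leading comma."""
    # Match the quoted word and, if present, the comma and spaces before it.
    pattern = re.compile(r"(,\s*)?'" + re.escape(word) + r"'")
    seen = False

    def replace(match):
        nonlocal seen
        if not seen:
            seen = True
            return match.group(0)  # keep the first occurrence untouched
        return ""  # drop later occurrences together with their comma

    return pattern.sub(replace, text)


text = "'hi', 'what', 'are', 'are', 'what', 'hi'"
print(drop_repeats_in_text(text, "hi"))  # 'hi', 'what', 'are', 'are', 'what'
```

Because the quotes are part of the pattern, a word such as 'this' is not affected when removing 'hi'.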
Shreyas Prakash

You don't need regex for that. Convert the string to a list, find the index of the first occurrence of the word, and then filter the word out of the rest of the list:

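import ast

lst = "['hi', 'what', 'are', 'are', 'what', 'hi']"
lst = ast.literal_eval(lst)  # safely parse the string into a real list
word = 'hi'

# keep everything up to and including the first occurrence, then drop later ones
index = lst.index(word) + 1
lst = lst[:index] + [x for x in lst[index:] if x != word]
print(lst) # ['hi', 'what', 'are', 'are', 'what']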
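
One caveat with this approach is that `list.index` raises `ValueError` when the word is not in the list. A small wrapper could guard against that; this is a sketch, and the function name `remove_later_occurrences` is made up for illustration:

```python
def remove_later_occurrences(items, word):
    """Return a new list that keeps only the first occurrence of `word`."""
    items = list(items)  # accept tuples and other sequences as well
    try:
        cut = items.index(word) + 1
    except ValueError:
        # The word is absent, so there is nothing to remove.
        return items
    # Everything up to the first occurrence stays; later matches are filtered out.
    return items[:cut] + [x for x in items[cut:] if x != word]


print(remove_later_occurrences(['hi', 'what', 'are', 'are', 'what', 'hi'], 'hi'))
print(remove_later_occurrences(['what', 'are'], 'hi'))
```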
Guy