Remove a specific repeated word using Python regex?

Question

I have a string like this:

'hi', 'what', 'are', 'are', 'what', 'hi'

I want to remove a specific repeated word. For example:

'hi', 'what', 'are', 'are', 'what'

Here, I am removing only the repeated occurrence of hi and keeping the rest of the repeated words.

How can I do this using regex?

Can you give me any solution? Using regex is not mandatory on my side. – Shamim Mahbub, Aug 18 '21 at 05:56
@Selcuk I believe this question is completely different from the one you closed it as a duplicate of! – imxitiz, Aug 18 '21 at 06:41
@ShamimMahbub I believe `"'hi', 'what', 'are', 'are', 'what', 'hi'"` is what you meant to write? – imxitiz, Aug 18 '21 at 06:53
@ShamimMahbub Copy/paste what you have tried. Don't edit anything, just copy/paste from your IDE exactly! – imxitiz, Aug 18 '21 at 07:00
'mode', 'name', 'phase', 'round', 'team_ct', 'score', 'name', 'mode'.... this is the actual string I am working on. It is the output of a variable, and the type of the variable is a string. I just want to keep the 1st 'mode'. – Shamim Mahbub, Aug 18 '21 at 07:04
Sadly your question is closed, but I will try to answer here; ask if anything is confusing. The formatting will be very bad, but I will post a comment that works if you just copy/paste it. – imxitiz, Aug 18 '21 at 07:06
`arrayOfWords = ['mode', 'name', 'phase', 'round', 'team_ct', 'score', 'name', 'mode']; specificword = "mode"; first = arrayOfWords.index(specificword); arrayOfWords = arrayOfWords[:first + 1] + [w for w in arrayOfWords[first + 1:] if w != specificword]; print(arrayOfWords)` – imxitiz, Aug 18 '21 at 07:07
OR THIS `import ast; arrayOfWords = "'mode', 'name', 'phase', 'round', 'team_ct', 'score', 'name', 'mode'"; arrayOfWords = list(ast.literal_eval(arrayOfWords)); specificword = "mode"; first = arrayOfWords.index(specificword); arrayOfWords = arrayOfWords[:first + 1] + [w for w in arrayOfWords[first + 1:] if w != specificword]; print(arrayOfWords)` – imxitiz, Aug 18 '21 at 07:10
@Xitiz, the 1st one is not working, and the 2nd one is deleting all mode – Shamim Mahbub, Aug 18 '21 at 07:16
Okay! I am confused about why it is deleting all "mode"; it is working perfectly for me. Can you provide the expected output for `arrayOfWords ="'mode', 'name', 'phase', 'round', 'team_ct', 'score', 'name', 'mode'"`? – imxitiz, Aug 18 '21 at 07:23
@Xitiz, I really loved your efforts for my problem. I am getting the expected result. Thank you so much. – Shamim Mahbub, Aug 18 '21 at 07:27
@ShamimMahbub UPVOTE the answer that is working for you; with upvotes it will go to the top and may help future people. – imxitiz, Aug 18 '21 at 07:31

score 1 · Answer 1 · answered Aug 18 '21 at 05:56

Regex is used for text search. You have structured data, so this is unnecessary.

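def remove_all_but_first(iterable, removeword='hi'):
    # Becomes True once the first occurrence of removeword has been yielded
    remove = False
    for word in iterable:
        if word == removeword:
            if remove:
                continue  # skip every later occurrence
            else:
                remove = True
        # Yield at loop level so words other than removeword are kept as well
        yield word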

Note that this returns a generator (an iterator), not a list. Pass the result to list() if you need a list.
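As a usage sketch, the generator can be consumed with list(). The function is restated here, with the yield at loop level, so that the block runs on its own; the sample words come from the question, and the names `seen` and `result` are only illustrative.

```python
def remove_all_but_first(iterable, removeword='hi'):
    """Yield every item, skipping repeats of removeword after its first occurrence."""
    seen = False
    for word in iterable:
        if word == removeword:
            if seen:
                continue  # drop later occurrences of removeword
            seen = True
        yield word


words = ['hi', 'what', 'are', 'are', 'what', 'hi']
result = list(remove_all_but_first(words))  # materialize the generator
print(result)  # ['hi', 'what', 'are', 'are', 'what']
```

Because the function is lazy, it also works on any iterable of words, not only on lists.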

Adam Smith

As far as I know, regex can also subtract repeated words. – Shamim Mahbub Aug 18 '21 at 06:08
@ShamimMahbub you're incorrect. Regex is a shortening of Regular Expressions, which are a way to do pattern matching in a [Regular Language](https://en.wikipedia.org/wiki/Regular_language). It does not solve the generalized form of "subtract repeated words." You could certainly craft a regex that will do what you want for some subsets of input, but since lists are not regular languages -- they are structured data -- regex is not the tool for this job. – Adam Smith Aug 18 '21 at 06:10
I do not have a list; I have a text file, which is a string. – Shamim Mahbub Aug 18 '21 at 06:47

Shreyas Prakash · Accepted Answer · 2021-08-18T07:15:30.060

0

You can do this:

import re

s = "['hi', 'what', 'are', 'are', 'what', 'hi']"
# Convert the string to a list: drop the brackets, strip quotes and spaces, split on commas
s = s[1:-1].replace("'", '').replace(' ', '').split(',')
remove = 'hi'
# Store the index of the first occurrence so that it can be re-inserted after removal
firstIndex = s.index(remove)
# Regex that matches the word only as a whole '|'-separated item, so that
# substrings of other words are left alone; re.escape guards special characters
regex = re.compile(r'(?:^|(?<=\|))' + re.escape(remove) + r'(?=\||$)', flags=re.IGNORECASE)
op = regex.sub("", '|'.join(s)).split('|')
# Clean up the list by dropping the emptied items
op = [item for item in op if item]
# Re-insert the removed word at the index of its first occurrence
op.insert(firstIndex, remove)
print(op)
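
If the data has to stay a string, a regex can also work on it directly. The following is only a sketch, assuming every word is wrapped in single quotes and separated by commas as in the question; `keep_first_quoted` is a hypothetical helper name, not something from the standard library. It relies on `re.sub` accepting a function as the replacement.

```python
import re


def keep_first_quoted(text, word):
    """Remove every quoted occurrence of word after the first one."""
    # Match the quoted word together with an optional leading comma and spaces.
    pattern = re.compile(r"(?:,\s*)?'" + re.escape(word) + r"'")
    seen = False

    def replace(match):
        nonlocal seen
        if not seen:
            seen = True
            return match.group(0)  # keep the first occurrence untouched
        return ""  # drop later occurrences together with their comma

    return pattern.sub(replace, text)


s = "'hi', 'what', 'are', 'are', 'what', 'hi'"
print(keep_first_quoted(s, 'hi'))  # 'hi', 'what', 'are', 'are', 'what'
```

The quotes in the pattern act as word boundaries, so a longer word that merely contains the target is not touched.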

edited Aug 18 '21 at 07:15

answered Aug 18 '21 at 06:07

Actually OP is asking for specific word, right? – imxitiz Aug 18 '21 at 06:11
That `s` is a `string`, not a `list`! – imxitiz Aug 18 '21 at 06:39
@ShreyasPrakash that is good, but it is removing all **hi** – Shamim Mahbub Aug 18 '21 at 06:57
@Xitiz, can you help me with this? – Shamim Mahbub Aug 18 '21 at 07:00
@ShamimMahbub how about now? Addressed Xitiz comment also – Shreyas Prakash Aug 18 '21 at 07:05
@ShreyasPrakash, that is what I am asking for... thanks a lot.. but can you explain the code? please. – Shamim Mahbub Aug 18 '21 at 07:08
@ShamimMahbub I have added comments in the code – Shreyas Prakash Aug 18 '21 at 07:15

Guy · Answer 3 · 2021-08-18T06:38:04.993

0

You don't need regex for that. Convert the string to a list, find the index of the first occurrence of the word, and then filter the word out of the slice that follows it:

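import ast

lst = "['hi', 'what', 'are', 'are', 'what', 'hi']"
lst = ast.literal_eval(lst)  # parse the string into a real list
word = 'hi'

# keep everything up to and including the first occurrence, then filter the rest
index = lst.index(word) + 1
lst = lst[:index] + [x for x in lst[index:] if x != word]
print(lst) # ['hi', 'what', 'are', 'are', 'what']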
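
When the string has no surrounding brackets, `ast.literal_eval` returns a tuple instead of a list, and concatenating a tuple slice with a list raises a TypeError. A minimal sketch that handles both forms, assuming the string holds at least two quoted items (`keep_first` is a hypothetical helper name):

```python
import ast


def keep_first(text, word):
    """Parse a string of quoted words and drop repeats of word after the first."""
    # literal_eval yields a list for "[...]" input and a tuple for bare "'a', 'b'";
    # list() normalizes both so that slicing and concatenation behave the same.
    items = list(ast.literal_eval(text))
    if word not in items:
        return items  # nothing to remove
    index = items.index(word) + 1
    return items[:index] + [x for x in items[index:] if x != word]


text = "'mode', 'name', 'phase', 'round', 'team_ct', 'score', 'name', 'mode'"
print(keep_first(text, 'mode'))
# ['mode', 'name', 'phase', 'round', 'team_ct', 'score', 'name']
```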

edited Aug 18 '21 at 06:38

answered Aug 18 '21 at 06:14

I have a string object, not a list.. can you do this for string? – Shamim Mahbub Aug 18 '21 at 06:28
@ShamimMahbub are you saying that `lst` from this answer is a string for you? – imxitiz Aug 18 '21 at 06:30
yes, that is what I am saying. – Shamim Mahbub Aug 18 '21 at 06:31
@ShamimMahbub you can change it to a list using `lst = ast.literal_eval(lst)` – Guy Aug 18 '21 at 06:32
@Xitiz AttributeError: 'str' object has no attribute 'literal_eval', I am getting this error. – Shamim Mahbub Aug 18 '21 at 06:35
@ShamimMahbub `ast` should be an import, don't define it yourself. `import ast` – Guy Aug 18 '21 at 06:36
Okay, and now I am getting this error: TypeError: can only concatenate tuple (not "list") to tuple – Shamim Mahbub Aug 18 '21 at 06:44
@ShamimMahbub You haven't responded to my edit! Is that what you want? – imxitiz Aug 18 '21 at 06:46
@Xitiz, no, now see the question that I have edited – Shamim Mahbub Aug 18 '21 at 06:51
