I have a dataframe column, where I need to match a list of substrings provided to me. But I need to select only those substrings which occur at least 3 times in my dataframe. Once I get those substrings all I need to do is to find those rows which contain substrings and assign some values corresponding to the matched pattern.
here's what I have currently done.
#dataframe has 9000 rows
reg_ = {'a':0.03,'b':0.05 ... 'ak':0.004} #approx 2000 patterns
for reg_pat in reg_:
count = 0
eliter = len(training[training['concat'].str.contains(reg_pat)]['concat'])
if (eliter >= 3):
training[reg_pat] = 0
training.loc[training['concat'].str.contains(reg_pat), reg_pat] = reg_[reg_pat]
muCOls += 1
Is there somehow I can do the filter and assignemnt as a single expression.
Please note that I have asked a sub-question fo the same as a separate question. There I have asked only for finding the counts. Here I am curious to know if both the operations i.e. finding the counts and then using it to assign values can be combined.
Finding the count of a set of substrings in pandas dataframe
The dataframe column looks like this
training['concat']
0 svAxu$paxArWAn
1 xvAxaSa$varRANi
2 AxAna$xurbale
3 go$BakwAH
4 viXi$Bexena
5 nIwi$kuSalaM
6 lafkA$upamam
7 yaSas$lipsoH
8 kaSa$AGAwam
9 hewumaw$uwwaram
10 varRa$pUgAn
Please let me know If I need to delete this question and just modify the old question. I think these are 2 sepearte problems, and hence 2 separate questions.