How to find specific keywords in a DataFrame column and extract only the data between the separators that contains them

Question

I have a column with data as follows:

abc|frame|gtk|enst.24|pc|hg|,abc|framex|gtk4|enst.35|pxc|h5g|,abc|frbx|hgk4|enst.23|pix|hokg|
abc|frame|gtk|enst.15|pc|hg|,abc|framex|gtk2|enst.59|pxc|h5g|,abc|frbx|hgk4|enst.18|pif|homg|
abc|frame|gtk|enst.98|pc|hg|,abc|framex|gtk1|enst.45|pxc|h5g|,abc|frbx|hgk4|enst.74|pig|hofg|
abc|frame|gtk|enst.34|pc|hg|,abc|framex|gtk1|enst.67|pxc|h5g|,abc|frbx|hgk4|enst.39|pik|hoqg|

I want to search for specific keywords within the DataFrame and extract only the comma-separated record (the data between the separators) that contains each keyword.

The specific keywords are:

enst.35
enst.18
enst.98
enst.63

The expected output is:

abc|framex|gtk4|enst.35|pxc|h5g|
abc|frbx|hgk4|enst.18|pif|homg|
abc|frame|gtk|enst.98|pc|hg|
NA

I tried this here, but it did not work effectively.

Shubham Sharma · Answer 1 · 2020-06-06T09:05:53.467

You can construct a regex pattern from the given keywords, then use Series.str.findall to find all occurrences of the pattern in the series and take the first one:

import re

keywords = ['enst.35','enst.18','enst.98','enst.63']
pattern = '|'.join([rf'[^,]+{re.escape(k)}[^,]+' for k in keywords])
result = df['col'].str.findall(pattern).str.get(0)

#print(result)

0    abc|framex|gtk4|enst.35|pxc|h5g|
1     abc|frbx|hgk4|enst.18|pif|homg|
2        abc|frame|gtk|enst.98|pc|hg|
3                                 NaN
Name: col, dtype: object

You can test the regex pattern here
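
To see what this pattern captures without a DataFrame, here is a minimal sketch using only the standard `re` module. The `rows` list and the `first_match` helper are assumptions made for illustration; they stand in for `df['col']` and for `.str.findall(pattern).str.get(0)`.

```python
import re

keywords = ['enst.35', 'enst.18', 'enst.98', 'enst.63']
# One alternative per keyword: a run of non-comma characters that contains
# the (escaped) keyword, so a match never crosses a comma separator.
pattern = '|'.join(rf'[^,]+{re.escape(k)}[^,]+' for k in keywords)

# Two rows copied from the question (first and last), in place of df['col'].
rows = [
    'abc|frame|gtk|enst.24|pc|hg|,abc|framex|gtk4|enst.35|pxc|h5g|,abc|frbx|hgk4|enst.23|pix|hokg|',
    'abc|frame|gtk|enst.34|pc|hg|,abc|framex|gtk1|enst.67|pxc|h5g|,abc|frbx|hgk4|enst.39|pik|hoqg|',
]

def first_match(row):
    """Hypothetical helper: first record containing any keyword, else None."""
    matches = re.findall(pattern, row)
    return matches[0] if matches else None

results = [first_match(row) for row in rows]
print(results)
```

Note that the keyword is matched as a substring, so `enst.35` would also match a record containing `enst.350`; if that matters, the pattern would need a delimiter such as `\|` after the keyword.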

score 0 · Answer 2 · answered Jun 06 '20 at 09:01

You can try a Bash script as follows:

for STRING in enst.35 enst.18 enst.98 enst.63; do
  # Put each comma-separated record on its own line; -F matches the keyword literally
  tr ',' '\n' < file.txt | grep -F "$STRING" || echo NA
done
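
The same per-keyword loop can be sketched in Python for comparison. This is only an illustration under stated assumptions: the `lines` list stands in for the contents of `file.txt`, `records_for` is a hypothetical helper, and unlike `grep` it compares the keyword against whole `|`-delimited fields rather than substrings.

```python
# Rows copied from the question, standing in for the lines of file.txt.
lines = [
    'abc|frame|gtk|enst.24|pc|hg|,abc|framex|gtk4|enst.35|pxc|h5g|,abc|frbx|hgk4|enst.23|pix|hokg|',
    'abc|frame|gtk|enst.15|pc|hg|,abc|framex|gtk2|enst.59|pxc|h5g|,abc|frbx|hgk4|enst.18|pif|homg|',
    'abc|frame|gtk|enst.98|pc|hg|,abc|framex|gtk1|enst.45|pxc|h5g|,abc|frbx|hgk4|enst.74|pig|hofg|',
    'abc|frame|gtk|enst.34|pc|hg|,abc|framex|gtk1|enst.67|pxc|h5g|,abc|frbx|hgk4|enst.39|pik|hoqg|',
]
keywords = ['enst.35', 'enst.18', 'enst.98', 'enst.63']

def records_for(keyword, lines):
    """Hypothetical helper: every record with a field equal to keyword, else ['NA']."""
    hits = []
    for line in lines:
        for record in line.split(','):        # records are comma-separated
            if keyword in record.split('|'):  # exact match on a '|'-delimited field
                hits.append(record)
    return hits or ['NA']

output = [record for k in keywords for record in records_for(k, lines)]
print('\n'.join(output))
```

Because the loop runs once per keyword, a keyword that appears nowhere (here `enst.63`) yields a single `NA` line, which is the shape of the expected output in the question.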

David Erickson · Answer 3 · 2020-06-06T09:40:39.163

With str.extract and str.split(',') to take the last comma-separated value:

df['Data2'] = df['Data'].str.extract(r'(^.*enst\.35\|.+?\|.+?\||^.*enst\.18\|.+?\|.+?\||^.*enst\.98\|.+?\|.+?\||^.*enst\.63\|.+?\|.+?\||)', expand=False).str.split(',').str[-1]

You could also create a list of keywords and build the pattern with a list comprehension, as in another answer.
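
A rough sketch of that idea, assuming the same keyword list as in the question. It is written against plain strings with the standard `re` module so it can be run on its own; `extract_last_record` is a hypothetical helper mirroring `.str.extract(...).str.split(',').str[-1]`, and the trailing empty alternative of the hand-written pattern is left out here, so a row without any keyword gives `None` instead of an empty string.

```python
import re

keywords = ['enst.35', 'enst.18', 'enst.98', 'enst.63']
# One alternative per keyword: everything from the start of the row through
# the keyword and the two '|'-terminated fields that follow it.
alternatives = [rf'^.*{re.escape(k)}\|.+?\|.+?\|' for k in keywords]
pattern = '(' + '|'.join(alternatives) + ')'

def extract_last_record(row):
    """Hypothetical helper: last comma-separated piece of the match, else None."""
    m = re.search(pattern, row)
    return m.group(1).split(',')[-1] if m else None

row_with_hit = 'abc|frame|gtk|enst.15|pc|hg|,abc|framex|gtk2|enst.59|pxc|h5g|,abc|frbx|hgk4|enst.18|pif|homg|'
row_without_hit = 'abc|frame|gtk|enst.34|pc|hg|,abc|framex|gtk1|enst.67|pxc|h5g|,abc|frbx|hgk4|enst.39|pik|hoqg|'

print(extract_last_record(row_with_hit))
print(extract_last_record(row_without_hit))
```

The same `pattern` string should be usable as the argument to `str.extract`, since it contains exactly one capture group.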

edited Jun 06 '20 at 09:40

answered Jun 06 '20 at 09:21

