My question is similar to "Find column whose name contains a specific string", but with an extension.

I have a dataframe with column names, and I want to find the ones that contain a certain string after the underscore symbol.

For example, I'm searching for mango in column names like 'mango_man', 'man_mango', and 'apple_mango', but only where it occurs after '_'. In this case the result should contain only 'man_mango' and 'apple_mango'.

The column names should be returned as strings or stored in a variable.

Mikee
  • so you are searching for `_mango` ? – Bijay Regmi Mar 11 '21 at 23:27
  • have you tried `your_list = [word for word in df.columns if '_mango' in word]` where `df` would be your dataframe ofc – Bijay Regmi Mar 11 '21 at 23:30
  • @BijayRegmi, nice way to put it but I want to search for mango after the punctuation because _ occurs in several forms in the data and I want the flexibility to specify the forms. – Mikee Mar 11 '21 at 23:30
  • And if you want to only return list of results including `mango` after the underscore, and you are not satisfied with command above, you can always do `mango_list = [word for word in my_words if word.split("_")[1] == "mango"]` – Bijay Regmi Mar 11 '21 at 23:35
  • That answers my question, if you put it as a proper answer I will accept. – Mikee Mar 11 '21 at 23:36

2 Answers

You can use this to find out whether mango_ or _mango occurs in the column names.

This checks for an underscore on either side of mango. To match only names where mango comes after the underscore, use the condition `if '_mango' in col` instead.

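import numpy as np
import pandas as pd

# Sample frame whose column names cover the cases of interest
df = pd.DataFrame(np.random.randint(0,10,(3,7)),
                columns = ['man','man_mango','mango_man','mango','mangoman','manmango','man mango'])
print(df)
# Keep names that contain mango directly before or directly after an underscore
mango_cols = [col for col in df.columns if any(x in col for x in ['mango_','_mango'])]
print(mango_cols)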

The output of this will be:

['man_mango', 'mango_man']

To get only the names where mango comes directly after the underscore, you can use:

cols = [col for col in df.columns if '_mango' in col]

The output of this will be:

['man_mango']
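
The asker mentions in the comments wanting the flexibility to specify which punctuation precedes the term. A minimal sketch of one way to do that with the standard `re` module follows; the helper name `columns_with_term_after` and its `separators` parameter are hypothetical, introduced only for illustration.

```python
import re

def columns_with_term_after(names, term, separators=("_",)):
    """Return the names in which `term` appears directly after any separator."""
    # Escape everything so characters such as "." or "-" are matched literally.
    sep_pattern = "|".join(re.escape(sep) for sep in separators)
    pattern = re.compile("(?:" + sep_pattern + ")" + re.escape(term))
    return [name for name in names if pattern.search(name)]

# Same column names as the sample dataframe above; df.columns could be passed instead.
names = ["man", "man_mango", "mango_man", "mango", "mangoman", "manmango", "man mango"]

underscore_only = columns_with_term_after(names, "mango")
underscore_or_space = columns_with_term_after(names, "mango", separators=("_", " "))
print(underscore_only)      # ['man_mango']
print(underscore_or_space)  # ['man_mango', 'man mango']
```

The helper accepts any iterable of strings, so it should work on a plain list as well as on `df.columns`.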
Joe Ferndz

To get a list of the column names in your dataframe `df` that contain mango after an underscore `_`, you can either do

mango_list = [word for word in df.columns if '_mango' in word]

or

mango_list = [word for word in df.columns if "_" in word and word.split("_")[1] == "mango"]
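
The two checks are not equivalent. The substring test also matches longer words such as 'apple_mangoes', while indexing `split("_")[1]` inspects only the second segment and raises an IndexError on a name without an underscore unless it is guarded. A small sketch of a variant that slices instead of indexing is below; the column names are made up for illustration.

```python
# Hypothetical column names chosen to show where the two checks disagree.
names = ["apple_mango", "apple_mangoes", "mango_man", "mango", "a_b_mango"]

# Substring check: any name containing "_mango", including "apple_mangoes".
substring_matches = [name for name in names if "_mango" in name]

# Segment check: "mango" must be a whole segment after the first underscore.
# Slicing with [1:] yields an empty list when there is no underscore, so no IndexError.
segment_matches = [name for name in names if "mango" in name.split("_")[1:]]

print(substring_matches)  # ['apple_mango', 'apple_mangoes', 'a_b_mango']
print(segment_matches)    # ['apple_mango', 'a_b_mango']
```

Which one fits depends on whether names like 'apple_mangoes' should count as a match.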
Bijay Regmi