271

I have a DataFrame with column names, and I want to find the one that contains a certain string but does not exactly match it. I'm searching for 'spike' in column names like 'spike-2', 'hey spike', 'spiked-in' (the 'spike' part is always contiguous).

I want the column name to be returned as a string or a variable, so I can access the column later with df['name'] or df[name] as normal. I've tried to find ways to do this, to no avail. Any tips?

8 Answers

405

Just iterate over DataFrame.columns; here is an example in which you end up with a list of the column names that match:

import pandas as pd

data = {'spike-2': [1,2,3], 'hey spke': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)

spike_cols = [col for col in df.columns if 'spike' in col]
print(list(df.columns))
print(spike_cols)

Output:

['hey spke', 'no', 'spike-2', 'spiked-in']
['spike-2', 'spiked-in']

Explanation:

  1. df.columns returns the column names (an Index object you can iterate over)
  2. [col for col in df.columns if 'spike' in col] iterates over df.columns with the variable col and adds col to the resulting list if col contains 'spike'. This syntax is a list comprehension.
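
Since the question also asks for the name back as a string so the column can be accessed later, here is a minimal sketch building on the spike_cols list above (it assumes at least one column matched):

name = spike_cols[0]   # first matching name as a plain string, e.g. 'spike-2'
print(df[name])        # access a single column by the found name
print(df[spike_cols])  # or select all matching columns at once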

If you only want the resulting data set with the columns that match, you can do this:

df2 = df.filter(regex='spike')
print(df2)

Output:

   spike-2  spiked-in
0        1          7
1        2          8
2        3          9
Alvaro Fuentes
  • That's awesome! I don't really understand exactly how it works, though, still being new to both Python and Pandas. Could you perhaps explain? – erikfas Jan 22 '14 at 14:35
  • This is what `DataFrame.filter` does, FYI (and you can supply a regex if you want). – Jeff Jan 22 '14 at 14:37
  • @xndrme how would you do a regex to **exclude** a certain column matching a regex, rather than include it? – Dhruv Ghulati Mar 31 '16 at 11:28
  • @DhruvGhulati You can do this: `spike_cols = [col for col in df.columns if not 'spike' in col]` to get the list of columns that **do not match**, or you can use a negative lookahead regex like `df2 = df.filter(regex='^(?!.*spike).*$')`. Of course, this always depends on the complexity of your excluding pattern. You can read more here: https://docs.python.org/2/howto/regex.html#lookahead-assertions – Alvaro Fuentes Mar 31 '16 at 11:42
  • @DhruvGhulati It is also possible to drop your unwanted columns, as in `df[df.columns.drop(spike_cols)]`; there you get a `DataFrame` without the columns in the list `spike_cols`, which you can obtain using your undesired regex. – Alvaro Fuentes Mar 31 '16 at 11:54
  • More concise code: `df[[col for col in df.columns if "spike" in col]]` – WindChimes May 22 '16 at 03:11
  • What if I have a list of strings like `['spike', 'foo', 'bar']` and want all the columns that contain any of those strings? – JacoSolari Feb 02 '21 at 14:07
  • @JacoSolari `[col for col in df.columns if any(s in col for s in ['spike', 'foo', 'bar'])]` or `df.filter(regex='(spike)|(foo)|(bar)')` – Alvaro Fuentes Feb 03 '21 at 10:30
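
Following up on the exclusion question in the comments above, here is a short sketch of those approaches (assuming the same df and spike_cols as in this answer):

non_spike_cols = [col for col in df.columns if 'spike' not in col]  # names that do NOT contain 'spike'
df_without_spike = df.filter(regex='^(?!.*spike).*$')               # negative-lookahead regex
df_dropped = df[df.columns.drop(spike_cols)]                        # drop the previously found matches

print(non_spike_cols)                     # ['hey spke', 'no']
print(df_without_spike.columns.tolist())  # ['hey spke', 'no']
print(df_dropped.columns.tolist())        # ['hey spke', 'no']
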
141

This answer uses the DataFrame.filter method to do this without a list comprehension:

import pandas as pd

data = {'spike-2': [1,2,3], 'hey spke': [4,5,6]}
df = pd.DataFrame(data)

print(df.filter(like='spike').columns)

This will output just 'spike-2'. You can also use a regex, as some people suggested in the comments above:

print(df.filter(regex='spike|spke').columns)

This will output both columns: ['spike-2', 'hey spke']
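
If you need the match back as a single plain string (as the question asks), here is a small sketch on top of DataFrame.filter (it assumes at least one column matches):

matches = df.filter(like='spike').columns  # Index of the matching names
name = matches[0]                          # 'spike-2'
print(df[name])                            # access the column by the found name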

Ben
  • I have many columns and I used this code; it seems to skip some of the names! In this example, imagine running this code and not getting the 'hey spke' column back!! – PM0087 May 24 '21 at 15:25
  • How about excluding some columns by name? How would we go about doing that? – MrSoLoDoLo Oct 03 '21 at 15:04
  • You could do a negative lookahead (regex='^(?!spke)') or get a boolean vector for columns doing something like df.columns.str.contains('spke'). – Ben Mar 21 '22 at 13:06
47

You can also use df.columns[df.columns.str.contains(pat = 'spike')]

import pandas as pd

data = {'spike-2': [1,2,3], 'hey spke': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)

colNames = df.columns[df.columns.str.contains(pat='spike')]

print(colNames)

This will output the column names: 'spike-2', 'spiked-in'

More about pandas.Series.str.contains.
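
As a side note, str.contains also accepts case and regex arguments, so here is a small sketch of a case-insensitive, literal (non-regex) match on the same df:

colNames_ci = df.columns[df.columns.str.contains('SPIKE', case=False, regex=False)]
print(colNames_ci)  # Index(['spike-2', 'spiked-in'], dtype='object')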

vasili111
39
# select columns containing 'spike'
df.filter(like='spike', axis=1)

You can also select columns by exact name or by regular expression; refer to pandas.DataFrame.filter.
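
For reference, a quick sketch of the three selection modes that DataFrame.filter supports (using a df like the ones above):

import pandas as pd

df = pd.DataFrame({'spike-2': [1, 2], 'hey spke': [3, 4], 'spiked-in': [5, 6], 'no': [7, 8]})

print(df.filter(items=['spike-2', 'no']))  # exact column names
print(df.filter(like='spike', axis=1))     # substring match on the names
print(df.filter(regex='^spike', axis=1))   # regular-expression match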

Manny
17
df.loc[:, df.columns.str.contains("spike")]
DhanushNayak
  • While it does not exactly answer the original question, I really like this solution as it directly returns the sliced DataFrame (which is probably also what the OP is after). – malvoisen Jan 11 '21 at 17:35
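
If you still want the matching names themselves from this approach, here is a small sketch (assuming the same df as in the earlier answers):

mask = df.columns.str.contains("spike")  # boolean array over the column names
names = df.columns[mask].tolist()        # e.g. ['spike-2', 'spiked-in']
subset = df.loc[:, mask]                 # the sliced DataFrame from the answer above
print(names)
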
9

Another solution that returns a subset of the df with the desired columns:

df[df.columns[df.columns.str.contains("spike|spke")]]

rambutan
4

You can also use this code:

spike_cols = [x for x in df.columns[df.columns.str.contains('spike')]]
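
The outer list comprehension only converts the Index into a plain list; an equivalent and slightly shorter sketch:

spike_cols = df.columns[df.columns.str.contains('spike')].tolist()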
Yury Wallet
1

Getting names and subsetting based on starts-with, contains, and ends-with:

# from: https://stackoverflow.com/questions/21285380/find-column-whose-name-contains-a-specific-string
# from: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html
# from: https://cmdlinetips.com/2019/04/how-to-select-columns-using-prefix-suffix-of-column-names-in-pandas/
# from: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html

import pandas as pd

data = {'spike_starts': [1,2,3], 'ends_spike_starts': [4,5,6], 'ends_spike': [7,8,9], 'not': [10,11,12]}
df = pd.DataFrame(data)

print("\n")
print("----------------------------------------")
colNames_contains = df.columns[df.columns.str.contains(pat='spike')].tolist()
print("Contains")
print(colNames_contains)

print("\n")
print("----------------------------------------")
colNames_starts = df.columns[df.columns.str.contains(pat='^spike')].tolist()
print("Starts")
print(colNames_starts)

print("\n")
print("----------------------------------------")
colNames_ends = df.columns[df.columns.str.contains(pat='spike$')].tolist()
print("Ends")
print(colNames_ends)

print("\n")
print("----------------------------------------")
df_subset_start = df.filter(regex='^spike', axis=1)
print("Starts")
print(df_subset_start)

print("\n")
print("----------------------------------------")
df_subset_contains = df.filter(regex='spike', axis=1)
print("Contains")
print(df_subset_contains)

print("\n")
print("----------------------------------------")
df_subset_ends = df.filter(regex='spike$', axis=1)
print("Ends")
print(df_subset_ends)
vasili111