How to filter dataframe in pandas by 'str' in columns name?

Question

Following this recipe, I tried to filter a dataframe by column names that contain the string '+'. Here's the example:

import pandas as pd

B = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                columns=['A', '+B', '+C'], index=[1, 2, 3, 4, 5])

I want a dataframe C containing only the '+B' and '+C' columns, so I tried:

C = B.filter(regex='+')

However, I get this error:

  File "c:\users\hernan\anaconda\lib\site-packages\pandas\core\generic.py", line 1888, in filter
    matcher = re.compile(regex)
  File "c:\users\hernan\anaconda\lib\re.py", line 190, in compile
    return _compile(pattern, flags)
  File "c:\users\hernan\anaconda\lib\re.py", line 244, in _compile
    raise error, v # invalid expression
error: nothing to repeat

The recipe says it targets Python 3, and I'm using Python 2.7, but I don't think that is the problem here.
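The error comes from Python's `re` module rather than from pandas, and it is not specific to Python 2.7. A minimal check on Python 3 that compiles the pattern directly; `compile_error` is a hypothetical helper written only for this illustration:

```python
import re

def compile_error(pattern):
    """Hypothetical helper: return the re.error message for `pattern`, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

# A bare "+" is a quantifier with no preceding token to repeat, so it fails on Python 3 too.
print(compile_error("+"))    # nothing to repeat at position 0
# Escaped, it is a literal plus sign and compiles fine.
print(compile_error(r"\+"))  # None
```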

Hernan

score 2 · Accepted Answer · answered Feb 24 '15 at 18:31


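+ has a special meaning in regular expressions: it means "one or more of the preceding item". A pattern consisting of nothing but + has no preceding item to repeat, which is exactly what "nothing to repeat" is complaining about. Escape it with \ to match a literal plus sign: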

>>> C = B.filter(regex=r'\+')
>>> C
   +B  +C
1   5   2
2   4   4
3   3   1
4   2   2
5   1   4
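If you would rather not remember which characters are special, the standard library's `re.escape` can build the pattern for you. A sketch reusing the question's frame:

```python
import re
import pandas as pd

B = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', '+B', '+C'], index=[1, 2, 3, 4, 5])

# re.escape backslash-escapes regex metacharacters, so "+" becomes "\+".
pattern = re.escape("+")
C = B.filter(regex=pattern)
print(list(C.columns))  # ['+B', '+C']
```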

Or, since all you care about is whether the name contains +, you could use the like argument instead, which does a plain substring match rather than a regex match:

>>> C = B.filter(like="+")
>>> C
   +B  +C
1   5   2
2   4   4
3   3   1
4   2   2
5   1   4
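Because `like` is a plain substring test on each label, it selects the same columns as an explicit list comprehension over the column names. A quick comparison on the question's frame:

```python
import pandas as pd

B = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', '+B', '+C'], index=[1, 2, 3, 4, 5])

# like="+" keeps every column label that contains "+" as a substring.
via_like = B.filter(like="+")
# The same selection written out by hand.
via_comprehension = B[[col for col in B.columns if "+" in col]]

print(via_like.equals(via_comprehension))  # True
```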


DSM


Thanks! Related: is it possible to do something like C = B.filter(like="+" or like="-")? – hernanavella Feb 24 '15 at 18:36

@hernanavella: not with `like`, but you could use `regex`, something like `B.filter(regex=r"\+|-")` (where `|` means "or"). But frankly, at that point I wouldn't bother trying to be clever; I'd simply write `B[[col for col in B if "+" in col or "-" in col]]`. – DSM Feb 24 '15 at 18:39
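To illustrate the comment's two options, here is a sketch on a modified frame that adds a hypothetical `-D` column (it is not in the original question). Both approaches select the same columns; the raw string keeps `\+` intact for the regex engine:

```python
import pandas as pd

# Hypothetical frame: the question's data plus an extra "-D" column.
B = pd.DataFrame([[1, 5, 2, 0], [2, 4, 4, 1], [3, 3, 1, 0], [4, 2, 2, 1], [5, 1, 4, 0]],
                 columns=['A', '+B', '+C', '-D'], index=[1, 2, 3, 4, 5])

# Regex alternation: "\+" is a literal plus, "|" means "or", "-" is literal outside [...].
via_regex = B.filter(regex=r"\+|-")

# Plain-Python version: iterating a DataFrame yields its column labels.
via_comprehension = B[[col for col in B if "+" in col or "-" in col]]

print(list(via_regex.columns))              # ['+B', '+C', '-D']
print(via_regex.equals(via_comprehension))  # True
```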
