
Following this recipe, I tried to filter a DataFrame by column names that contain the string '+'. Here's the example:

import pandas as pd

B = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', '+B', '+C'], index=[1, 2, 3, 4, 5])

I want a DataFrame C that contains only the '+B' and '+C' columns:

C = B.filter(regex='+')

However, I get this error:

  File "c:\users\hernan\anaconda\lib\site-packages\pandas\core\generic.py", line 1888, in filter
    matcher = re.compile(regex)
  File "c:\users\hernan\anaconda\lib\re.py", line 190, in compile
    return _compile(pattern, flags)
  File "c:\users\hernan\anaconda\lib\re.py", line 244, in _compile
    raise error, v # invalid expression
error: nothing to repeat

The recipe says it targets Python 3 and I'm using Python 2.7, but I don't think that is the problem here.

Hernan

hernanavella

1 Answer


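`+` has a special meaning in regular expressions: it is a quantifier meaning "one or more of the preceding item". A pattern consisting of a lone `+` has nothing before it to repeat, which is exactly what "nothing to repeat" is complaining about. You can escape it with a backslash: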

>>> C = B.filter(regex=r'\+')
>>> C
   +B  +C
1   5   2
2   4   4
3   3   1
4   2   2
5   1   4
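The error is raised by Python's `re` module rather than by pandas itself: `filter(regex=...)` hands the pattern to `re`, as the traceback shows. A minimal sketch, assuming Python 3 and the `B` frame from the question, that reproduces the error directly and then lets `re.escape` produce the escaped pattern instead of writing the backslash by hand:

```python
import re

import pandas as pd

B = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', '+B', '+C'], index=[1, 2, 3, 4, 5])

# A bare '+' is a quantifier with nothing to repeat, so compiling it fails.
try:
    re.compile('+')
except re.error as exc:
    print("re.error:", exc)

# re.escape turns '+' into the literal pattern '\+'.
pattern = re.escape('+')
C = B.filter(regex=pattern)
print(list(C.columns))  # ['+B', '+C']
```

`re.escape` is most useful when the search string comes from a variable, since you no longer need to know which characters are special.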

Or, since all you care about is the presence of `+`, you could use the `like` argument instead, which does a plain substring match rather than a regex match:

>>> C = B.filter(like="+")
>>> C
   +B  +C
1   5   2
2   4   4
3   3   1
4   2   2
5   1   4
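Another way to get a plain substring match, sketched here as an alternative rather than what this answer uses, is to build a boolean mask over the column labels with `Index.str.contains(..., regex=False)` and select with `.loc` (assuming Python 3 and a reasonably recent pandas):

```python
import pandas as pd

B = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', '+B', '+C'], index=[1, 2, 3, 4, 5])

# regex=False treats '+' as a literal character, like filter(like=...).
mask = B.columns.str.contains('+', regex=False)
C = B.loc[:, mask]
print(C)
```

The mask form is handy when conditions need to be combined, since masks can be joined with `|` and `&`.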
DSM
  • Thanks! Related: is it possible to do something like `C = B.filter(like="+" or like="-")`? – hernanavella Feb 24 '15 at 18:36
  • @hernanavella: not with `like`, but you could use `regex`, something like `B.filter(regex=r"\+|-")` (where the `|` means "or"). But frankly, at that point I wouldn't bother trying to be clever; I'd simply write `B[[col for col in B if "+" in col or "-" in col]]`. – DSM Feb 24 '15 at 18:39
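For an arbitrary set of substrings, one option is to escape each with `re.escape` and join them with `|`, checked here against the list-comprehension version from the comment. This is a minimal sketch assuming Python 3; the helper name `filter_any` is hypothetical (not a pandas API), and the `-D` column in the example frame is invented for illustration:

```python
import re

import pandas as pd


def filter_any(df, substrings):
    """Hypothetical helper: keep columns whose name contains any of `substrings`."""
    # Escape each substring so characters like '+' and '-' match literally,
    # then join with '|' so a column matches if any alternative matches.
    pattern = "|".join(re.escape(s) for s in substrings)
    return df.filter(regex=pattern)


# Illustrative frame; the '-D' column is made up for this example.
df = pd.DataFrame([[1, 5, 2, 0], [2, 4, 4, 1]],
                  columns=['A', '+B', '+C', '-D'])

via_regex = filter_any(df, ["+", "-"])
via_listcomp = df[[col for col in df if "+" in col or "-" in col]]

print(list(via_regex.columns))  # ['+B', '+C', '-D']
```

Note that an empty `substrings` list yields an empty pattern, which matches every column; the list comprehension is easier to reason about in edge cases like that, which supports DSM's point.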