Remove item in pandas dataframe that starts with a comment char

Question

I would like to remove all rows in a pandas dataframe that start with a comment character. For example:

>>> COMMENT_CHAR = '#'
>>> df
    first_name    last_name
0   #fill in here fill in here
1   tom           jones

>>> df.remove(df.columns[0], startswith=COMMENT_CHAR) # in pseudocode
>>> df
    first_name    last_name
0   tom           jones

How would this actually be done?

`df.loc[~df.first_name.str.startswith('#')]` or something like that. The inversion of that mask could be used with `df.drop` — user3483203, Dec 17 '18 at 21:31
@user3483203 would there be a way to do that by using the index instead of the column name? — David542, Dec 17 '18 at 21:35
@ChrisA pretty neat, would you want to explain how that works in an answer and I can accept it? — David542, Dec 17 '18 at 21:44

timgeb · Accepted Answer · 2018-12-17T21:46:22.753

Setup

>>> import pandas as pd
>>> data = [['#fill in here', 'fill in here'], ['tom', 'jones']]
>>> df = pd.DataFrame(data, columns=['first_name', 'last_name'])
>>> df
      first_name     last_name
0  #fill in here  fill in here
1            tom         jones

Solution assuming only the strings in the first_name column matter:

>>> commented = df['first_name'].str.startswith('#')
>>> df[~commented].reset_index(drop=True)
  first_name last_name
0        tom     jones
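
If the column can contain missing values, the mask above may end up holding NaN instead of True/False, and `~` can then fail. A minimal sketch of one way around that, assuming missing names should be kept rather than dropped (the extra row and the `COMMENT_CHAR` name are only for illustration):

```python
import pandas as pd

COMMENT_CHAR = '#'  # name borrowed from the question

# Illustrative data: the third row has a missing first_name.
df = pd.DataFrame({
    'first_name': ['#fill in here', 'tom', None],
    'last_name': ['fill in here', 'jones', 'smith'],
})

# na=False makes missing values count as "not a comment",
# so the mask is purely boolean and safe to invert with ~.
commented = df['first_name'].str.startswith(COMMENT_CHAR, na=False)
result = df[~commented].reset_index(drop=True)
```

Passing `na=True` instead would treat rows with a missing name as comments and drop them too.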

Solution assuming you want to drop rows where the string in the first_name OR last_name column starts with '#':

>>> commented = df.apply(lambda col: col.str.startswith('#')).any(axis=1)
>>> df[~commented].reset_index(drop=True)
  first_name last_name
0        tom     jones

The purpose of reset_index is to re-label the rows starting from zero.

>>> df[~commented]
  first_name last_name
1        tom     jones
>>> df[~commented].reset_index()
   index first_name last_name
0      1        tom     jones
>>> df[~commented].reset_index(drop=True)
  first_name last_name
0        tom     jones
  first_name last_name
0        tom     jones
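
Regarding the comments under the question about selecting the column by position and about `df.drop`: a rough sketch of both, assuming the first column holds strings. It rebuilds the same two-row frame as in the setup; `by_mask` and `by_drop` are just illustrative names.

```python
import pandas as pd

COMMENT_CHAR = '#'  # name borrowed from the question

data = [['#fill in here', 'fill in here'], ['tom', 'jones']]
df = pd.DataFrame(data, columns=['first_name', 'last_name'])

# Select the first column by position instead of by name.
first_col = df.iloc[:, 0]
commented = first_col.str.startswith(COMMENT_CHAR)

# Option 1: keep the rows where the mask is False.
by_mask = df[~commented].reset_index(drop=True)

# Option 2: pass the row labels of the commented rows to df.drop.
by_drop = df.drop(df.index[commented]).reset_index(drop=True)
```

Both options should give the same frame here; the boolean mask is usually the shorter of the two.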

could you please explain the purpose of doing `reset_index()` at the end of the call and why that's necessary? — David542, Dec 17 '18 at 21:42
@David542 sure - without `reset_index`, each row that is kept keeps its original row label. In this example, the remaining row would have the label `1`. With `reset_index`, you re-label the rows starting from `0` and `drop=True` prevents the original index you are killing being moved to the columns. — timgeb, Dec 17 '18 at 21:44
