
I have a dataframe with multiple columns and different headers. I want to filter the dataframe to keep only the columns that start with the letter I. Some of my column headers contain the letter i but start with a different letter.

Is there a way to do this? I tried using df.filter but for some reason, it's not case sensitive.

Siba

1 Answer


You can use df.filter with the regex parameter:

df.filter(regex=r'(?i)^i')

This returns the columns whose names start with I, ignoring case.
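To make the effect of the inline `(?i)` flag concrete, here is a minimal sketch on a small frame whose column names are hypothetical, chosen only for illustration. Dropping the flag and anchoring on a capital `I` gives a case-sensitive match instead, which may be closer to what the question asks for.

```python
import pandas as pd

# Hypothetical columns: some start with I/i, others merely contain an i.
df = pd.DataFrame([[1, 2, 3, 4, 5]],
                  columns=["Income", "item", "price", "Id", "region"])

# Case-insensitive: the inline (?i) flag lets ^i match "I" or "i" at the start.
insensitive = df.filter(regex=r"(?i)^i")

# Case-sensitive: without the flag, ^I only matches a capital I at the start.
sensitive = df.filter(regex=r"^I")

print(list(insensitive.columns))  # ['Income', 'item', 'Id']
print(list(sensitive.columns))    # ['Income', 'Id']
```

The `^` anchor is what excludes names like `price` and `region`, which contain an i but do not start with one.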

Example below:

Let's consider the input dataframe:

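import numpy as np
import pandas as pd

# Values are random, so your printed numbers will differ from the output below.
df = pd.DataFrame(np.random.randint(0, 20, (5, 4)),
                  columns=['itest', 'Itest', 'another', 'anothericol'])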

print(df)

   itest  Itest  another  anothericol
0      1      4       14           17
1     17     10       14            1
2     16     18       10            7
3     10     12       17           14
4      6     15       17           19 

With df.filter:

print(df.filter(regex=r'(?i)^i'))

   itest  Itest
0      1      4
1     17     10
2     16     18
3     10     12
4      6     15
anky
    You could be verbose and pass a list comprehension to `loc`: `df.loc[:, [col.startswith("I") for col in df]]`. It could be handy for folks who are not regex-brave. Also, I may be wrong, but it seems the OP wants something that is case-sensitive and will filter for only a capital `I`. – sammywemmy Jan 10 '21 at 09:41
    @sammywemmy One can convert to lower case and use startswith with a lower-case 'i'. However, it looks like this has already been answered, so I will make this a community wiki. Also, I think we can do df.columns.str.startswith(('i', 'I')), since str.startswith takes a tuple, and pass the result to loc. – anky Jan 10 '21 at 11:06
    Yeah, true. The tuple part is a good one as well. – sammywemmy Jan 10 '21 at 11:13
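The non-regex alternatives raised in the comments can be sketched side by side. This reuses the column names from the answer's example with a single made-up row; the three masks correspond to the list comprehension with a capital `I` (case-sensitive), lower-casing first and testing for `i`, and the vectorized `Index.str.startswith` with a tuple of prefixes. Treat it as a sketch rather than the only way to do it.

```python
import pandas as pd

# Column names from the answer's example; the row values are arbitrary.
df = pd.DataFrame([[1, 2, 3, 4]],
                  columns=["itest", "Itest", "another", "anothericol"])

# 1) List comprehension over column labels: case-sensitive, capital I only.
upper_only = df.loc[:, [col.startswith("I") for col in df]]

# 2) Lower-case each label first, then test for "i": matches "i" and "I".
either_case = df.loc[:, [col.lower().startswith("i") for col in df]]

# 3) Vectorized: Index.str.startswith accepts a tuple of prefixes and
#    returns a boolean array that loc can use as a column mask.
tuple_mask = df.loc[:, df.columns.str.startswith(("i", "I"))]

print(list(upper_only.columns))   # ['Itest']
print(list(either_case.columns))  # ['itest', 'Itest']
print(list(tuple_mask.columns))   # ['itest', 'Itest']
```

All three keep the original column order, and none of them match `anothericol`, because each test only looks at the start of the name.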