how to split string in column names based on the uppercase in pandas

Question

The column name in the excel file I have is having no space, it is written as follow:

ProjectNo, ProjectName, DocNo, DescriptionOfDesign

so I want to split the column name to be lowercase and separated with _ :

project_no, project_name, doc_no, description_of_design

I tried to use re.findall

df.columns = re.findall('[A-Z][^A-Z]*', [df.columns])

the error I got:

TypeError: expected string or bytes-like object

how should I write my script in order to get the desired column names?

thanks in advance.

`df.columns.str.replace(some_pattern)` is the correct way to apply a regex replace pattern. `df.columns.str.extractall(some_pattern)` is how you find the pattern. — Quang Hoang, Nov 09 '21 at 17:10

score 0 · Accepted Answer · answered Nov 09 '21 at 17:11

In your case

df.columns=  df.columns.str.findall('[A-Z][^A-Z]*').map(lambda x : '_'.join(y.lower() for y in x) )
Out[32]: Index(['project_no', 'project_name', 'doc_no', 'description_of_design'], dtype='object')

And fix your code

df.columns.map(lambda x : re.findall('[A-Z][^A-Z]*', x))
Out[36]: 
Index([              ['Project', 'No'],             ['Project', 'Name'],
                         ['Doc', 'No'], ['Description', 'Of', 'Design']],
      dtype='object')

how to split string in column names based on the uppercase in pandas

1 Answers1