0

The column name in the excel file I have is having no space, it is written as follow:

ProjectNo, ProjectName, DocNo, DescriptionOfDesign

so I want to split the column name to be lowercase and separated with _ :

project_no, project_name, doc_no, description_of_design

I tried to use re.findall

df.columns = re.findall('[A-Z][^A-Z]*', [df.columns])

the error I got:

TypeError: expected string or bytes-like object

how should I write my script in order to get the desired column names?

thanks in advance.

yangyang
  • 491
  • 4
  • 16
  • `df.columns.str.replace(some_pattern)` is the correct way to apply a regex replace pattern. `df.columns.str.extractall(some_pattern)` is how you find the pattern. – Quang Hoang Nov 09 '21 at 17:10

1 Answers1

0

In your case

df.columns=  df.columns.str.findall('[A-Z][^A-Z]*').map(lambda x : '_'.join(y.lower() for y in x) )
Out[32]: Index(['project_no', 'project_name', 'doc_no', 'description_of_design'], dtype='object')

And fix your code

df.columns.map(lambda x : re.findall('[A-Z][^A-Z]*', x))
Out[36]: 
Index([              ['Project', 'No'],             ['Project', 'Name'],
                         ['Doc', 'No'], ['Description', 'Of', 'Design']],
      dtype='object')
BENY
  • 317,841
  • 20
  • 164
  • 234