How to select all Dataframe columns with the same names?

Question

I am creating a dataframe based on a csv import:

ID, attachment, attachment, comment, comment
1, lol.jpg, lmfao.png, 'Luigi',
2, cat.docx, , 'It's me', 'Mario'

The number of attachment and comment columns is set by the row that has the most attachments and comments. Since I am exporting the CSV from third-party software, I do not know in advance how many attachment and comment columns there will be.

Importing this CSV with pd.read_csv creates the following dataframe:

	ID	attachment	attachment.1	comment	comment.1
0	1	lol.jpg	lmfao.png	'Luigi'
1	2	cat.docx		'It's me'	'Mario'
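The renaming shown above can be reproduced with a small sketch. The CSV text below is a hypothetical stand-in for the exported file (quotes and padding spaces dropped for simplicity), read from memory rather than from disk:

```python
import io

import pandas as pd

# Hypothetical stand-in for the exported file, with duplicated header names.
csv_text = (
    "ID,attachment,attachment,comment,comment\n"
    "1,lol.jpg,lmfao.png,Luigi,\n"
    "2,cat.docx,,It's me,Mario\n"
)

imported_df = pd.read_csv(io.StringIO(csv_text))

# pandas de-duplicates repeated header names by appending .1, .2, ...
column_names = list(imported_df.columns)
print(column_names)
```

So the columns to select are the ones whose names share a base name and differ only by that numeric suffix.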

Is there a simple way to select all attachment/comment columns?

Something like attachments_df = imported_df.attachment.all or comments_df = imported_df['comment'].??

Thanks.

jezrael · Accepted Answer · 2023-02-03T10:40:15.783

1

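Use DataFrame.filter with a regex: ^ anchors the match at the start of the column name, \.*\d* allows an optional dot followed by digits (the suffix pandas appends to duplicated names, such as .1), and $ anchors the match at the end of the name: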

# Raw strings keep Python from treating \. and \d as string escape sequences
attachments_df = imported_df.filter(regex=r'^attachment\.*\d*$')
comments_df = imported_df.filter(regex=r'^comment\.*\d*$')
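As a minimal sketch of why the anchors matter, here is the same idea on a frame rebuilt by hand from the question. The attachment_count column is hypothetical, added only to show what the regex leaves out, and the pattern is a slightly stricter variant of the one above (it requires the dot and at least one digit together), not the pattern from the answer itself:

```python
import pandas as pd

# Frame rebuilt by hand from the question; attachment_count is a
# hypothetical extra column that should NOT be selected.
imported_df = pd.DataFrame({
    "ID": [1, 2],
    "attachment": ["lol.jpg", "cat.docx"],
    "attachment.1": ["lmfao.png", None],
    "attachment_count": [2, 1],
    "comment": ["Luigi", "It's me"],
    "comment.1": [None, "Mario"],
})

# Base name, optionally followed by ".<digits>", and nothing else.
attachments_df = imported_df.filter(regex=r"^attachment(\.\d+)?$")
comments_df = imported_df.filter(regex=r"^comment(\.\d+)?$")

print(list(attachments_df.columns))
print(list(comments_df.columns))
```

Without the trailing $, attachment_count would be picked up as well, since filter matches the regex anywhere in the column name.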

edited Feb 03 '23 at 10:40

answered Feb 03 '23 at 10:34

jezrael


score 1 · Answer 2 · answered Feb 03 '23 at 10:49


Another possible solution:

attachments_df = imported_df.loc[:,imported_df.columns.str.startswith('attachment')]
comments_df = imported_df.loc[:,imported_df.columns.str.startswith('comment')]
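If the base names are not known in advance either, the boolean-mask idea above can be extended, as a rough sketch, to group every column by its name with the numeric suffix stripped. The names base_names and groups are illustrative, and the frame is rebuilt by hand from the question:

```python
import pandas as pd

# Frame rebuilt by hand from the question.
imported_df = pd.DataFrame({
    "ID": [1, 2],
    "attachment": ["lol.jpg", "cat.docx"],
    "attachment.1": ["lmfao.png", None],
    "comment": ["Luigi", "It's me"],
    "comment.1": [None, "Mario"],
})

# Strip the ".<digits>" suffix that read_csv adds to duplicated headers.
base_names = imported_df.columns.str.replace(r"\.\d+$", "", regex=True)

# One sub-frame per base name, selected with a boolean column mask.
groups = {
    name: imported_df.loc[:, base_names == name]
    for name in base_names.unique()
}

print(list(groups))
print(list(groups["comment"].columns))
```

One caveat: a column whose real name happens to end in a dot and digits would be merged into the same group, so this assumes the export never uses such names.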

answered Feb 03 '23 at 10:49

PaulS


score 1 · Answer 3 · answered Feb 03 '23 at 10:50


You can also use the like parameter of DataFrame.filter, which keeps every column whose name contains the given substring:

imported_df.filter(like='attach')
'''
  attachment attachment.1
0    lol.jpg    lmfao.png
1   cat.docx          NaN
'''
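One thing to watch with like is that it is a plain substring test, so it can select more than intended. A small sketch, where the has_attachment column is hypothetical and added only to illustrate the point:

```python
import pandas as pd

# has_attachment is a hypothetical column, added only for illustration.
imported_df = pd.DataFrame({
    "ID": [1, 2],
    "attachment": ["lol.jpg", "cat.docx"],
    "attachment.1": ["lmfao.png", None],
    "has_attachment": [True, True],
    "comment": ["Luigi", "It's me"],
})

# like="attach" keeps every column whose name contains "attach" anywhere.
like_cols = list(imported_df.filter(like="attach").columns)
print(like_cols)
```

If the export might contain such columns, an anchored regex is probably the safer choice.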

answered Feb 03 '23 at 10:50

SergFSM

