
I am creating a dataframe based on a csv import:

ID, attachment, attachment, comment, comment
1, lol.jpg, lmfao.png, 'Luigi',
2, cat.docx, , 'It's me', 'Mario'

Basically, the number of 'attachment' and 'comment' columns matches the row with the most attachments and the most comments. Since I am exporting the CSV from third-party software, I do not know in advance how many attachment and comment columns there will be.

Importing this CSV with pd.read_csv creates the following dataframe:

   ID attachment attachment.1    comment comment.1
0   1    lol.jpg    lmfao.png    'Luigi'       NaN
1   2   cat.docx          NaN  'It's me'   'Mario'
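To reproduce this, here is a minimal sketch that feeds the sample CSV to pd.read_csv through io.StringIO. skipinitialspace=True is an assumption added here (the question doesn't say how the file is read); it strips the blank after each comma, so the repeated headers come out as plain attachment and comment before pandas appends .1 to the second copy.

```python
import io

import pandas as pd

# The sample CSV from the question, pasted verbatim.
csv_text = """ID, attachment, attachment, comment, comment
1, lol.jpg, lmfao.png, 'Luigi',
2, cat.docx, , 'It's me', 'Mario'
"""

# skipinitialspace=True is an assumption: it drops the space after each
# comma so headers and values carry no leading blanks, and blank fields
# become NaN.
imported_df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# pandas renames repeated headers by appending .1, .2, ...
print(list(imported_df.columns))
# ['ID', 'attachment', 'attachment.1', 'comment', 'comment.1']
```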

Is there a simple way to select all attachment/comment columns?

Something like attachments_df = imported_df.attachment.all, or comments_df = imported_df['comment'] returning every comment column?

Thanks.

Odyseus_v4

3 Answers


Use DataFrame.filter with a regex that matches the column names: ^ anchors the match at the start of the name, \.? allows the optional dot that pandas adds to duplicate names, \d* matches the number after it, and $ anchors the end of the name:

attachments_df = imported_df.filter(regex=r'^attachment\.?\d*$')
comments_df = imported_df.filter(regex=r'^comment\.?\d*$')
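A quick check of the filter(regex=...) approach, assuming a hand-built stand-in for the imported frame (values copied from the question rather than read from a file). The patterns are written as raw strings so Python does not treat \. and \d as string escapes.

```python
import pandas as pd

# Stand-in for the frame produced by read_csv in the question.
imported_df = pd.DataFrame({
    "ID": [1, 2],
    "attachment": ["lol.jpg", "cat.docx"],
    "attachment.1": ["lmfao.png", None],
    "comment": ["'Luigi'", "'It's me'"],
    "comment.1": [None, "'Mario'"],
})

# ^ and $ anchor the whole label; \.? allows the dot pandas adds to
# duplicated names and \d* the counter after it.
attachments_df = imported_df.filter(regex=r"^attachment\.?\d*$")
comments_df = imported_df.filter(regex=r"^comment\.?\d*$")

print(list(attachments_df.columns))  # ['attachment', 'attachment.1']
print(list(comments_df.columns))     # ['comment', 'comment.1']
```

Because the pattern does not fix the number of suffixes, columns such as attachment.2 or attachment.3 would be picked up the same way when a longer export arrives.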
jezrael

Another possible solution:

attachments_df = imported_df.loc[:,imported_df.columns.str.startswith('attachment')]
comments_df = imported_df.loc[:,imported_df.columns.str.startswith('comment')]
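The mask this relies on can be inspected directly. The sketch below builds a stand-in for the imported frame by hand (values copied from the question) and prints the boolean array that Index.str.startswith returns; .loc[:, mask] then keeps every row and only the columns marked True.

```python
import pandas as pd

# Stand-in for the imported frame; None marks the empty cells.
imported_df = pd.DataFrame(
    [[1, "lol.jpg", "lmfao.png", "'Luigi'", None],
     [2, "cat.docx", None, "'It's me'", "'Mario'"]],
    columns=["ID", "attachment", "attachment.1", "comment", "comment.1"],
)

# Index.str.startswith yields one boolean per column label.
mask = imported_df.columns.str.startswith("attachment")
print(mask.tolist())  # [False, True, True, False, False]

# Select all rows, and only the columns where the mask is True.
attachments_df = imported_df.loc[:, mask]
print(list(attachments_df.columns))  # ['attachment', 'attachment.1']
```

A prefix test like this would also keep any other column that happens to start with the same word, so it works best when the export's column names are predictable.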
PaulS

You can also use the like parameter of DataFrame.filter, which keeps every column whose name contains the given substring:

imported_df.filter(like='attach')
'''
  attachment attachment.1
0    lol.jpg    lmfao.png
1   cat.docx          NaN
'''
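One difference worth knowing: like matches the substring anywhere in a label, while an anchored regex only matches names that start with it. The sketch below adds a hypothetical has_attachment column (not in the question's CSV) to show how the two filters diverge.

```python
import pandas as pd

# The question's attachment columns plus a hypothetical "has_attachment"
# flag that also contains the substring "attach".
df = pd.DataFrame(
    [[1, "lol.jpg", "lmfao.png", True],
     [2, "cat.docx", None, True]],
    columns=["ID", "attachment", "attachment.1", "has_attachment"],
)

# like= keeps any label containing the substring anywhere.
like_cols = list(df.filter(like="attach").columns)
print(like_cols)   # ['attachment', 'attachment.1', 'has_attachment']

# An anchored regex keeps only "attachment" and "attachment.N".
regex_cols = list(df.filter(regex=r"^attachment(\.\d+)?$").columns)
print(regex_cols)  # ['attachment', 'attachment.1']
```

For an export like the one in the question, where only the repeated headers share the word, like='attach' is the shortest option; the anchored regex is safer if other columns might contain the same text.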
SergFSM