I have a .CSV
file that has the following structure (when viewed on Excel), in which it has unique section headers (strings) occupying a single cell, followed by a block with column names and data.
This format repeats itself throughout the Excel spreadsheet. Note that each section has different col names and number, and variable number of rows.
Example:
Daily Statements
Date Desc Customer ID Phone Status
12/21/21 aaa 1 123-123-1231 OK
12/21/21 aaa 2 333-123-1231 OK
12/21/21 bbb 3 222-123-1231 OK
12/21/21 bbb 3 444-123-1231 OK
<===== one empty row separates sections
Account History
Date Time Type Ref # Balance
12/21/21 1:00:00 BAL 456 $0.01
12/21/21 1:00:00 BAL 445 $0.01
12/21/21 1:00:00 BAL 645 $0.01
<===== one empty row separates sections
Order History
ID Date Ref #
1 12/21/21 777
2 12/21/21 888
3 12/21/21 999
4 12/21/21 9995
My goal is to extract only the rows in Account History:
Date Time Type Ref # Balance
12/21/21 1:00:00 BAL 456 $0.01
12/21/21 1:00:00 BAL 445 $0.01
12/21/21 1:00:00 BAL 645 $0.01
However I'm unable to find an approach that would work in Pandas, as I'd need to use the string "Account History" as an anchor to indicate the rows of interest.
Do you know how this could be achieved?