I'm having trouble describing the way my data is structured, hopefully this image will help.
As you can see I have 3 columns: Name, Corp #, and Dissagregation Category, that together serve as a unique identifier for each row. There are two super headers, "English/Language Arts Scores", and "Mathematics Scores." I want to create two separate dataframes for each header. This is what I've managed so far:
df1 = pd.read_excel(file, header=None)
vals = df1.values
mux = pd.MultiIndex.from_arrays(df1.ffill(1).values[:2, 1:], names=[None, 'Name'])
df = pd.DataFrame(df1.values[2:, 1:], df1.values[2:, 0], mux)
This generates a dataframe with two levels: df['English/Language Arts Scores']
and df['Mathematics Scores']
, that contain the columns under the header and "Name" as the index. I want these dataframes to include: Name, Copr #, and Dissagregation Category as columns or as the index.
What would be the best way to go about doing this?
EDIT
Here is a copy and pasteable snippet of my data:
English Math
A B C X Y X Y
ADAMS CENTRAL 0015 All Students 83 590 83 579
ADAMS CENTRAL 0015 General 1 0 *** 0 ***
ADAMS CENTRAL 0015 Total Gene 71 590 71 579
ADAMS West 0016 All Students 93 440 83 765
ADAMS West 0016 General 1 1 33 0 660
ADAMS West 0016 Total Gene 31 *** 46 572
The output that I want should look like this:
In [1]: df['English']
Out[1]:
A B C X Y
ADAMS CENTRAL 0015 All Students 83 590
ADAMS CENTRAL 0015 General 1 0 ***
ADAMS CENTRAL 0015 Total Gene 71 590
ADAMS West 0016 All Students 93 440
ADAMS West 0016 General 1 1 33
ADAMS West 0016 Total Gene 31 ***
In [2]: df['Math']
Out[2]:
A B C X Y
ADAMS CENTRAL 0015 All Students 83 579
ADAMS CENTRAL 0015 General 1 0 ***
ADAMS CENTRAL 0015 Total Gene 71 579
ADAMS West 0016 All Students 83 765
ADAMS West 0016 General 1 0 660
ADAMS West 0016 Total Gene 46 572