I have a pandas dataframe with data from a university. I would like to store this in my Postgres DB using the to_sql method, but to do so I would have to flatten it into 1 row.
I would like the DF above to look like this (both pictures are the same row but broken up to fit on the screen):
How can I achieve this?
Data:
CSV of original dataframe
,0,1,2,3,4
0,,FULL-TIME,FULL-TIME,PART-TIME,PART-TIME
1,,Men,Women,Men,Women
2,Undergraduates,Undergraduates,Undergraduates,Undergraduates,Undergraduates
3,"Degree-seeking, first-time freshmen","1,012","1,089",,
4,"Other first-year, degree-seeking",57,58,,
5,All other degree-seeking,"2,709","2,720",,
6,Total degree-seeking,"3,778","3,867",0,0
7,All other undergraduates enrolled in credit courses,,,,
8,Total undergraduates,"3,778","3,867",0,0
Dataframe construction for desired output:
pd.DataFrame({
'FULL-TIME Men Undergraduates Degree-seeking, first-time freshmen': [1012],
'FULL-TIME Women Undergraduates Degree-seeking, first-time freshmen': [1089],
'PART-TIME Men Undergraduates Degree-seeking, first-time freshmen': [None],
'PART-TIME Women Undergraduates Degree-seeking, first-time freshmen': [None],
'FULL-TIME Men Undergraduates Other first-year, degree-seeking': [57],
'FULL-TIME Women Undergraduates Other first-year, degree-seeking': [58],
'PART-TIME Men Undergraduates Other first-year, degree-seeking': [None],
'PART-TIME Women Undergraduates Other first-year, degree-seeking': [None],
'FULL-TIME Men Undergraduates All other degree-seeking': [2709],
'FULL-TIME Women Undergraduates All other degree-seeking': [2720],
'PART-TIME Men Undergraduates All other degree-seeking': [None],
'PART-TIME Women Undergraduates All other degree-seeking': [None],
'FULL-TIME Men Undergraduates Total degree-seeking': [3778],
'FULL-TIME Women Undergraduates Total degree-seeking': [3867],
'PART-TIME Men Undergraduates Total degree-seeking': [None],
'PART-TIME Women Undergraduates Total degree-seeking': [None],
'FULL-TIME Men Undergraduates Total undergraduates': [3778],
'FULL-TIME Women Undergraduates Total undergraduates': [3867],
'PART-TIME Men Undergraduates Total undergraduates': [None],
'PART-TIME Women Undergraduates Total undergraduates': [None],
})
EDIT 1:
Each cell in the original becomes a cell in the single row produced. The first 3 rows in the CSV actually make a MultiIndex [(full-time, men, undergraduates), (full-time, women, undergraduates), (part-time, men, undergraduates), (part-time, women, undergraduates)]
. The first Column (0) is also an index. I'm essentially combining the indexes and flattening all values into a wide format.
Here is an image of the original table, I am extracting the data from PDF images.
To reproduce this exactly:
pd.read_csv('path/to/example.csv', header=[1,2,3], index_col=[1]).drop([('0','1','2')], axis=1)
Which then produces: