UPD: A dataframe has been pasted as an example at the bottom of the page.
My original xls file looks like this:
and I need two actions in order to make it look like the below:
Firstly, I need to fill in the empty row values with the values shown in the cell above them. That has been achieved with the following function:
def get_csv():
#Read csv file
df = pd.read_excel('test.xls')
df = df.fillna(method='ffill')
return df
Secondly, I have used stack
with set_index
:
df = (df.set_index(['Country', 'Gender', 'Arr-Dep'])
.stack()
.reset_index(name='Value')
.rename(columns={'level_3':'Year'}))
and I was wondering whether there is an easier way. Is there a library that transforms a dataframe, excel etc into the wanted format?
Original dataframe after excel import:
Country Gender Direction 1974 1975 1976
0 Austria Male IN 13728 8754 9695
1 NaN NaN OUT 17977 12271 9899
2 NaN Female IN 8541 6465 6447
3 NaN NaN OUT 8450 7190 6288
4 NaN Total IN 22269 15219 16142
5 NaN NaN OUT 26427 19461 16187
6 Belgium Male IN 2412 2245 2296
7 NaN NaN OUT 2800 2490 2413
8 NaN Female IN 2105 2022 2057
9 NaN NaN OUT 2100 2113 2004
10 NaN Total IN 4517 4267 4353
11 NaN NaN OUT 4900 4603 4417