I need to automate the process of shifting data around in a dataframe to justify all the valuable information (i.e. not NaN) to the topmost empty cell. The catch here is that I don't know off the cuff what my input looks like until the user inputs it. Thus, I may have NaN's embedded anywhere in the data above, below, or between data sets.
What I have:
a b c d e f g h i j k l m o
0 0 0 0 0 0 0 NaN NaN NaN NaN NaN NaN NaN
1 1 1 1 1 1 1 NaN NaN NaN NaN NaN NaN NaN
0 0 0 1 0 0 1 NaN NaN NaN NaN NaN NaN NaN
0 0 1 0 1 0 0 NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN 1 1 0 1 0 0 1
Desired Output:
a b c d e f g h i j k l m o
0 0 0 0 0 0 0 1 1 0 1 0 0 1
1 1 1 1 1 1 1 NaN NaN NaN NaN NaN NaN NaN
0 0 0 1 0 0 1 NaN NaN NaN NaN NaN NaN NaN
0 0 1 0 1 0 0 NaN NaN NaN NaN NaN NaN NaN
What I want to do is to efficiently (ie. not line by line) shift the values in row 4 columns h to o up to the top. I was able to hardcode this with the following lines of code:
> df= df.set_index(['a', 'b', 'c', 'd', 'e', 'f','g']).shift(-4).iloc[0:5].reset_index()
> df= df.reindex(np.arange(4))
The problem with hard coding is, as I said above, you don't know what the dataset looks like until the user inputs it. As well, the application for this would be data with over 100k rows of data.
Any suggestions as to how I may automate the shifting process? Preferably using pandas.