0

I need to automate the process of shifting data around in a dataframe to justify all the valuable information (i.e. not NaN) to the topmost empty cell. The catch here is that I don't know off the cuff what my input looks like until the user inputs it. Thus, I may have NaN's embedded anywhere in the data above, below, or between data sets.

What I have:

 a  b   c   d   e   f   g   h   i   j   k   l   m   o
 0  0   0   0   0   0   0   NaN NaN NaN NaN NaN NaN NaN
 1  1   1   1   1   1   1   NaN NaN NaN NaN NaN NaN NaN
 0  0   0   1   0   0   1   NaN NaN NaN NaN NaN NaN NaN
 0  0   1   0   1   0   0   NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN 1   1   0   1   0   0   1

Desired Output:

a   b   c   d   e   f   g   h   i   j   k   l   m   o
0   0   0   0   0   0   0   1   1   0   1   0   0   1
1   1   1   1   1   1   1   NaN NaN NaN NaN NaN NaN NaN
0   0   0   1   0   0   1   NaN NaN NaN NaN NaN NaN NaN
0   0   1   0   1   0   0   NaN NaN NaN NaN NaN NaN NaN

What I want to do is to efficiently (ie. not line by line) shift the values in row 4 columns h to o up to the top. I was able to hardcode this with the following lines of code:

> df= df.set_index(['a', 'b', 'c', 'd', 'e', 'f','g']).shift(-4).iloc[0:5].reset_index()

> df= df.reindex(np.arange(4))

The problem with hard coding is, as I said above, you don't know what the dataset looks like until the user inputs it. As well, the application for this would be data with over 100k rows of data.

Any suggestions as to how I may automate the shifting process? Preferably using pandas.

D.I.
  • 31
  • 6
  • See the section of my answer below containing "toward better performance". – cs95 May 16 '18 at 20:39
  • That method would still require a "hard coded" type solution though, no? I was hoping for more of an automated solution here like "here's the documentation on this obscure pandas function that can auto-justify when you play with this parameter" – D.I. May 16 '18 at 20:47
  • Hah... pandas isn't _that_ smart... ;-/ this isn't nearly a common enough use case to justify its own pandas function (pun intended). – cs95 May 16 '18 at 20:49
  • [This](https://stackoverflow.com/questions/33530601/pandas-force-nan-to-bottom-of-each-column-at-each-index) seems pretty similar to what you want – DJK May 16 '18 at 20:57
  • @COLDSPEED your answer on the other forum was great. Thanks for redirecting. – D.I. May 17 '18 at 15:48
  • @DJK the only problem there was that I had to work with **massive** datasets and your answer required a bit of manual work – D.I. May 17 '18 at 15:48

0 Answers0