I have some data that is driving me crazy. The source is a PDF file that I read with tabula to extract tables. The problem is that some rows of the table span multiple lines in the document, so a single logical row ends up split across several rows in my output:
> sub_df.iloc[85:95]
1   Acronym  Meaning
86  ABC      Aaaaa Bbbbb Ccccc
87  CDE      Ccccc Ddddd Eeeee
88  NaN      Fffff Ggggg
89  FGH      NaN
90  NaN      Hhhhh
91  IJK      Iiiii Jjjjj Kkkkk
92  LMN      Lllll Mmmmm Nnnnn
93  OPQ      Ooooo Ppppp Qqqqq
94  RST      Rrrrr Sssss Ttttt
95  UVZ      Uuuuu Vvvvv Zzzzz
What I would like to get is something like this.
> sub_df.iloc[85:95]
1   Acronym  Meaning
86  ABC      Aaaaa Bbbbb Ccccc
87  CDE      Ccccc Ddddd Eeeee
88  FGH      Fffff Ggggg Hhhhh
91  IJK      Iiiii Jjjjj Kkkkk
92  LMN      Lllll Mmmmm Nnnnn
93  OPQ      Ooooo Ppppp Qqqqq
94  RST      Rrrrr Sssss Ttttt
95  UVZ      Uuuuu Vvvvv Zzzzz
I am struggling with combine_first like this:
sub_df.iloc[[88]].combine_first(sub_df.iloc[[87]])
but the result is not what I am expecting.
A solution with groupby would also be appreciated.
Note: the index is not important and can be reset. I just want to join consecutive rows that have NaN in some columns into a single row and then dump the result to CSV, so I don't need the original index values.
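For illustration, here is a minimal sketch of one possible groupby approach. It is not a definitive solution and rests on assumptions: the frame below is a toy reconstruction of the rows shown above, a row counts as "complete" when no column is NaN, and every run of consecutive incomplete rows is treated as one logical row. Under that rule, two multiline entries that directly follow each other would be merged together, so it may not fit every table.

```python
import numpy as np
import pandas as pd

# Toy reconstruction of the rows shown in the question (index 86-95).
sub_df = pd.DataFrame(
    {
        "Acronym": ["ABC", "CDE", np.nan, "FGH", np.nan,
                    "IJK", "LMN", "OPQ", "RST", "UVZ"],
        "Meaning": ["Aaaaa Bbbbb Ccccc", "Ccccc Ddddd Eeeee", "Fffff Ggggg",
                    np.nan, "Hhhhh", "Iiiii Jjjjj Kkkkk", "Lllll Mmmmm Nnnnn",
                    "Ooooo Ppppp Qqqqq", "Rrrrr Sssss Ttttt", "Uuuuu Vvvvv Zzzzz"],
    },
    index=range(86, 96),
)

# Assumption: a row with no NaN in any column is a complete logical row.
complete = sub_df.notna().all(axis=1)

# A new group starts at every complete row, and at the first incomplete row
# that follows a complete one. Consecutive incomplete rows share a group.
starts_group = complete | complete.shift(fill_value=True)
group_id = starts_group.cumsum()

# Within each group, join the non-NaN pieces of every column with a space.
merged = (
    sub_df.groupby(group_id)
    .agg(lambda s: " ".join(s.dropna()))
    .reset_index(drop=True)
)

print(merged)
```

Since the index is not needed, the result could then be written with `merged.to_csv(path, index=False)`, where `path` is whatever output file is wanted.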