Consider this data:
>>> import pandas as pd
>>>
dfA = pd.DataFrame({
"key":[1,3,6,10,15,21],
"columnA":[10,20,30,40,50,60],
"columnB":[100,200,300,400,500,600],
"columnC":[110,202,330,404,550,606],
})
dfB = pd.DataFrame({
"key":[1,2,5,9,14,20],
"columnA":[40,50,60,70,80,90],
"columnB":[400,500,600,700,800,900],
"columnC":[440,505,660,707,880,909],
})
>>> dfA
key columnA columnB columnC
0 1 10 100 110
1 3 20 200 202
2 6 30 300 330
3 10 40 400 404
4 15 50 500 550
5 21 60 600 606
>>> dfB
key columnA columnB columnC
0 1 40 400 440
1 2 50 500 505
2 5 60 600 660
3 9 70 700 707
4 14 80 800 880
5 20 90 900 909
I basically want to extract range of rows where "key" column is between 2 and 16 from both tables, and show this "subrange" only for "key", "ColumnA" and "ColumnC" columns - side-by-side, so as to make a visual comparison.
Via Extract rows range with .between(), and specific columns, from Pandas DataFrame? and Merge Pandas dataframes on column, and have result sorted by same column I ended up with the following (sort of) one-liner:
>>>
pd.merge(
dfA[dfA['key'].between(2,16)][['key','columnA','columnC']],
dfB[dfB['key'].between(2,16)][['key','columnA','columnC']],
on='key', how='outer'
).astype('Int64').sort_values('key', ignore_index=True)
key columnA_x columnC_x columnA_y columnC_y
0 2 <NA> <NA> 50 505
1 3 20 202 <NA> <NA>
2 5 <NA> <NA> 60 660
3 6 30 330 <NA> <NA>
4 9 <NA> <NA> 70 707
5 10 40 404 <NA> <NA>
6 14 <NA> <NA> 80 880
7 15 50 550 <NA> <NA>
Essentially, this is the data I wanted to obtain - except, I'd like to know this:
- Is there a way to do the above, without having to type the row range (here,
.between(2,16)
) twice? - How can I format these results with additional "spanning" or "merged" headers (
MultiIndex
, as in Pandas to Excel (Merged Header Column) )? More precisely, the above shows column headings:
(index) key columnA_x columnC_x columnA_y columnC_y
... and what I'd want instead, is that original column name becomes a "category" heading for columns, and actual columns become "A" or "B" below the category - for all but the (merged) "key" column... so something like:
columnA columnC
(index) key A B A B
( So, this involves remapping (renaming) column names to a certain extent as well ... )
How can I do this? And can it be done in a one-liner, as in the above pd.merge example?