Here is the code that works as expected.
From: Outputting difference in two Pandas dataframes side by side - highlighting the difference
import sys
if sys.version_info[0] < 3:
from StringIO import StringIO
else:
from io import StringIO
DF1 = StringIO("""id Name score isEnrolled Comment
111 Jack 2.17 True "He was late to class"
112 Nick 1.11 False "Graduated"
113 Zoe NaN True " "
""")
DF2 = StringIO("""id Name score isEnrolled Comment
111 Jack 2.17 True "He was late to class"
112 Nick 1.21 False "Graduated"
113 Zoe NaN False "On vacation" """)
df1 = pd.read_table(DF1, sep='\s+', index_col='id')
df2 = pd.read_table(DF2, sep='\s+', index_col='id')
df_all = pd.concat([df1, df2],
axis='columns', keys=['First', 'Second'])
df_final = df_all.swaplevel(axis='columns')[df1.columns[1:]]
def highlight_diff(data, color='yellow'):
attr = 'background-color: {}'.format(color)
other = data.xs('First', axis='columns', level=-1)
return pd.DataFrame(np.where(data.ne(other, level=0), attr, ''),
index=data.index, columns=data.columns)
df_final.style.apply(highlight_diff, axis=None)
The only problem is that I do not want the first row (111) because there are no differences.
How do I select only changed rows without using the highlight_diff function? I want to return rows 112 and 113 side-by-side without highlighting as shown by Ted's answer.