Great question!
Let's start with a possible solution
I assume you want to deduct the total of the first from the total of the second per group. Taking your cleaning as the basis, here's a small, (hopefully) complete example, which uses .sum() and multiplies the views from b by -1 prior to grouping:
import pandas as pd
import numpy as np
a = pd.DataFrame(data = [
[100, 'x.me'], [200, 'y.me'], [50, 'x.me'], [np.nan, 'y.me']
], columns=['VIEWS', 'URL'])
b = pd.DataFrame(data = [
[90, 'x.me'], [200, 'z.me'],
], columns=['VIEWS', 'URL'])
for x in [a, b]:
x['VIEWS'] = pd.to_numeric(x['VIEWS'], errors='coerce').fillna(0).astype(int)
df = pd.concat([x.groupby(['URL'])['VIEWS'].apply(lambda y: y.sum() * (1 - 2 * cnt)).reset_index(drop = False) for (cnt, x) in enumerate([a, b])], ignore_index=True)
df = df.groupby(['URL'])['VIEWS'].sum().abs().reset_index()
A few words on why your approach is currently not working
- diff() There is a diff function for the SeriesGroupBy class. It takes the difference of some row to the previous row in the group. Check this out for a valid usage of diff() in this context: Pandas groupby multiple fields then diff
- nan's appear in your last operation since you're trying to set a series object with the indices being the urls onto a series with completely different indices.
So if anything, an operation such as the following could work
df['VIEWS'] = df.groupby(['URL'])['VIEWS'].sum().reset_index(drop=True)
although this still assumes, that df does not change in size and that the indices on the left side accord the ones after the reset on the right side.