Suppose I have two dataframes which look like this:
df1
ID Chr
1 a
2 a
3 a
4 a
5 a
6 a
7 b
8 b
9 b
10 b
11 c
12 c
13 a
14 a
15 a
16 a
17 c
18 c
19 c
20 a
df2
ID Chr
1 a
2 a
3 b
4 b
5 b
6 b
7 b
8 b
9 b
10 b
11 c
12 c
13 a
14 a
15 c
16 c
17 c
18 a
19 a
20 a
If you look at the two dataframes, you can see that they are quite similar. In fact, when two dataframes are this similar I consider them part of the same set. The issue is that they are not well aligned. In this small sample it might not seem like a big deal, but in the actual data, which has more than 1000 rows, the misalignment is a big problem.
My matching algorithm is pretty basic: it compares each row of df1 to the corresponding row of df2 and gives a score of 1 if there is a match and 0 for a mismatch. What complicates things is that I'm not matching all the rows of the dataframes at once. Due to the circumstances, I have to do partial matches. For example, with the above data I would match 5 rows at a time: the first five rows of df1 against the corresponding five rows of df2. When I make the window smaller, the issue becomes worse.
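The row-by-row scoring described above can be sketched roughly as follows. This is an illustration only, written in plain Python with the Chr columns from the sample typed in as lists. The function name window_scores and the use of non-overlapping windows are assumptions made for the example, not the actual code.

```python
# Chr columns of the two sample dataframes, in ID order (1..20).
df1_chr = list("aaaaaa" "bbbb" "cc" "aaaa" "ccc" "a")
df2_chr = list("aa" "bbbbbbbb" "cc" "aa" "ccc" "aaa")


def window_scores(left, right, size):
    """Score consecutive, non-overlapping windows of `size` rows.

    Each row scores 1 if the two values at the same position are equal
    and 0 otherwise; the window score is the sum of its row scores.
    (Hypothetical helper, named here for illustration only.)
    """
    scores = []
    for start in range(0, min(len(left), len(right)), size):
        pairs = zip(left[start:start + size], right[start:start + size])
        scores.append(sum(1 for a, b in pairs if a == b))
    return scores


print(window_scores(df1_chr, df2_chr, 5))  # prints [2, 4, 4, 2]
```

With windows of five rows, the first and last windows score only 2 out of 5, even though both dataframes go through the same a, b, c, a, c, a sequence of runs. The runs simply start at different rows.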
So the question is: can I do something about the alignment without having to resort to matching the entire dataframes at once?