I have this table:
itemA itemB relation
1 2 0.4
1 3 0.8
2 1 0.4
2 3 0.6
3 1 0.8
3 2 0.6
It's 10m rows table and counting, and as you see, there's lots of duplicate information. I use this table like this:
SELECT itemB WHERE itemA=1 ORDER BY relation DESC LIMIT 1
// in this case - 3
Only in real life my limit is not 1 but 500, so I get the 500 items that have the highest 'relation' value with the item I query for.
I'm considering narrowing down the table and remove duplicates:
itemA itemB relation
1 2 0.4
1 3 0.8
2 3 0.6
this will reduce table size by 50%. but then I need a query that goes like this -
"look for item x in itemA AND itemB, and when you find it, get its pair. when you have all pairs, sort them by their relation with item x, and select the top 500."
A. Do you think this will actually make the table usage more efficient?
B. How should this query look like?