Seeking Help for Complex SQL problem: 2d array data Deduplication and Partial Search

Question

Struggling with a challenge in my current project. This project deals with 2D arrays, similar in structure to a tic-tac-toe game (which I'm going to use as a simplified example). The main issues I'm wrestling with are how to save a 2d array to the PostgreSQL database while eliminating duplicates and how to implement a partial puzzle search. In an attempt to simplify my explanation, I'll use a simplified tic-tac-toe analogy to outline my two primary challenges:

1. 2d Array Deduplication: Similar to how a tic-tac-toe board can be rotated or flipped. You can get 8 different but equivalent 2d array state (4 rotations and 2 mirrors), I consider these duplicates.What would be the most efficient way to deal with the issue?

Right now, my main idea is to create a standard "canonical" form to which all variations of a board state would be converted before storing it in the PostgreSQL database. But I'm still trying to find an efficient process for selecting a canonical form. I've considered a couple of options:

Generating all 8 variations of the 2d array, converting these into strings or hashes, and selecting the canonical form based on lexicographical value.
Assessing the first and last occurrences of "X" and "O" in the board state, and using this data to identify the board's current orientation somehow. Then rotating/mirroring it to get to a canonical form.

2. Partial 2d Array Search: In this context, I'm looking to develop a way for users to search for arrays within our database that closely correspond to a specific arrangement they create, similar to a tic-tac-toe board setup. The user's input becomes the search criterion.

Users will set up pieces on a virtual board on the website, similar to the "X"s and "O"s in a tic-tac-toe game. The goal is to then search the database and find arrays with similar configurations and return these as search results.

The crucial issue to address here is how to execute this search without a linear scan of the entire database, which is both time-consuming and computationally intensive.

THE ONLY idea I currently have, is to create a separate table, where each row is a non-empty element of the array (with a foreign key to an array table). where each non-empty element. Its x and y value get indexed. This index would then establish a link with the unique ID associated with the array. So with these idea, we have 2 tables: 1) ArrayTable, that store info about array and let's say 2) ArrayElementTable, in which we store actual array element separately.

When a search is initiated, the pieces that the user has placed on the web-based board are compared to the indexed elements stored in this database table.

The result is a list of matching rows from ArrayTable for every searched ArrayElement. And then based on these lists, we can somehow calculate a similarity percentage to the initial user input. Therefore, it's about producing an output like: '30% match with Array A, 50% match with Array B, 100% match with Array C', and so on.

However, I have a feeling there could be a more efficient way of accomplishing this. And even accomplishing the idea that I explained is pretty complex. So I would love a little help in designing it.

I'd appreciate any advanced insights or alternative perspectives on these matters!

**THE ONLY idea ...** you have is called data normalization and **is the preferred option** in a relational database. — Belayer, Aug 02 '23 at 18:15

score 0 · Answer 1 · answered Aug 02 '23 at 12:53

Maybe postgreSQL's trigram style text indexing could be helpful.

Can you represent your arrays as text strings? For example,

XOX
OXO
OXO

would be stored in a single VARCHAR or TEXT column, let's call it grid, as something like this.

XOX OXO OXO

Then, when your user presents input you would transform that input, in your software, to all eight symmetry-group versions, then deduplicate it, then for each deduplicated symmetry group version do a query with the SQL LIKE operator.

WHERE grid LIKE '%OXO%' -- OXO is a chunk of transformed input.

The trigram indexing supports the wildcard % querying and makes it performance-feasible.

Just an idea.

Seeking Help for Complex SQL problem: 2d array data Deduplication and Partial Search

1 Answers1