Here’s another (Discrete Optimization) way to model your problem.
Notation
View your grid as a ‘graph’ with n^2 nodes, and edges of length 1 (Edges connect two neighboring nodes.) Let the nodes be numbered 1:n^2. (For ease of notation, you can use a double array (x,y) to denote each node if you prefer.)
Decision Variables
There are k colors, one for each player (1 through 4). 0 is an unclaimed cell (white)
X_ik = 1 if player k has claimed node i. 0 otherwise.
To start out
X_i0 = 1 for all nodes i.
All nodes start out as white (0).
Neighboring sets: Two nodes i and j are ‘neighbors’ if they are adjacent to each other. (Any given node i can have at most 4 neighbors: Up down right and left.)
Edge variables:
We can now define a new set of edge variables Y_ijk
that connect two adjacent nodes (i and j) with a common color k.
Y_ijk = 1 if neighboring nodes i and j are both of color k. 0 Otherwise.
(That is, X_ik = X_jk) for non-zero k.
We now have an undirected graph. Checking for ‘closed patterns’ is the same as detecting cycles.
Detecting Cycles:
A simple DFS search will do, since we have undirected cycles. Start with each colored node i, and check for cycles. If a path leads you back to a visited node, cycles exist. You can award points accordingly.
Finally, one suggestion as you design the game. You can reward points according to the “longest cycle” you detect. The shortest cycle gets 4 points, one point for each edge (or one point for each node in the cycle) whichever works best for you.
1 1
1 1 scores 4 points
1 1 1
1 1 1 scores 6 points
1 1 1
1 1 1
1 1 scores 8 points
Hope that helps.