I have generated a .txt file containing hundreds of rows. Each of the rows looks like this:
0000000000000000000000000000000000000000100000000010000000001000000000100000000010000000000000000000
Now I have an input function where an string like this gets submitted:
0000000000000000000000000000000000000000000011000010000000001000000000100000000010000000000000000000
If the input string has a 1 where a string from the .txt has a 0 it doesn't matter. The question the algorithm should solve is following:
- find the "best matching" rows in the .txt where input string has as much 1s at the same indices the row has
- output at which indices theses "best matching rows" have 1s and the input hasn't
Circumstances:
- There are a maximum of five 1s in each row of the .txt
- There can be up to a hundred 1s in the input string.
- Each row and the input are exactly 100 characters long.
- Each row and the input will always only contain 0 or 1; no other characters.
Here is an example that's smaller than the real problem assuming length 6 with max 3 ones (I will count indices starting from 1 not 0 for this example):
rows.txt
000111
010101
110010
001011
input string:
001110
Expected output of the algorithm
Closest matching rows are:
Row 1: two matching 1s; your input misses a 1 on index 6
Row 4: two matching 1s; your input misses a 1 on index 4
Question / possible solutions:
I choose 0s and 1s to represent the information I need as I planned on putting them in a binary file. I was expecting better performance and lower file size when using binary representation and unsigned integers in the code (Java/Kotlin). However to go on uInt8; I would need to add "0000" on each line to make each line 104 characters long which is dividable by 8. That's surely possible, but than I have everything saved as integers and I just can't get my head around on how to compare these on a byte-representation level to basically do the pattern matching I need in the code.
Looking for the best performing storage/pattern matching solution for the given problem.
Here is why I am doing this: I have a frontend with a clickable grid. An unckecked item is represented by a 0. Clicking on a item and thus selecting it is represented by a 1. Users can select up to 50 of the 100 available items. However, there some items depend on other items. Every item always depends on 4 other items. These are the rows of the (currently) .txt file. And that's why I need to figure out what ones are missing to recommend a configuration that has a full dependency of five 1s working while I don't need to care if there are additional selections (1s). Maybe this use-case clarifies how an abstract problem like this is a real world problem and not a school assignment. Prior solution involved representing the grid by a matrix, but having a binary string seemed to be more efficient.
Clarifying on the problem: Binary three:
00000011
Binary thirteen:
00001101
If there is a row representing binary 3 and the input is binary 13, how could I compare
val row: uint8 = 3u
val input: uint8 = 13u
to get the result
missing value 2 (-> a one on index 7