My goal is to iterate over all bitstrings read from a text file so I can compute the Hamming distance (HD) between every pair of strings. For example, I have a txt file that contains 500 bitstrings, each of length 5093. I would like to read strings s1 and s2 from the file, then compute the Hamming distance between them. Essentially, I want the HD for all 500*499/2 = 124,750 pairs so I can compute the mean and standard deviation and plot a histogram.
I was able to do this in Python by using readlines() to read the strings into a list, then using a for loop over the s1 strings and a nested for loop over the s2 strings from the same list.
Now I'm re-approaching the problem to brush up on my C. My current approach iterates through the file in a similar fashion: it reads the bitstrings with two calls to fgets() and strips the trailing newline from each. The problem is that when I call the second fgets() to get s2, the ends of the bitstrings are cut about 300 characters short, and the Hamming distance is computed only 499 times instead of the expected 124,750 times. When I use fgets() once and comment out the nested while loop, I can read the full bitstring.
If you could help me understand what is wrong with my implementation and what the proper approach is, it would be greatly appreciated. Thanks!
EDIT: Initialized the variables, and reset both i and hd for HD calculation. Provided a similar example of the txt file containing the bitstrings. In this example, there are 4 bitstrings of length 16 instead of 500 bitstrings of length 5093. In this case, the goal is to calculate the HD of all 6 combinations of bitstring pairs.
Sample txt file
0011010000111010
1001001001110100
1110110010000100
0111011011111001
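For reference, the in-memory approach from the Python version (read every string once, then loop over index pairs i < j) could look roughly like this in C. This is only a sketch under assumptions: `bitstring`, `hamming`, `load_bitstrings`, `pairwise_sum`, `MAX_LEN` and `MAX_LINES` are hypothetical names that do not appear in the program below, and the input is assumed to hold one bitstring per line, at most 500 lines, each shorter than 6000 characters.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LEN   6000  /* per-line buffer, like SIZE in the question (assumed) */
#define MAX_LINES 500   /* like CHIPS in the question (assumed) */

typedef char bitstring[MAX_LEN];  /* hypothetical helper type */

/* Number of positions at which a and b differ; stops at the shorter string. */
size_t hamming(const char *a, const char *b)
{
    size_t hd = 0;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; a[i] != '\0' && b[i] != '\0'; i++)
        if (a[i] != b[i])
            hd++;
    return hd;
}

/* Reads up to MAX_LINES non-empty lines into a heap array; the caller frees it.
   Returns NULL if the allocation fails. */
bitstring *load_bitstrings(FILE *fp, size_t *count)
{
    bitstring *rows = malloc(MAX_LINES * sizeof *rows);
    *count = 0;
    if (rows == NULL)
        return NULL;
    while (*count < MAX_LINES && fgets(rows[*count], MAX_LEN, fp) != NULL) {
        /* Cut the line at the first '\r' or '\n', if there is one. */
        rows[*count][strcspn(rows[*count], "\r\n")] = '\0';
        if (rows[*count][0] != '\0')  /* skip blank lines */
            (*count)++;
    }
    return rows;
}

/* Sums the distance over every unordered pair i < j, i.e. n*(n-1)/2 pairs. */
unsigned long pairwise_sum(bitstring *rows, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            total += hamming(rows[i], rows[j]);
    return total;
}
```

A caller would open the file, call `load_bitstrings(fp, &n)`, loop over the pairs (or call `pairwise_sum`), and `free` the array afterwards. The rows are allocated on the heap because 500 buffers of 6000 bytes is roughly 3 MB, which may be too much for the stack on some systems. For the four sample strings above, the loop should visit 6 pairs.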
Code
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define SIZE 6000
#define CHIPS 500
int main(int argc, char *argv[]) {
    FILE* fp;
    char buffer[SIZE];
    char s1[SIZE];
    char s2[SIZE];
    int i = 0, j = 0, hd = 0;
    if(argc != 2) {
        fprintf(stderr, "USAGE: ./<executable> <bitstring file>\n");
        return 1;
    }
    else if ((fp = fopen(argv[1], "r")) == NULL) {
        perror("ERROR: File not found.\n");
        return 1;
    }
    /* for(i = 0; i < CHIPS; i++) {
        fgets(s1,sizeof(s1),fp);
        s1[strlen(s1) - 1] = '\0';
        printf("%s\n", s1);
        printf("%d\n", i);
        for(j = 0; j < CHIPS; j++) {
            fgets(s2, sizeof(s2),fp);
            s2[strlen(s2) - 1] = '\0';
            printf("%s\n", s2);
            printf("%d", j);
        }
    }
    fclose(fp);
    */
    while(fgets(s1,sizeof(s1), fp) != NULL) {
        //memcpy(s1,buffer, sizeof(s1));
        s1[strlen(s1) - 1] = '\0';
        printf("%s\n", s1);
        while(fgets(s2, sizeof(s2), fp) != NULL) {
            s2[strlen(s2) - 1] = '\0';
            while(s1[i] != '\0') {
                if(s1[i] != s2[i])
                    hd++;
                i++;
            }
            printf("Hamming Distance: %d\n", hd);
            i = 0;
            hd = 0;
        }
    }
    fclose(fp);
    return 0;
}
Sample output
...
Hamming Distance: 2576
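One thing the code above shows is that both fgets() calls share the same file position: the inner while loop reads every remaining line until end of file, so the outer fgets() returns NULL on its next call. For 500 lines that yields 499 distances, all against the first string. A minimal sketch of one way to keep the two-fgets() structure is to remember the position after s1 with ftell() and return to it with fseek() once the inner loop finishes. The function name `all_pairs_from_file` is hypothetical, and the sketch assumes the file has no blank lines and that every line fits in the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SIZE 6000  /* same buffer size as in the question */

/* Compares every unordered pair of lines in fp. Returns the number of pairs
   compared (or -1 on a positioning error) and stores the sum of their
   Hamming distances in *total. */
long all_pairs_from_file(FILE *fp, unsigned long *total)
{
    char s1[SIZE], s2[SIZE];
    long pairs = 0;

    assert(fp != NULL && total != NULL);
    *total = 0;

    while (fgets(s1, sizeof s1, fp) != NULL) {
        long resume = ftell(fp);              /* start of the line after s1 */
        if (resume < 0)
            return -1;
        s1[strcspn(s1, "\r\n")] = '\0';       /* strip "\n" or "\r\n" */

        while (fgets(s2, sizeof s2, fp) != NULL) {  /* runs to end of file */
            s2[strcspn(s2, "\r\n")] = '\0';
            for (size_t i = 0; s1[i] != '\0' && s2[i] != '\0'; i++)
                if (s1[i] != s2[i])
                    (*total)++;
            pairs++;
        }

        /* Go back so the outer fgets() reads the next s1 instead of
           hitting end of file; fseek() also clears the EOF indicator. */
        if (fseek(fp, resume, SEEK_SET) != 0)
            return -1;
    }
    return pairs;
}
```

This re-reads the tail of the file once per string, so the number of line reads grows quadratically with the number of strings. For 500 strings of about 5 KB each, reading everything into memory once is usually the simpler choice.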