I have two files, similar to the ones below:
File 1 - phenotype information. The first column is the individual ID; the original file has 400 rows:
215 2 25 13.8354303 15.2841303
222 2 25.2 15.8507278 17.2994278
216 2 28.2 13.0482192 14.4969192
223 11 15.4 9.2714745 11.6494745
File 2 - SNP information. The original file has 400 lines and 42,000 characters per line:
215 20211111201200125201212202220111202005111102
222 20111011212200025002211001111120211015112111
216 20210005201100025210212102210212201005101001
223 20222120201200125202202102210121201005010101
217 20211010202200025201202102210121201005010101
218 02022000252012021022101212010050101012021101
I need to remove from file 2 the individuals that do not appear in file 1. For the example above, the result should be:
215 20211111201200125201212202220111202005111102
222 20111011212200025002211001111120211015112111
216 20210005201100025210212102210212201005101001
223 20222120201200125202202102210121201005010101
I was able to do this with the following command:
awk 'NR==FNR {a[$1]; next} $1 in a' file1 file2 > file3
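For reference, here is a minimal, self-contained sketch that runs the same filter on the sample rows from this post. The scratch directory and the names file1, file2 and file3 are only placeholders, and the real files are much larger, so this only illustrates which lines the command keeps.

```shell
#!/bin/sh
# Work in a scratch directory so no real file is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# file1: phenotypes (sample rows from the post); column 1 is the individual ID.
cat > file1 <<'EOF'
215 2 25 13.8354303 15.2841303
222 2 25.2 15.8507278 17.2994278
216 2 28.2 13.0482192 14.4969192
223 11 15.4 9.2714745 11.6494745
EOF

# file2: genotypes (sample rows from the post); 217 and 218 are not in file1.
cat > file2 <<'EOF'
215 20211111201200125201212202220111202005111102
222 20111011212200025002211001111120211015112111
216 20210005201100025210212102210212201005101001
223 20222120201200125202202102210121201005010101
217 20211010202200025201202102210121201005010101
218 02022000252012021022101212010050101012021101
EOF

# While reading file1 (NR==FNR), remember every ID in array a.
# While reading file2, print only the lines whose first field is a remembered ID.
# Lines are printed unchanged, in the order they appear in file2.
awk 'NR==FNR {a[$1]; next} $1 in a' file1 file2 > file3

cat file3
```

Because whole lines are printed unmodified, any spacing or line-ending characters present in file2 are carried over to file3 as they are.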
However, when I run my main analysis with the generated file, the following errors appear:
*** Error in `./airemlf90': free(): invalid size: 0x00007f5041cc2010 ***
*** Error in `./postGSf90': free(): invalid size: 0x00007fec4a04f010 ***
airemlf90 and postGSf90 are the programs I use for the analysis. The problem does not occur when I use the original file. Is the command I used to remove the individuals adequate? One detail I did not mention is that some individuals have 4-character IDs. Could this be causing the error?
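One way to check the ID-width idea is to summarise the layout of each line in the filtered file. The sketch below rests on an assumption I have not confirmed for these programs: that they need every genotype string to start at the same column and have the same length. The ID 1001 is made up for illustration, and file3 and layout.txt are placeholder names.

```shell
#!/bin/sh
# Work in a scratch directory so no real file is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical filtered file: "1001" is an invented 4-character ID.
cat > file3 <<'EOF'
215 20211111201200125201212202220111202005111102
222 20111011212200025002211001111120211015112111
1001 20210005201100025210212102210212201005101001
EOF

# For each line print: number of fields, ID length, the column where the
# genotype string starts, and the genotype string length.
# sort | uniq -c collapses identical layouts, so more than one output row
# means the lines are not laid out uniformly.
awk '{ print NF, length($1), index($0, $2), length($2) }' file3 |
    sort | uniq -c > layout.txt
cat layout.txt

# Count lines containing a carriage return (for example, Windows line endings).
cr_lines=$(awk '/\r/ {n++} END {print n+0}' file3)
echo "lines with carriage return: $cr_lines"
```

If the real file shows several layouts, the genotype strings are not aligned on a single column, which would be worth comparing against the original file that runs without errors.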
Thanks