I have 3 large text files (>16 M lines each) in the format below.
Contents of file 1:
22_0F3, 33_0F4, 0.87
28_0F3, 37_0F4, 0.79
21_0F5, 39_2F1, 0.86
Contents of file 2:
22_0F3, 33_0F4, 1000
28_0F3, 37_0F4, 1500
21_0F2, 52_2F8, 3600
Contents of file 3:
22_0F3, 33_0F4, 0.75
28_0F3, 37_0F4, 0.91
81_0F2, 32_2F1, 0.84
I'm trying to extract the lines that are common to all 3 files, matching on the first two fields.
Then, for each common line, I have to compute the square root of the sum of the squares of the corresponding values from the 3rd column (explained below).
The difficulty is that, with more than 16 million lines in each file, loading the files and extracting the common lines takes a long time.
Based on my data, I expect around 15 million common lines.
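For reference, the matching step could be sketched in Python roughly as follows. This is only an illustration, not a tuned solution: the function name `common_rows` and the file paths are hypothetical, and it assumes that each (field 1, field 2) pair appears at most once per file, that there are no blank lines, and that one file's keys fit in memory as a dictionary.

```python
import csv


def common_rows(path1, path2, path3):
    """Yield (field1, field2, v1, v2, v3) for keys present in all three files."""

    def read(path):
        # Each line looks like "22_0F3, 33_0F4, 0.87".
        with open(path, newline="") as fh:
            for a, b, v in csv.reader(fh, skipinitialspace=True):
                yield (a, b), float(v)

    # Hold file 1 in memory, keyed on the first two fields.
    merged = {key: [value] for key, value in read(path1)}

    # Stream file 2 and attach its value only where the key already exists.
    for key, value in read(path2):
        row = merged.get(key)
        if row is not None and len(row) == 1:
            row.append(value)

    # Stream file 3 and emit only the keys that were also matched in file 2.
    for key, value in read(path3):
        row = merged.get(key)
        if row is not None and len(row) == 2:
            yield (*key, row[0], row[1], value)
```

Only the first file is held in memory; the other two are read line by line, and the rows come out in the order they appear in the third file.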
Intermediate output is something like this:
22_0F3, 33_0F4, 0.87, 1000, 0.75
28_0F3, 37_0F4, 0.79, 1500, 0.91
The desired output is:
22_0F3, 33_0F4, 1000.0007
28_0F3, 37_0F4, 1500.0005
where 1000.0007 is the square root of the sum of squares of 0.87, 1000, and 0.75.
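The per-line calculation described above can be written as a small Python helper. This is a minimal sketch: the name `root_sum_of_squares` is hypothetical, and rounding to four decimal places is an assumption based on the sample output.

```python
import math


def root_sum_of_squares(*values):
    """Return sqrt(v1**2 + v2**2 + ...) for the given numbers."""
    return math.sqrt(sum(v * v for v in values))


# One intermediate row: the two key fields followed by the three values.
row = ("22_0F3", "33_0F4", 0.87, 1000.0, 0.75)
line = f"{row[0]}, {row[1]}, {root_sum_of_squares(*row[2:]):.4f}"
print(line)
```

Because the value from file 2 is several orders of magnitude larger than the other two, the result stays very close to that value.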
How can I get the desired output from these huge files without much delay?