
I have two very long files (more than 1 million lines each) with exactly the same number of lines, and the lines correspond to each other one to one. I want to read both files in parallel, line by line, and write a new file depending on the content of each pair of lines.

To be more concrete, the first file looks like

<text id="Jamilja03" title="Жамиля" title_english="Jamilja" year="1959" genre="novelette" author="Chyngyz Aitmatov>
<s>
Жамийла
Ар
дайым
бир
жакка
жол
жүрөрдө
,
мен
ушул
алкагы
жөнөкөй
жыгачтан
жасалган
сүрөттүн
алдына
келип
турам
.
</s>

and the second file looks like

<^text/*text$ ^id/*id$=^"/"<quot>$^Jamilja03/*Jamilja03$^"/"<quot>$ ^title/*title$=^"/"<quot>$^Жамиля/*Жамиля$^"/"<quot>$ ^title/*title$_^englis/*english$=^"/"<quot>$^Jamilja/*Jamilja$^"/"<quot>$ ^year/*year$=^"/"<quot>$^1959/1959<num>$^"/"<quot>$ ^genre/*genre$=^"/"<quot>$^novelette/*novelette$^"/"<quot>$ ^author/*author$=^"/"<quot>$^Chyngyz/Chyngyz<np><unk>$ ^Aitmatov/*Aitmatov$>
<^s/*s$>
^Жамийла/*Жамийла$
^Ар дайым/ар дайым<adv>$
^бир/бир<num>$
^жакка/жак<n><dat>$
^жол/жол<adv>$
^жүрөрдө/жүр<v><iv><ger_fut><loc>$
^,/,<cm>$
^мен/мен<prn><pers><p1><sg><nom>$
^ушул/ушул<det><dem>$
^алкагы/алкак<n><px3sp><nom>$
^жөнөкөй/жөнөкөй<adj>$
^жыгачтан/жыгач<n><abl>$
^жасалган/жаса<v><tv><pass><prc_past>$
^сүрөттүн/сүрөт<n><gen>$
^алдына/алд<n><px3sp><dat>$
^келип/кел<v><iv><prc_perf>$
^жүрөрдө/жүр<v><iv><ger_fut><loc>$
^,/,<cm>$
^мен/мен<prn><pers><p1><sg><nom>$
^ушул/ушул<det><dem>$
^алкагы/алкак<n><px3sp><nom>$
^жөнөкөй/жөнөкөй<adj>$
^жыгачтан/жыгач<n><abl>$
^жасалган/жаса<v><tv><pass><prc_past>$
^сүрөттүн/сүрөт<n><gen>$
^алдына/алд<n><px3sp><dat>$
^келип/кел<v><iv><prc_perf>$
^турам/тур<vaux><aor><p1><sg>$
^./.<sent>$
<^///<sent>$^s/*s$>

In general, I want to use the lines from the second file (with some reformatting). For lines that contain XML tags, however, I want to keep the XML markup from the first file.

A naive approach like

for line_a in file_a and line_b in file_b:

does not work in Python.

There is already a question with a similar title, "How to read two files in parallel line by line in python", but the proposed answers (reading one file into a list or dictionary) don't fit my task. I want to read one line from each file, decide how to process the pair, and then forget both lines.

Sir Cornflakes
  • This might help https://stackoverflow.com/questions/11295171/read-two-textfile-line-by-line-simultaneously-python – bumblebee Feb 05 '19 at 10:39

1 Answer


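Use zip over the two file objects. File objects are already lazy iterators over their lines, and in Python 3 zip is lazy as well, so only one pair of lines is held in memory at a time: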

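# "file_a.txt" and "file_b.txt" are placeholder paths
with open("file_a.txt", encoding="utf-8") as file_a, open("file_b.txt", encoding="utf-8") as file_b:
    for la, lb in zip(file_a, file_b):
        ...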
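
For the task described in the question, the loop body might look roughly like the sketch below. It is only an illustration under stated assumptions: the function name merge_lines is made up, and the rule "a line of the first file is XML markup if it starts with < and ends with >" is a guess at what the asker needs, not something the question specifies. It uses itertools.zip_longest instead of plain zip so that a length mismatch raises an error instead of being silently truncated.

```python
import io
from itertools import zip_longest

_MISSING = object()  # sentinel marking that one file ended before the other


def merge_lines(file_a, file_b, out):
    """Stream two line-aligned files and write one output line per pair.

    Hypothetical helper: keeps the line from file_a when it looks like
    XML markup, otherwise keeps the line from file_b.
    """
    pairs = zip_longest(file_a, file_b, fillvalue=_MISSING)
    for number, (la, lb) in enumerate(pairs, start=1):
        if la is _MISSING or lb is _MISSING:
            raise ValueError(f"files differ in length at line {number}")
        la = la.rstrip("\n")
        lb = lb.rstrip("\n")
        # Assumption: XML lines in the first file start with "<" and end with ">"
        if la.startswith("<") and la.endswith(">"):
            out.write(la + "\n")
        else:
            # Any reformatting of the second file's line would go here
            out.write(lb + "\n")


# Small in-memory demonstration; real code would pass open file objects
demo_a = io.StringIO("<s>\nмен\n</s>\n")
demo_b = io.StringIO("<^s/*s$>\n^мен/мен<prn>$\n<^///<sent>$^s/*s$>\n")
demo_out = io.StringIO()
merge_lines(demo_a, demo_b, demo_out)
print(demo_out.getvalue(), end="")
```

Because both inputs are consumed lazily, memory use stays constant regardless of file size. On Python 3.10 or later, zip(file_a, file_b, strict=True) gives a similar length check without the sentinel.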
Netwave