I need to compare two TSV files and save the needed data to another file.

tsv_file is the IMDb data file, which contains id, rating, and votes separated by tabs; tsv_file2 is my own file, which contains id and year separated by tabs.

All ids from tsv_file2 are present in tsv_file. For every id in tsv_file2 that matches an id in tsv_file, I need to write a line in the format id-year-rating to the zapis file.

The problem is that the code below runs, but it saves only one line to the zapis file.

What can I improve to save all records?


for linia in read_tsv2:
    for linia2 in read_tsv:
        if linia[0] == linia2[0]:
            zapis.write(linia[0]+'\t')
            zapis.write(linia[1]+'\t')
            zapis.write(linia2[1])
dronikk
  • Add a new line character to your last line `zapis.write(linia2[1]+'\n')` to start writing to a new line after each row is complete. – LaBeaux Feb 09 '21 at 15:37

1 Answer

It would have made life much simpler if you had provided actual examples of these TSV files instead of describing them. Here is a guide to asking a good question.

As I understand it, you have

 123456   4  15916
 123888   1  151687
 115945   5  35051

vs

 123456   1993
 123888   2013

and you want

 123456   1993   4
 123888   2013   1

There are multiple ways to cut this. I would use the SQLite support of whatever language you choose, load the data into two temporary tables, and then run a query to get the data you want. Joining the two tables on the id column should be fairly trivial.

Edit: If you take the SQLite path, there are plenty of good tutorials around, for example "Working with SQLite in Python" and "How to import a CSV in SQLite (with Python)".

I would do the following, in broad terms:

  • create a :memory: database in SQLite
  • create tables

(these two steps can be omitted if the database and tables are created in advance with an SQLite editor)

  • connect to DB
  • import each TSV file into a table
  • execute a query on the database
  • output the result

Python's standard library covers these tasks with the sqlite3 and csv modules.
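The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the function name `join_with_sqlite`, the table names `ratings` and `years`, and the column names are made up here, every row is assumed to have at least three (ratings) or two (years) tab-separated fields, and the sample data is the one shown earlier in this answer, not real IMDb data.

```python
import csv
import io
import sqlite3


def join_with_sqlite(ratings_file, years_file):
    """Return (id, year, rating) tuples for ids present in both TSV inputs.

    Hypothetical helper: both arguments are open text files (or any
    iterable of lines) containing tab-separated rows.
    """
    con = sqlite3.connect(":memory:")  # throwaway in-memory database
    con.execute("CREATE TABLE ratings (id TEXT, rating TEXT, votes TEXT)")
    con.execute("CREATE TABLE years (id TEXT, year TEXT)")

    # Import each TSV file into its table, skipping empty lines.
    con.executemany(
        "INSERT INTO ratings VALUES (?, ?, ?)",
        (row[:3] for row in csv.reader(ratings_file, delimiter="\t") if row),
    )
    con.executemany(
        "INSERT INTO years VALUES (?, ?)",
        (row[:2] for row in csv.reader(years_file, delimiter="\t") if row),
    )

    # Join on id; keep the order in which ids appear in the years file.
    rows = con.execute(
        "SELECT y.id, y.year, r.rating "
        "FROM years AS y JOIN ratings AS r ON r.id = y.id "
        "ORDER BY y.rowid"
    ).fetchall()
    con.close()
    return rows


# Sample data from this answer, held in memory instead of real files.
ratings_data = io.StringIO("123456\t4\t15916\n123888\t1\t151687\n115945\t5\t35051\n")
years_data = io.StringIO("123456\t1993\n123888\t2013\n")

result = join_with_sqlite(ratings_data, years_data)
for row in result:
    print("\t".join(row))
```

With real files, the two `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`, and the resulting rows could be written out with `csv.writer(zapis, delimiter="\t")`.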

The same result can be achieved with other tools.
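For instance, a plain dictionary lookup in Python avoids the database entirely. The sketch below assumes that `read_tsv` in the question is a `csv.reader`; such a reader is an iterator that is exhausted after the first pass of the inner loop, which would explain why only one record gets written, and the missing `'\n'` would additionally put everything on one line. The function name `join_with_dict` and the sample data are assumptions for illustration.

```python
import csv
import io


def join_with_dict(ratings_file, years_file, out_file):
    """Write id, year, rating lines to out_file; return how many were written.

    Hypothetical helper: the first two arguments are open text files with
    tab-separated rows, out_file is a writable text file.
    """
    # Read the ratings file exactly once into a dict: id -> rating.
    ratings = {
        row[0]: row[1]
        for row in csv.reader(ratings_file, delimiter="\t")
        if len(row) >= 2
    }

    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    written = 0
    for row in csv.reader(years_file, delimiter="\t"):
        # Look up each id from the years file instead of re-reading the ratings.
        if len(row) >= 2 and row[0] in ratings:
            writer.writerow([row[0], row[1], ratings[row[0]]])
            written += 1
    return written


# Sample data from this answer, held in memory instead of real files.
ratings_data = io.StringIO("123456\t4\t15916\n123888\t1\t151687\n115945\t5\t35051\n")
years_data = io.StringIO("123456\t1993\n123888\t2013\n")
zapis = io.StringIO()

count = join_with_dict(ratings_data, years_data, zapis)
print(zapis.getvalue(), end="")
```

The dictionary makes each lookup roughly constant time, so the ratings file is scanned once rather than once per line of the second file.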

MyICQ