
As a start, I want to compare the first two columns of two .csv files, write the rows they have in common to an output file (say common.csv), and write the rows that differ to separate output files, one per input file (say f1.csv and f4.csv).

So far I have tried using set() and difflib, and I have also tried reading the two files into lists and comparing the first two columns of each. This gave me the rows that are common, but not the rows that differ between the files. I have tried most of the posted solutions to problems that looked similar to mine, but I am still stuck. Can someone please assist?

These are the headers in my files. I only want to compare the first two columns, but I want to write the entire line to the output file.

fieldnames = ["Chromosome", "GenomicPosition", "ReferenceBase",
              "AlternateBase", "GeneName", "GeneID",
              "TrancriptID", "Varianteffect-Variantimpact",
              "Biotype", "TranscriptBiotype", "Referencebase",
              "Alternatebase", "Depth coverage"]
Derlin

1 Answer


One solution is to use pandas, which can load each file into a DataFrame and compare the two on selected columns.

To convert between CSV files and pandas DataFrames:

 import pandas as pd
 df = pd.read_csv('csv_file.csv') # csv -> pandas
 df.to_csv('csv_file.csv', index=False) # pandas -> csv

To compare pandas dataframes on columns, this post should point you in the right direction: https://stackoverflow.com/a/47107164/2667536
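
As a rough illustration of one way to do the comparison, here is a minimal sketch that uses `DataFrame.merge` with `indicator=True` on the first two columns. It is not necessarily the approach of the linked post. The function names and the input paths passed to `compare_files` are hypothetical; the key column names and the output file names are taken from the question. It assumes "common" means the full rows from the first file whose key pair also appears in the second file, and it reads every column as a string so that chromosome values such as 1 and X compare consistently.

```python
import pandas as pd

# Key columns: the first two headers listed in the question.
KEY_COLUMNS = ["Chromosome", "GenomicPosition"]


def rows_by_key_presence(left, right, keys):
    """Split `left` into (rows whose key is in `right`, rows whose key is not)."""
    # Only the distinct key pairs of `right` are needed, so the left merge
    # keeps exactly one output row per row of `left`.
    right_keys = right[keys].drop_duplicates()
    # indicator=True adds a "_merge" column holding "both" or "left_only".
    merged = left.merge(right_keys, on=keys, how="left", indicator=True)
    shared = merged[merged["_merge"] == "both"].drop(columns="_merge")
    unique = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
    return shared, unique


def split_by_key(df1, df2, keys=KEY_COLUMNS):
    """Return (common, only_in_1, only_in_2), each holding complete rows."""
    # Assumption: the common rows are reported as they appear in df1.
    common, only_in_1 = rows_by_key_presence(df1, df2, keys)
    _, only_in_2 = rows_by_key_presence(df2, df1, keys)
    return common, only_in_1, only_in_2


def compare_files(path1, path2):
    """Read two CSV files and write common.csv, f1.csv and f4.csv."""
    # dtype=str avoids a dtype mismatch between the key columns of the files.
    df1 = pd.read_csv(path1, dtype=str)
    df2 = pd.read_csv(path2, dtype=str)
    common, only_in_1, only_in_2 = split_by_key(df1, df2)
    common.to_csv("common.csv", index=False)
    only_in_1.to_csv("f1.csv", index=False)
    only_in_2.to_csv("f4.csv", index=False)
```

Calling `compare_files("file1.csv", "file2.csv")` (placeholder paths) would then write the three output files to the current directory. If the rows from the second file are also wanted in common.csv, the first value returned by the second `rows_by_key_presence` call holds them.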

Derlin