0

Instead of manually convert csv file to text tab delimited file using excel software

enter image description here

I would like to automate this process using Python.

However, using the following code

with open('endnote_csv.csv', 'r') as fin:
       with open('endnote_deliminated.txt', 'w', newline='') as fout:
           reader = csv.DictReader(fin, delimiter=',')
           writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
           writer.writeheader()
           writer.writerows(reader)

Return an error of

ValueError: dict contains fields not in fieldnames: None

May I know where did I do wrong,

The csv file is accessible via the following link

Thanks in advance for any insight.

mpx
  • 3,081
  • 2
  • 26
  • 56
  • your source file is not a proper .csv file. It has tons of commas in the middle of the entries. – AirSquid Jul 06 '20 at 16:57
  • 1
    this may help out if you want to avoid pandas: https://stackoverflow.com/questions/21527057/python-parse-csv-ignoring-comma-with-double-quotes – AirSquid Jul 06 '20 at 17:04

2 Answers2

2

You can use the Python package called pandas to do this:

import pandas as pd
fname = 'endnote_csv'
pd.read_csv(f'{fname}.csv').to_csv(f'{fname}.tsv', sep='\t', index=False)

Here's how it works:

  • pd.read_csv(fname) - reads a CSV file and stores it as a pd.DataFrame object (not important for this example)

  • .to_csv(fname) - writes a pd.DataFrame to a CSV file given by fname

  • sep='\t' - replaces the ',' used in CSVs with a tab character

  • index=False - use this to remove the row numbers

If you want to be a bit more advanced and use the command line only, you can do this:

# csv-to-tsv.py
import sys

import pandas as pd

fnames = sys.argv[1:]

for fname in fnames:
    main_name = '.'.join(fname.split('.')[:-1])
    pd.read_csv(f'{main_name}.csv').to_csv(f'{main_name}.tsv', sep='\t', index=False)

This will allow you to run a command like this from the command line and change all .csv files to .tsv files in one go:

python csv-to-tsv.py *.csv
d-man
  • 476
  • 4
  • 24
  • Thanks for the suggestion, I will try this and come back to you. – mpx Jul 07 '20 at 06:11
  • Hi, apparently, this save the doc as tsv instead of txt format – mpx Jul 08 '20 at 03:41
  • 1
    just change everywhere it says `'.tsv'` to `'.txt'` - for example: `pd.read_csv(f'{main_name}.csv').to_csv(f'{main_name}.txt', sep='\t', index=False)` – d-man Jul 08 '20 at 13:05
0

It is erroring out on comma seperated author names. It appears that columns in the underline rows exceeds number of headers.