How to remove triple double-quotes around values in csv files?

Question

I have a csv file. each value is quoted with """ quotes . I want to remove it for further processing

here is my csv file

Name,age,class,place
""""ishika""","""21""","""B"""","""Whitefield"""
"""anju""","""23""","""C""","""ITPL"""

I want the output as:

Name,age,class,place
ishika,21,B,Whitefield
anju,23,C,ITPL

I am getting csv form postgres table..

import psycopg2
import config as cfg
conn = cfg.DATABASE_CONNECT
cur = conn.cursor()
import csv
import pandas as pd
import numpy as np

tablename = "sf_paymentprofile_error_log"
query = "SELECT * from {} ".format(tablename)
outputquery = "COPY ({0}) TO STDOUT WITH CSV HEADER".format(query)
with open(cfg.PG_EXTRACT_PATH+'sf_paymentprofile_error_log.csv', 'w') as f:
   cur.copy_expert(outputquery, data)


conn.commit()
conn.close()

I want the above output using python.Thanks.

in what data structure you're loading you data ? list , dataframe ? — nassim, May 07 '19 at 10:03
Use `.replace('"""', '')` to remove `"""` from string [How to use string.replace() in python 3.x](https://stackoverflow.com/a/9452122) — Xarvalus, May 07 '19 at 10:05
doing replace i get the following error. AttributeError: '_io.TextIOWrapper' object has no attribute 'replace' — ishika, May 07 '19 at 10:08

nassim · Answer 1 · 2019-05-07T10:14:13.547

0

a way of doing it using pandas

import pandas as pd

df = pd.read_csv("your_file.csv")


for i in df.columns :         
    df[i] = df[i].apply(lambda x: str(x).replace('"',''))

df.to_csv("output.csv",index=False)

if it is a list :

output = []
for row in your_data :  
    b = []
    for val in row : 
        b.append(val.replace('"',''))
    c.append(b) 

print(output)

edited May 07 '19 at 10:14

answered May 07 '19 at 10:07

nassim

1,547
1
14
26

i am getting csv from postgres table – ishika May 07 '19 at 10:26
there are various ways reading your table with pandas, df = pd.read_sql_query('select * from "table"',con=engine) , all that matters is your resulting dataframe on which you will have to do the preprocessing – nassim May 07 '19 at 10:29

score 0 · Answer 2 · answered May 07 '19 at 10:15

0

Remove them by treating them as quotes but csv only accepts one-char separator so:

import re
with open('data.csv') as f:
    # replace """ to single "
    data = (re.sub(r'"+', '"', line) for line in f.readlines())
    # now treat it as normal csv
    rd = csv.reader(data, delimiter=',', quotechar='"')
    # print 
    for row in rd:
        print(','.join(row))

Or if you find it's safe, do re.sub('"', '', f.read()) for the whole file.

answered May 07 '19 at 10:15

knh190

2,744
1
16
30

@ishika it's the same, use `lines = csv_str.split('\n')`. The csv content string cane be read by executing the query. Considering you are using postgres, you should already know the basic file & string operations though. – knh190 May 07 '19 at 11:12

Lante Dellarovere · Answer 3 · 2019-05-07T11:14:45.297

Both pd.str.replace and pd.str.strip will help, for example:

df.apply(lambda x: x.str.strip('"'))

Anyway, some of the rows of your csv have a " succession that hides some , separator, so if I apply strip function:

import pandas as pd

df = pd.read_csv("my.csv")
df = df.apply(lambda x: x.str.strip('"'))
print(df)

     Name age            class place
0  ishika  21  B"","Whitefield   NaN
1    anju  23                C  ITPL

The first workaround I've found implies a change of the quotechar param:

import pandas as pd

df = pd.read_csv("my.csv", quotechar="'")
df = df.apply(lambda x: x.str.strip('"'))
print(df)

     Name age class       place
0  ishika  21     B  Whitefield
1    anju  23     C        ITPL

How to remove triple double-quotes around values in csv files?

3 Answers3