1

I have a csv file. each value is quoted with """ quotes . I want to remove it for further processing

here is my csv file

Name,age,class,place
""""ishika""","""21""","""B"""","""Whitefield"""
"""anju""","""23""","""C""","""ITPL"""

I want the output as:

Name,age,class,place
ishika,21,B,Whitefield
anju,23,C,ITPL

I am getting csv form postgres table..

import psycopg2
import config as cfg
conn = cfg.DATABASE_CONNECT
cur = conn.cursor()
import csv
import pandas as pd
import numpy as np

tablename = "sf_paymentprofile_error_log"
query = "SELECT * from {} ".format(tablename)
outputquery = "COPY ({0}) TO STDOUT WITH CSV HEADER".format(query)
with open(cfg.PG_EXTRACT_PATH+'sf_paymentprofile_error_log.csv', 'w') as f:
   cur.copy_expert(outputquery, data)


conn.commit()
conn.close()

I want the above output using python.Thanks.

Lante Dellarovere
  • 1,838
  • 2
  • 7
  • 10
ishika
  • 47
  • 5

3 Answers3

0

a way of doing it using pandas

import pandas as pd

df = pd.read_csv("your_file.csv")


for i in df.columns :         
    df[i] = df[i].apply(lambda x: str(x).replace('"',''))

df.to_csv("output.csv",index=False)

if it is a list :

output = []
for row in your_data :  
    b = []
    for val in row : 
        b.append(val.replace('"',''))
    c.append(b) 

print(output)

nassim
  • 1,547
  • 1
  • 14
  • 26
  • i am getting csv from postgres table – ishika May 07 '19 at 10:26
  • there are various ways reading your table with pandas, df = pd.read_sql_query('select * from "table"',con=engine) , all that matters is your resulting dataframe on which you will have to do the preprocessing – nassim May 07 '19 at 10:29
0

Remove them by treating them as quotes but csv only accepts one-char separator so:

import re
with open('data.csv') as f:
    # replace """ to single "
    data = (re.sub(r'"+', '"', line) for line in f.readlines())
    # now treat it as normal csv
    rd = csv.reader(data, delimiter=',', quotechar='"')
    # print 
    for row in rd:
        print(','.join(row))

Or if you find it's safe, do re.sub('"', '', f.read()) for the whole file.

knh190
  • 2,744
  • 1
  • 16
  • 30
  • @ishika it's the same, use `lines = csv_str.split('\n')`. The csv content string cane be read by executing the query. Considering you are using postgres, you should already know the basic file & string operations though. – knh190 May 07 '19 at 11:12
0

Both pd.str.replace and pd.str.strip will help, for example:

df.apply(lambda x: x.str.strip('"'))

Anyway, some of the rows of your csv have a " succession that hides some , separator, so if I apply strip function:

import pandas as pd

df = pd.read_csv("my.csv")
df = df.apply(lambda x: x.str.strip('"'))
print(df)

     Name age            class place
0  ishika  21  B"","Whitefield   NaN
1    anju  23                C  ITPL

The first workaround I've found implies a change of the quotechar param:

import pandas as pd

df = pd.read_csv("my.csv", quotechar="'")
df = df.apply(lambda x: x.str.strip('"'))
print(df)

     Name age class       place
0  ishika  21     B  Whitefield
1    anju  23     C        ITPL
Lante Dellarovere
  • 1,838
  • 2
  • 7
  • 10