1

A .csv file looks like this

col1, col2, col3
"a, b, c"
"d, e, f"

and I want to unwrap the rows from the quotation marks to read the file in pandas/Python correctly. As stated in this duplicate question in R, one could read the .csv data twice: 1) unwrap the data in the first column of an auxiliary dataframe and 2) recall the read function on the first column of the auxiliary dataframe.

What is the most elegant way of doing this in pandas?

iJup
  • 281
  • 2
  • 13

2 Answers2

0

You can turn off quoting by setting the quoting paramter to 3. Then remove all your quotations.

data = pd.read_csv('file.csv', quoting=3)
for col in data:
    data[col] = data[col].str.strip('"')
busybear
  • 10,194
  • 1
  • 25
  • 42
0

I guess I would

  1. Read the raw csv file as a string and make the replacements I need on it.
  2. Put the new string inside a buffer.
  3. Read the buffer as a csv file with pandas.

In code:

import io
import pandas as pd


with open('./csv_quotes.csv') as file:
    raw_csv = file.read()
    new_csv = raw_csv\
        .replace('\n\"', '\n')\ # replace "a,... => a,...
        .replace('\"\n', '\n')\ # replace ..., c" => ..., c
        .replace(', ', ',') # replace a, b, c => a,b,c
    sio = io.StringIO(new_csv) # 

    df = pd.read_csv(sio, sep=',')
    print(df.values)

Output:

[['a' 'b' 'c']
 ['d' 'e' 'f']]

NOTE: you could use regular expressions to make the replacements. Take a look at this answer.