Remove double quotes in Pandas

Question

I have the following file:

"j"; "x"; y
"0"; "1"; 5
"1"; "2"; 6
"2"; "3"; 7
"3"; "4"; 8
"4"; "5"; 3
"5"; "5"; 4

I read it with:

df = pd.read_csv('test.csv', delimiter='; ', engine='python')

Then I print df and see:

   "j"  "x"  y
0  "0"  "1"  5
1  "1"  "2"  6
2  "2"  "3"  7
3  "3"  "4"  8
4  "4"  "5"  3
5  "5"  "5"  4

Instead, I would like to see:

   j  x  y
0  0  1  5
1  1  2  6
2  2  3  7
3  3  4  8
4  4  5  3
5  5  5  4

How can I remove the double quotes?

KcFnMi · Accepted Answer · 2017-06-18T15:20:32.867

7

I did it with:

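import pandas as pd

# Converter that strips every double quote from a field.
rm_quote = lambda x: x.replace('"', '')

# With the multi-character delimiter '; ' the quotes are kept as data,
# so the header labels (and the converter keys) still contain them.
df = pd.read_csv('test.csv', delimiter='; ', engine='python',
                 converters={'"j"': rm_quote,
                             '"x"': rm_quote})

# Strip the quotes from the column labels as well.
df = df.rename(columns=rm_quote)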
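
As a possible alternative to the converters above (a different technique, not the one used in this answer), the sketch below assumes the quotes are ordinary CSV quoting and lets the default parser remove them. It uses the single-character separator `;` together with `skipinitialspace=True`, so the space after each separator is skipped before the quoted field starts. The sample data is copied from the question, and `io.StringIO` stands in for `test.csv` so the snippet runs on its own.

```python
import io

import pandas as pd

# Sample data copied from the question; io.StringIO stands in for 'test.csv'.
data = '''"j"; "x"; y
"0"; "1"; 5
"1"; "2"; 6
"2"; "3"; 7
"3"; "4"; 8
"4"; "5"; 3
"5"; "5"; 4
'''

# A single-character separator keeps the parser's normal quote handling,
# and skipinitialspace=True drops the space that follows each ';'.
df = pd.read_csv(io.StringIO(data), sep=';', skipinitialspace=True)

print(df)
print(df.dtypes)
```

With this approach no converters or renaming should be needed, because the quotes are consumed as quoting rather than read as data.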

edited Jun 18 '17 at 15:20

answered Jun 18 '17 at 15:14

KcFnMi


omri_saadon · Answer 2 · 2017-06-18T14:24:48.113

2

You can pass the type as an argument to the read_csv function.

pd.read_csv('test.csv', delimiter='; ', engine='python', dtype=np.float32)

You can read more in the read_csv documentation.

Alternatively, you can use the to_numeric function.

df = df.apply(pd.to_numeric)
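
A minimal sketch of how `to_numeric` could be made to work on this data, assuming the quotes are still present in the cells after reading (which is why a direct conversion cannot parse the strings). The helper name `to_number` is hypothetical, and `io.StringIO` stands in for `test.csv`.

```python
import io

import pandas as pd

# Sample data copied from the question; io.StringIO stands in for 'test.csv'.
data = '''"j"; "x"; y
"0"; "1"; 5
"1"; "2"; 6
"2"; "3"; 7
"3"; "4"; 8
"4"; "5"; 3
"5"; "5"; 4
'''

df = pd.read_csv(io.StringIO(data), delimiter='; ', engine='python')

# Strip the quotes from the column labels first.
df.columns = df.columns.str.strip('"')


def to_number(column: pd.Series) -> pd.Series:
    """Hypothetical helper: drop surrounding quotes, then convert to numbers."""
    # astype(str) lets the same code handle the already-numeric y column.
    return pd.to_numeric(column.astype(str).str.strip('"'))


df = df.apply(to_number)
print(df.dtypes)
```

Stripping before converting is the key step here: `pd.to_numeric` only sees strings such as `0` and `1`, never `"0"`.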

edited Jun 18 '17 at 14:24

answered Jun 18 '17 at 14:09

omri_saadon


`convert_objects` is deprecated – Ted Petrou Jun 18 '17 at 14:16
@TedPetrou , Thanks, I've updated it to use the `to_numeric` method – omri_saadon Jun 18 '17 at 14:19
Why would you use `apply` here instead of just using the function itself: `pd.to_numeric(df, errors='ignore')`? Also, to_numeric is a function, not a method. – Ted Petrou Jun 18 '17 at 14:20
I got `ValueError: The 'dtype' option is not supported with the 'python' engine`, and `ValueError: ('Unable to parse string', u'occurred at index "j"')` for the alternative approach. – KcFnMi Jun 18 '17 at 14:44

@KcFnMi, you are right. In this case you can use the converters argument. See [here](https://stackoverflow.com/questions/18471859/pandas-read-csv-dtype-inference-issue) and the [documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) for help. – omri_saadon Jun 18 '17 at 14:51

Gonçalo Peres · Answer 3 · 2021-02-08T10:55:37.650

2

There are various ways to do that, such as str.replace or str.strip.

Suppose you want to remove the double quotes from the first column of a DataFrame.

With str.replace one can do

df[0] = df[0].str.replace(r'["]', '', regex=True)

Or

df[0] = df[0].str.replace('"', "")

This last one also removes quotation marks that appear inside an element: for example, "236"76" becomes 23676.

With str.strip, to remove quotes from the ends of the strings, one can do

df[0] = df[0].str.strip('"')
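
To make the difference between the two methods concrete, here is a small sketch on a made-up Series (the values are illustrative, built around the "236"76" example mentioned above). It passes `regex=False` explicitly so the pattern is treated as a literal string.

```python
import pandas as pd

# Illustrative values; the first one has a quote inside the text.
s = pd.Series(['"236"76"', '"42"'])

# str.replace removes every double quote, including the inner one.
replaced = s.str.replace('"', '', regex=False)

# str.strip removes quotes only from the two ends of each string.
stripped = s.str.strip('"')

print(replaced.tolist())  # ['23676', '42']
print(stripped.tolist())  # ['236"76', '42']
```

Which one to use depends on whether an inner quote is part of the data (use `str.strip`) or noise (use `str.replace`).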

Here is the final result

edited Feb 08 '21 at 10:55

answered Jul 22 '20 at 15:19

Gonçalo Peres


What if I have `"` in between the text, i.e. in Column 1 let's say I have "236"76" – pythondumb Feb 08 '21 at 09:30
No, this won't. Instead, I found this `df[0] = df[0].str.strip().str[1:-1]` useful. – pythondumb Feb 08 '21 at 10:18

score 0 · Answer 4 · answered Sep 13 '22 at 14:22

A slightly more generic solution that was useful in my case:

import pandas as pd

def remove_quotes(datum: object) -> object:
    """Strip double quotes from strings; return other values unchanged."""
    if isinstance(datum, str):
        return datum.replace('"', '')
    return datum

# Define the column names.
names = ['j', 'x', 'y']

df = pd.read_csv(
    'test.csv',
    delimiter=r';\s',  # Raw string, so \s is a regex whitespace class.
    engine='python',
    header=0,  # Row 0 is the header; its labels are replaced by `names`.
    names=names,  # Rename the columns at reading time.
    converters={name: remove_quotes for name in names},
)
