6

I have the following file:

"j"; "x"; y
"0"; "1"; 5
"1"; "2"; 6
"2"; "3"; 7
"3"; "4"; 8
"4"; "5"; 3
"5"; "5"; 4

I read it with:

df = pd.read_csv('test.csv', delimiter='; ', engine='python')

Then I run print df and see:

   "j"  "x"  y
0  "0"  "1"  5
1  "1"  "2"  6
2  "2"  "3"  7
3  "3"  "4"  8
4  "4"  "5"  3
5  "5"  "5"  4

Instead, I would like to see:

   j  x  y
0  0  1  5
1  1  2  6
2  2  3  7
3  3  4  8
4  4  5  3
5  5  5  4

How can I remove the double quotes?

Gonçalo Peres
KcFnMi

4 Answers

7

I did it with:

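# Remove every double quote from a string.
rm_quote = lambda x: x.replace('"', '')

# With the regex delimiter the quotes stay part of the header names,
# so the converter keys must include them.
df = pd.read_csv('test.csv', delimiter='; ', engine='python',
                 converters={'"j"': rm_quote,
                             '"x"': rm_quote})

# Strip the quotes from the column names as well.
df = df.rename(columns=rm_quote)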
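
As a runnable sketch of the same converters idea: the file content from the question is inlined through io.StringIO (an assumption, so the snippet works without test.csv), and the hypothetical helper unquote_int also casts to int so the columns hold numbers instead of strings.

```python
import io

import pandas as pd

# Assumption: the question's file content, inlined so no test.csv is needed.
RAW = '''"j"; "x"; y
"0"; "1"; 5
"1"; "2"; 6
"2"; "3"; 7
"3"; "4"; 8
"4"; "5"; 3
"5"; "5"; 4
'''


def unquote_int(value):
    """Hypothetical helper: strip surrounding double quotes, then cast to int."""
    return int(value.strip('"'))


df = pd.read_csv(
    io.StringIO(RAW),
    delimiter='; ',
    engine='python',
    # The header names still contain the quotes at this point.
    converters={'"j"': unquote_int, '"x"': unquote_int},
)

# Clean the column names after reading.
df = df.rename(columns=lambda name: name.strip('"'))
print(df)
```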
KcFnMi
2

You can pass the type as an argument to the read_csv function.

pd.read_csv('test.csv', delimiter='; ', engine='python', dtype=np.float32)

You can read more in the read_csv documentation.

Alternatively, you can use the to_numeric function:

df = df.apply(pd.to_numeric)
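
Since the values read with the '; ' delimiter still contain literal quote characters, to_numeric needs them stripped first. A minimal sketch, assuming the question's data is inlined through io.StringIO (so it runs without test.csv) and that every column is meant to be numeric:

```python
import io

import pandas as pd

# Assumption: the question's file content, inlined so no test.csv is needed.
RAW = '''"j"; "x"; y
"0"; "1"; 5
"1"; "2"; 6
"2"; "3"; 7
"3"; "4"; 8
"4"; "5"; 3
"5"; "5"; 4
'''

df = pd.read_csv(io.StringIO(RAW), delimiter='; ', engine='python')

# Strip the quotes from the header names.
df.columns = df.columns.str.strip('"')

# Strip the quotes from every value, then convert each column to a number.
# astype(str) lets the same expression handle the already numeric y column.
df = df.apply(lambda col: pd.to_numeric(col.astype(str).str.strip('"')))
print(df)
```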
omri_saadon
  • `convert_objects` is deprecated – Ted Petrou Jun 18 '17 at 14:16
  • @TedPetrou, thanks, I've updated it to use the `to_numeric` method – omri_saadon Jun 18 '17 at 14:19
  • Why would you use `apply` here instead of just using the function itself: `pd.to_numeric(df, errors='ignore')`? Also, `to_numeric` is a function, not a method. – Ted Petrou Jun 18 '17 at 14:20
  • I got `ValueError: The 'dtype' option is not supported with the 'python' engine`, and `ValueError: ('Unable to parse string', u'occurred at index "j"')` for the alternative approach. – KcFnMi Jun 18 '17 at 14:44
  • @KcFnMi, you are right. In this case you can use the converters argument. You can get help [here](https://stackoverflow.com/questions/18471859/pandas-read-csv-dtype-inference-issue) and in the [documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) – omri_saadon Jun 18 '17 at 14:51
2

There are various ways to do that, such as using str.replace or str.strip.

Consider the following DataFrame:

[Image: example of DataFrame]

Suppose you want to remove the double quotes from the first column.

With str.replace one can do:

df[0] = df[0].str.replace(r"[\"]", '', regex=True)

Or

df[0] = df[0].str.replace('"', "")

Note that str.replace removes every double quote, including any that appear inside the element. For example, "236"76" turns into 23676.

With str.strip, which removes quotes only from the ends of the strings, one can do:

df[0] = df[0].str.strip('"')

Here is the final result:

[Image: output after running the code above]
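
To make the difference between the two methods concrete, here is a small sketch on a made-up Series. The sample values and the variable names are only for illustration and are not taken from the DataFrame in the images.

```python
import pandas as pd

# Illustrative values: one has a quote in the middle, one has no quotes at all.
s = pd.Series(['"236"76"', '"5"', '7'])

# str.replace removes every double quote, wherever it appears.
replaced = s.str.replace('"', '', regex=False)

# str.strip removes double quotes only from the two ends of each string.
stripped = s.str.strip('"')

print(replaced.tolist())  # ['23676', '5', '7']
print(stripped.tolist())  # ['236"76', '5', '7']
```

Which one to use depends on whether a quote inside a value is meaningful data or noise.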

Gonçalo Peres
0

A slightly more generic solution that was useful in my case:

def remove_quotes(datum: object) -> object:
    # Strip double quotes from strings; return anything else unchanged.
    if isinstance(datum, str):
        return datum.replace('"', '')
    return datum

# Define the column names.
names = ['j', 'x', 'y']

df = pd.read_csv(
    'test.csv',
    delimiter=r';\s',  # Raw string so that \s reaches the regex engine intact.
    engine='python',
    header=0,  # The first row is the header; it is replaced by `names`.
    names=names,  # Rename the columns at reading time.
    converters={name: remove_quotes for name in names},
)
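
For comparison, here is a sketch of a different technique from the converters approach above; it is offered as an option, not as a drop-in replacement. Splitting on a plain ';' with skipinitialspace=True lets the default parser treat the double quotes as ordinary CSV quoting, so they are dropped during parsing and the quoted digits are inferred as numbers. The inline RAW string stands in for test.csv and is an assumption made to keep the snippet runnable.

```python
import io

import pandas as pd

# Assumption: the question's file content, inlined so no test.csv is needed.
RAW = '''"j"; "x"; y
"0"; "1"; 5
"1"; "2"; 6
"2"; "3"; 7
"3"; "4"; 8
"4"; "5"; 3
"5"; "5"; 4
'''

# A single-character separator keeps the standard quote handling active;
# skipinitialspace=True skips the blank that follows each ';'.
df = pd.read_csv(io.StringIO(RAW), sep=';', skipinitialspace=True)
print(df)
```

This only fits files where the quotes really are CSV quoting. If a value can contain a stray quote in the middle, the converters approach gives more control.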
Victor