0

I have a CSV file with lines as follows:

"Dec 30, 2021","1,234.11","1,654.22","11,876.23","1,676,234"

I have learned from a previous post that I can use:

parse_dates=['Date']

to get the date parsed, and that works. However, I would like columns 2-4 as np.float64 and column 5 as np.int64. How can I achieve that?

I have tried this:

data = pd.read_csv("file.csv",  parse_dates=['Date'], dtype=[np.datetime64, np.float64, np.float64, np.float64, np.float64, np.int64])

but I get

TypeError: data type not understood
M.E.

2 Answers

3

Use the `thousands` parameter, so that pandas strips the comma separators before inferring the numeric dtypes.

df = pd.read_csv("file.csv",  parse_dates=['Date'], thousands=',')
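
A minimal, self-contained sketch of the same call, reading from an in-memory string instead of file.csv. The header row (Date, Open, High, Low, Volume) is an assumption made for illustration, since the question only shows a data line:

```python
import io

import pandas as pd

# Hypothetical header row; only the data line comes from the question.
csv_text = (
    'Date,Open,High,Low,Volume\n'
    '"Dec 30, 2021","1,234.11","1,654.22","11,876.23","1,676,234"\n'
)

# thousands=',' removes the separators inside the quoted numeric fields,
# so pandas can infer float and integer dtypes on its own.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], thousands=",")

print(df.dtypes)
```

With this input, the three price columns should come back as float64 and the last column as int64, without passing `dtype` at all.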
Emma
0

Use the `converters` parameter if you have a special format.

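from datetime import datetime

import pandas as pd

converters = {
    'Date': lambda x: datetime.strptime(x, "%b %d, %Y"),  # e.g. "Dec 30, 2021"
    'Number': lambda x: float(x.replace(',', ''))  # strip thousands separators
}
df = pd.read_csv('data.csv', converters=converters)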

Output:

>>> df
        Date   Number
0 2021-12-30  2345.55

>>> df.dtypes
Date      datetime64[ns]
Number           float64
dtype: object

# data.csv
Date,Number
"Dec 30, 2021","2,345.55"
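
The same idea can be applied to the five-column line from the question. This is only a sketch: it assumes the file has no header row, so the converters are keyed by column position, and the helper names `to_float` and `to_int` are made up for the example. The date column is left to `parse_dates`:

```python
import io

import pandas as pd

# The data line from the question, read from memory instead of a file.
csv_text = '"Dec 30, 2021","1,234.11","1,654.22","11,876.23","1,676,234"\n'


def to_float(text):
    """Drop thousands separators, then convert to float."""
    return float(text.replace(",", ""))


def to_int(text):
    """Drop thousands separators, then convert to int."""
    return int(text.replace(",", ""))


# Keys are column positions because header=None is used below.
converters = {1: to_float, 2: to_float, 3: to_float, 4: to_int}

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    parse_dates=[0],
    converters=converters,
)

print(df.dtypes)
```

Converters give per-column control, at the cost of calling a Python function for every cell, so they are mainly worth it when the format is not covered by a built-in option.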

Otherwise, use the standard parameters:

df = pd.read_csv("data.csv",  header=None, parse_dates=[0], thousands=',', quoting=1)

Output:

>>> df
           0        1        2         3        4
0 2021-12-30  1234.11  1654.22  11876.23  1676234

>>> df.dtypes
0    datetime64[ns]
1           float64
2           float64
3           float64
4             int64
dtype: object
Corralien
  • Thanks, this is very comprehensive. I chose the `thousands=` answer as it seems more idiomatic/simple in pandas. – M.E. Jan 03 '22 at 16:52