Pandas specify datatypes

Question

pandasframe_datatypes= ['A':int64, 'B':object, 'C':object, 'D':object, 'E':float64]

It is used like so:

test = pd.read_csv("test.csv", sep=";", names=pandasframe_names, dtype=pandasframe_datatypes)

But it raises a SyntaxError. What is wrong?

Column A is an integer, B, C and D are strings, and E is a float.

What would the correct answer look like?

Also, my CSV already has a header row, and if I specify the names, the header shows up twice (once as the column names and once as the first row of data). Is there a solution for this as well?

`pandasframe_datatypes` is not valid Python syntax. – Sociopath Feb 24 '20 at 08:14

jezrael · Accepted Answer · 2020-02-24T11:40:05.520


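Change the invalid dictionary:

pandasframe_datatypes= ['A':int64, 'B':object, 'C':object, 'D':object, 'E':float64]

Square brackets build a list, which cannot hold `key: value` pairs (hence the SyntaxError), and the bare names `int64` and `float64` are not defined in Python. Use a valid dict (curly braces) and valid dtypes for the numeric columns:

import numpy as np

pandasframe_datatypes = {'A': np.int64, 'B': object, 'C': object, 'D': object, 'E': np.float64}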

Or, with the numeric dtypes given as strings (no NumPy import needed):

pandasframe_datatypes = {'A': 'int64', 'B': object, 'C': object, 'D': object, 'E': 'float64'}
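A minimal end-to-end sketch of the corrected call, assuming a small `;`-separated file whose first line is already a header. The file is replaced here by an in-memory stand-in built with `io.StringIO`, and the sample values are made up. Because the file has its own header, `names` is left out and `read_csv` takes the column names from that line.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for test.csv: ';'-separated, first line is a header
csv_text = """A;B;C;D;E
1;foo;bar;baz;1.5
2;qux;quux;corge;2.25
"""

# Valid dict: curly braces, column name -> dtype
pandasframe_datatypes = {"A": np.int64, "B": object, "C": object, "D": object, "E": np.float64}

test = pd.read_csv(io.StringIO(csv_text), sep=";", dtype=pandasframe_datatypes)
print(test.dtypes)
```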

edited Feb 24 '20 at 11:40

answered Feb 24 '20 at 08:16

jezrael


NameError: name 'np' is not defined. What can I do? – Smiley Feb 24 '20 at 11:39

@Smiley - just add `import numpy as np`. Or use `pandasframe_datatypes= {'A':pd.np.int64, 'B': object, 'C': object, 'D':object, 'E': pd.np.float64}` – jezrael Feb 24 '20 at 11:41
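A note for newer pandas versions: `pd.np` was only an alias for NumPy. It was deprecated in pandas 1.0 and removed in pandas 2.0, so the explicit `import numpy as np` is the portable route. The quick check below, a sketch that is not part of the original answer, confirms that the NumPy type objects and the dtype strings from the two dicts in the answer name the same dtypes, so either spelling works in `dtype=`.

```python
import numpy as np

with_numpy = {"A": np.int64, "B": object, "C": object, "D": object, "E": np.float64}
with_strings = {"A": "int64", "B": object, "C": object, "D": object, "E": "float64"}

# np.dtype() normalises both spellings to the same dtype object
same = all(np.dtype(with_numpy[col]) == np.dtype(with_strings[col]) for col in with_numpy)
print(same)  # True
```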
This doesn't work. I get `ValueError: invalid literal for int() with base 10: 'A'`. Could it be that my first column is duplicated? How can I remove the header (which contains some names) when I read the CSV? Otherwise I get a datatype error because the values cannot be cast to this datatype. – Smiley Feb 24 '20 at 12:29

@Smiley - That is expected: the data contains a value (`A`) that cannot be converted to a number. See [this](https://stackoverflow.com/questions/15891038/change-data-type-of-columns-in-pandas). – jezrael Feb 24 '20 at 12:31
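The `'A'` in that error is the header line being parsed as data: when `names` is passed, `read_csv` treats the file as having no header unless told otherwise, so the literal text `A` lands in the int64 column. A minimal sketch with a made-up two-column sample: `header=0` (replace the file's header with `names`) or `skiprows=1` (skip the header line) both avoid the error.

```python
import io

import pandas as pd

csv_text = "A;B\n1;x\n2;y\n"  # hypothetical sample with a header line
names = ["A", "B"]
dtypes = {"A": "int64", "B": object}

# With names and no header argument, line 0 ("A;B") is read as data -> ValueError
failed = False
try:
    pd.read_csv(io.StringIO(csv_text), sep=";", names=names, dtype=dtypes)
except ValueError as exc:
    failed = True
    print("without header=0:", exc)

# header=0: line 0 is a header, replaced by `names`
df = pd.read_csv(io.StringIO(csv_text), sep=";", names=names, header=0, dtype=dtypes)

# skiprows=1: skip line 0 entirely instead of treating it as a header
df_skip = pd.read_csv(io.StringIO(csv_text), sep=";", names=names, skiprows=1, dtype=dtypes)

print(df)
```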
Thanks a lot, I added `skiprows=1` as a parameter. If I get something like `ValueError: Integer column has NA values in column 6`, what can I add? Could I insert null in this case? If so, how would I do that?
@Smiley - a possible solution is to use [this](https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html)
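The linked page describes pandas' nullable integer dtype, spelled `'Int64'` with a capital `I`. It can be requested directly in `read_csv`, so empty cells become `<NA>` instead of triggering "Integer column has NA values". A sketch with a made-up sample in which one value in column `A` is empty:

```python
import io

import pandas as pd

# Hypothetical sample: the second data row has an empty value in column A
csv_text = "A;B\n1;x\n;y\n3;z\n"

# 'Int64' (capital I) = nullable integer dtype; 'int64' would fail on the empty cell
df = pd.read_csv(io.StringIO(csv_text), sep=";", dtype={"A": "Int64", "B": object})
print(df["A"])
```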
Using `pd.array(['NA', np.nan, None, pd.NA], dtype="Int64")` I get `TypeError: object cannot be converted to an IntegerDtype`. How could I instead change the value to NULL on import? It would make sense if it first tried to convert to the correct type and replaced the value with NULL only if that fails. How could I do that? Thanks so much. – Smiley Feb 24 '20 at 14:21
@Smiley - I think this is really complicated in `read_csv`; maybe some converter is necessary, because `dtype` cannot be used here if the column contains values of the wrong type. – jezrael Feb 24 '20 at 14:22
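One way to get "convert if possible, otherwise NULL" without `dtype` failing is to swap the converter idea for a two-step approach: read the column as text, coerce it with `pd.to_numeric(..., errors="coerce")`, which turns unparseable values into NaN, and finally cast to the nullable `'Int64'` dtype. A sketch, assuming a made-up sample where column `A` contains an empty cell and a non-numeric value:

```python
import io

import pandas as pd

# Hypothetical sample: column A has an empty cell and a non-numeric value
csv_text = "A;B\n1;x\n;y\noops;z\n4;w\n"

# Step 1: read A as text so read_csv cannot fail on bad values
df = pd.read_csv(io.StringIO(csv_text), sep=";", dtype={"A": str, "B": object})

# Step 2: parse what can be parsed; anything else (empty or 'oops') becomes NaN
numeric = pd.to_numeric(df["A"], errors="coerce")

# Step 3: nullable integers, so the failures are stored as <NA>
df["A"] = numeric.astype("Int64")
print(df)
```

This keeps `read_csv` itself simple; the trade-off is a second pass over the column, which is usually negligible compared with parsing the file.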

