I would like pandas read_csv to properly read the following example text into a DataFrame:
"INDEX"|"COLUMN_STRING"|"COLUMN_INTEGER"|"COLUMN_EMPTY"|"COLUMN_EMPTY_STRING"
1|"string"|21||""
The file I need to parse has all the values that should be strings wrapped in double quotes, including empty strings, which appear as "".
Values that should be NaN are left without double quotes, like this: ||
I would like read_csv to keep all the "quoted" values as strings, including "", but it forces NaN as the default value for "".
If I use keep_default_na=False, it turns both || and |""| into empty strings ''.
Also, using dtype={"COLUMN_EMPTY_STRING": str} doesn't help.
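Here is a minimal reproduction of the three attempts described so far. The variable names (raw, default, no_na, typed) are made up for illustration, and the sample is read from a string instead of a file:

```python
import io

import pandas as pd

# The example text from above, held in a string (illustrative name).
raw = '''"INDEX"|"COLUMN_STRING"|"COLUMN_INTEGER"|"COLUMN_EMPTY"|"COLUMN_EMPTY_STRING"
1|"string"|21||""
'''

# Attempt 1: defaults. Both || and |""| come back as NaN.
default = pd.read_csv(io.StringIO(raw), sep="|", index_col="INDEX")

# Attempt 2: keep_default_na=False. Both || and |""| come back as ''.
no_na = pd.read_csv(
    io.StringIO(raw), sep="|", index_col="INDEX", keep_default_na=False
)

# Attempt 3: forcing the dtype. The quoted empty string is still NaN.
typed = pd.read_csv(
    io.StringIO(raw),
    sep="|",
    index_col="INDEX",
    dtype={"COLUMN_EMPTY_STRING": str},
)

for name, frame in [("default", default), ("no_na", no_na), ("typed", typed)]:
    print(name)
    print(frame[["COLUMN_EMPTY", "COLUMN_EMPTY_STRING"]])
```

In none of the three results can || be told apart from |""|.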
Does anybody know the solution to this pickle?
Another possible solution would be to use quoting=3 (csv.QUOTE_NONE). This would keep strings as "string", quotes included, which could be cleaned up after parsing. I cannot use it, though, since I'm providing the index_col argument, which raises an error: it cannot find e.g. INDEX, because it reads "INDEX" from the file.
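For reference, a rough sketch of what the quoting=3 route might look like if the index is set after parsing instead of through index_col. The helper read_quoted is hypothetical. It assumes that no quoted value contains the | separator (QUOTE_NONE would split such a value) and that every unquoted column is numeric. It would not work as-is for other unquoted types:

```python
import csv
import io

import pandas as pd

raw = '''"INDEX"|"COLUMN_STRING"|"COLUMN_INTEGER"|"COLUMN_EMPTY"|"COLUMN_EMPTY_STRING"
1|"string"|21||""
'''


def read_quoted(source, index_col="INDEX"):
    """Hypothetical helper: parse with QUOTE_NONE, then strip the quotes."""
    # Read everything as text; || becomes NaN, while |""| stays as the
    # literal two-character string '""', which is not a default NA value.
    df = pd.read_csv(source, sep="|", quoting=csv.QUOTE_NONE, dtype=str)

    # The header names still carry their quotes, so clean them first.
    df.columns = df.columns.str.strip('"')

    for col in df.columns:
        values = df[col]
        if values.str.startswith('"', na=False).any():
            # Quoted column: drop the surrounding quotes, NaN stays NaN.
            df[col] = values.str.strip('"')
        else:
            # Unquoted column: assumed numeric (raises ValueError otherwise).
            df[col] = pd.to_numeric(values)

    # Set the index only now, when the cleaned name can be found.
    return df.set_index(index_col)


df = read_quoted(io.StringIO(raw))
print(df)
```

Note that str.strip('"') removes every leading and trailing quote character, so a value that itself begins or ends with a double quote would lose it too.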