This is quite the opposite of Get pandas.read_csv to read empty values as empty string instead of nan
Given the following CSV file:
col,val
"hi
there",1
,2
\f\,3
"",4
"""hi""",5
I want it to be read as:
col val
0 hi\nthere 1
1 NaN 2
2 \f\ 3
3 4
4 "hi" 5
That is, reading the empty field (val 2) as NaN
, while keeping the empty string (val 4) as empty string.
Currently pd.read_csv
converts val 2 and val 4 both as NaN
, or if I use na_filter=False
both are kept as empty string.
I'm assuming these two representations mean different things in CSV (empty fields vs empty string), so I'm assuming pandas should be able to distinguish this too.
Is there a way to make pandas to distinguish these two cases? Or is my assumption wrong, that the two representations are actually the same? (please point me to a CSV standard if the second one is the case)
More information, I got the CSV by exporting BigQuery table (with the intended meaning, val 2 is null and val 4 is empty string) into CSV. And I want to get the exact same table back. So this example is not just a contrived example, but is actually used by BigQuery when exporting to CSV.
EDIT: further search reveals a Github issue 4 years ago that discusses similar point (see this comment, for example), and one of the commenter mentions that there is some coercion (I'm not sure what they refer to, but I understand it as coercion between empty field and empty string). Is this still happening?