Why does pandas DataFrame.to_csv miss rows when writing a normalized JSON to a file opened with open()?

Question

I already found a solution to my original problem (by following the usual convention). However, I don't understand why I get two different CSV files when I use open() in two different ways.

Using pandas on Python 3.7, I normalized a JSON list with pandas.io.json.json_normalize. The resulting DataFrame has the expected dimensions (25 by 28).

>>> normalFrame = pd.io.json.json_normalize(jsonList)
>>> normalFrame.shape
(25, 28)                    # Awesome. So far, so good.
>>> filename = 'pandaNormJson.csv'
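To make the normalization step reproducible without the original data, here is a minimal sketch with a hypothetical two-record `jsonList`: nested keys become dotted column names. It assumes pandas 1.0 or later and therefore calls the top-level `pd.json_normalize`; the `pd.io.json.json_normalize` path used above is the older location of the same function.

```python
import pandas as pd

# Hypothetical stand-in for the question's jsonList: a list of nested dicts.
jsonList = [
    {"id": 1, "user": {"name": "Ada", "age": 36}},
    {"id": 2, "user": {"name": "Linus", "age": 28}},
]

# Nested objects are flattened into dotted column names ("user.name", ...).
normalFrame = pd.json_normalize(jsonList)
print(normalFrame.shape)          # (2, 3)
print(list(normalFrame.columns))  # ['id', 'user.name', 'user.age']
```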

Now, the following attempts created two different files:

Attempt 1 (fail):

>>> newCsv = open(filename, 'w', newline='')
>>> normalFrame.to_csv(newCsv)

This creates a CSV file with only 18 of the 25 records.
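The missing rows are consistent with ordinary write buffering: text written to a file object sits in Python's in-memory buffer until the buffer fills, `flush()` is called, or the file is closed. The sketch below reproduces that pattern with only the standard library (`csv.writer` in place of pandas, and a hypothetical temporary file). With this little data nothing may reach the disk until `close()`; with a larger table such as the question's, only the chunks that overflowed the buffer would be written, which would look like "some rows missing".

```python
import csv
import os
import tempfile

# Hypothetical demo file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

newCsv = open(path, "w", newline="")  # same pattern as Attempt 1: no `with`, no close()
writer = csv.writer(newCsv)
for i in range(25):
    writer.writerow([i, "record"])

# Read the file through a second, independent handle while the first is still open.
with open(path, newline="") as reader:
    rows_on_disk = len(reader.read().splitlines())
print(rows_on_disk)  # fewer than 25; with this little data, usually 0

newCsv.close()  # close() flushes whatever is still buffered

with open(path, newline="") as reader:
    rows_after_close = len(reader.read().splitlines())
print(rows_after_close)  # 25
```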

Attempt 2, the proper way (success):

>>> with open(filename, 'w', newline='') as newCsv:
...     normalFrame.to_csv(newCsv)

Now I have the desired 25x28 CSV file (not counting the header row).
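A sketch of Attempt 2 with a hypothetical 25 x 28 frame standing in for `normalFrame`, written to a temporary path. As far as I can tell, when `to_csv` is handed an already open file object it writes through that handle but does not close it; the flush that makes all rows land on disk comes from the `with` block calling `close()` on exit. The read-back assumes the default `to_csv` output, whose first column is the index.

```python
import os
import tempfile

import pandas as pd

# Hypothetical 25 x 28 frame standing in for the normalized JSON.
normalFrame = pd.DataFrame({f"col{j}": range(j, j + 25) for j in range(28)})
filename = os.path.join(tempfile.mkdtemp(), "pandaNormJson.csv")

with open(filename, "w", newline="") as newCsv:
    normalFrame.to_csv(newCsv)
    closed_inside = newCsv.closed  # False: pandas leaves the caller's handle open
# Leaving the block calls newCsv.close(), which flushes the remaining buffer.
closed_after = newCsv.closed       # True

roundTrip = pd.read_csv(filename, index_col=0)  # first column is the written index
print(closed_inside, closed_after, roundTrip.shape)  # False True (25, 28)
```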

What's wrong with attempt 1?
What's the difference between with open() as writer and writer = open()?
Is it something happening behind the scenes in open() or in pd.to_csv()?
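For file objects, `with open(...) as f:` behaves roughly like the `try`/`finally` below: `f.close()` is guaranteed to run when the block ends, even if the body raises. `f = open(...)` on its own leaves closing, and therefore the final flush, to you, or to garbage collection at some later, unspecified point. The path and the written text are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical path

# Roughly what `with open(path, "w", newline="") as f: f.write(...)` does for a file:
f = open(path, "w", newline="")
try:
    f.write("a,b\n" * 25)
finally:
    f.close()  # runs even if write() raises; this is the step Attempt 1 skipped

with open(path, newline="") as check:
    line_count = len(check.read().splitlines())
print(line_count)  # 25
```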

Why are you opening the file yourself at all? `.to_csv` does all of that for you... the **proper** "proper way" is just `normalFrame.to_csv(filename)`, and let it do the needful. — Jon Clements, May 07 '19 at 17:02
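A minimal sketch of the suggestion above: when `to_csv` receives a path string instead of a file object, pandas opens, writes, and closes the file itself, so there is no handle left for the caller to forget. The small frame and the temporary location are hypothetical stand-ins.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for normalFrame.
normalFrame = pd.DataFrame({"id": [1, 2, 3], "user.name": ["Ada", "Linus", "Grace"]})
filename = os.path.join(tempfile.mkdtemp(), "pandaNormJson.csv")

# Given a path, pandas manages the file handle from open to close.
normalFrame.to_csv(filename)

written = pd.read_csv(filename, index_col=0)
print(written.shape)  # (3, 2)
```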
I bet if you called `newCsv.close()` at the end of your first attempt, you'd be okay... you may still have unflushed data, depending on how you're running this. In your second example, the `with` statement automatically closes the file when the block exits. However, inside that block, the `normalFrame.to_csv(...)` call has probably done the same thing and closed the file, and when the `with` block then tries to close it again, it doesn't care because there's nothing left to do. — Jon Clements, May 07 '19 at 17:06
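The unflushed-data explanation can be checked directly. If the handle has to stay open (for example, to append more output later), `flush()` hands the buffered text to the operating system without closing the file, so another reader sees every row. This sketch keeps Attempt 1's pattern and adds the flush; the frame and path are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical 25 x 28 stand-in for normalFrame.
normalFrame = pd.DataFrame({f"col{j}": range(25) for j in range(28)})
filename = os.path.join(tempfile.mkdtemp(), "pandaNormJson.csv")

newCsv = open(filename, "w", newline="")
normalFrame.to_csv(newCsv)
newCsv.flush()  # push buffered rows to the OS; the handle stays open

shape_while_open = pd.read_csv(filename, index_col=0).shape
print(shape_while_open, newCsv.closed)  # (25, 28) False

newCsv.close()  # still close the handle once you are done with it
```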
Possible duplicate of [What is the python "with" statement designed for?](https://stackoverflow.com/questions/3012488/what-is-the-python-with-statement-designed-for) — G. Anderson, May 07 '19 at 17:07
Ahh... just remembered: calling `.close()` on an already closed file is a no-op... so yeah, that's what's happening here (I think). — Jon Clements, May 07 '19 at 17:08
@JonClements `newCsv.close()` does the trick! Funny thing is, deleting the file and then closing it dumps records 13-25. Records 13-18 were already in the first dump, so some rows are repeated. Skipping open() is great! pandas keeps surprising me. — DataGarden, May 07 '19 at 20:33
@JonClements The [.to_csv() docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html) confused me because they say "If a file object is passed it should be opened with newline=''". I completely overlooked the simpler option! Thanks for the tip! — DataGarden, May 07 '19 at 20:45
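The `newline=''` advice exists because CSV writers emit their own line terminators. pandas defaults to `os.linesep` (`'\r\n'` on Windows), and the standard `csv` module defaults to `'\r\n'`. Without `newline=''`, text mode would translate each `'\n'` again on Windows, producing `'\r\r\n'` and blank lines between rows. A standard-library sketch with a hypothetical file, checking the raw bytes actually written:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "newline_demo.csv")  # hypothetical path

# csv.writer ends each row with "\r\n"; newline="" stops Python from translating
# the "\n" a second time (which on Windows would otherwise yield "\r\r\n").
with open(path, "w", newline="") as f:
    csv.writer(f).writerow(["a", "b"])

with open(path, "rb") as raw:
    data = raw.read()
print(data)  # b'a,b\r\n'
```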
