Pandas - pandas.DataFrame.from_csv vs pandas.read_csv

Question

What's the difference between:

pandas.DataFrame.from_csv, doc link: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_csv.html

and

pandas.read_csv, doc link: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html

I believe one of the most important differences is the default for index column. Using `from_csv` will default to use the first column as an index. `read_csv` defaults to `None` so an index will be created. `read_csv` should be best suited for most datasets. — Ryan G, Oct 21 '14 at 20:14
Didn't even know `from_csv` existed, personally having looked at both I would use `read_csv` as it has far more options that should assist with data mangling — EdChum, Oct 21 '14 at 20:20
I think `pandas.DataFrame.from_csv` has now been removed. I'm unable to call it and your link gives a 404: Not Found error. — Matthias Fripp, Oct 09 '19 at 20:27

joris · Accepted Answer · 2014-10-21T20:43:18.843

33

There is no real difference (both are based on the same underlying function), but as noted in the comments, they have some different default values (index_col is None or 0, parse_dates is False or True for read_csv and DataFrame.from_csv respectively) and read_csv supports more arguments (in from_csv they are just not passed through).

Apart from that, it is recommended to use pd.read_csv.
DataFrame.from_csv exists merely for historical reasons and to keep backwards compatibility (plans are to deprecate it, see here), but all new features are only added to read_csv (as you can see in the much longer list of keyword arguments). Actually, this should be made more clear in the docs.
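To make the difference in defaults concrete, here is a minimal sketch using only pd.read_csv. The sample data and variable names are made up for illustration, and DataFrame.from_csv is deliberately not called, since it may not exist in your pandas version. The second call passes the defaults described above for from_csv (index_col=0, parse_dates=True) explicitly, which should give roughly the same result as the old method did.

```python
import io

import pandas as pd

# Hypothetical sample data, invented for this illustration.
csv_text = "date,value\n2014-10-20,1\n2014-10-21,2\n"

# read_csv defaults: index_col=None and no date parsing,
# so a fresh integer index is created and "date" stays a column.
df_default = pd.read_csv(io.StringIO(csv_text))

# Passing the from_csv-style defaults explicitly:
# the first column becomes the index and is parsed as dates.
df_legacy_style = pd.read_csv(
    io.StringIO(csv_text), index_col=0, parse_dates=True
)

print(df_default.index)
print(df_legacy_style.index)
```

With the defaults, "date" remains an ordinary column; with index_col=0 and parse_dates=True it becomes a datetime index and only "value" is left as a column.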

edited Oct 21 '14 at 20:43

answered Oct 21 '14 at 20:35

joris

thanks joris. I would be interested in getting involved with the pandas project, unless you can think of a better library out there for data analysis or machine learning... – user3659451 Oct 21 '14 at 22:15
@joris: `index_col` defaults are the opposite -- 'None or 0' (I edited) – jjrr Sep 13 '19 at 13:48

score 3 · Answer 2 · answered Apr 07 '17 at 05:13

Another difference is that pandas.read_csv is 46x to 490x as fast as pandas.DataFrame.from_csv (in my testing).

I tested it on Python 3.4.4 and pandas 0.19.2 on Windows on my proprietary csv file.

answered Apr 07 '17 at 05:13

ChaimG
