R's read.table equivalent in Python

Question

I'm trying to move some of my processing work from R to Python. In R, I use read.table() to read REALLY messy CSV files and it automagically splits the records in the correct format. E.g.

391788,"HP Deskjet 3050 scanner always seems to break","<p>I'm running a Windows 7 64 blah blah blah........ake this work permanently?</p>

<p>Update: It might have something to do with my computer. It seems to work much better on another computer, windows 7 laptop. Not sure exactly what the deal is, but I'm still looking into it...</p>
","windows-7 printer hp"

is correctly separated into 4 columns. A single record can span many lines, and there are commas all over the place. In R I just do:

read.table(infile, header = FALSE, nrows=chunksize, sep=",", stringsAsFactors=FALSE)

Is there something in Python that can do this equally well?

Thanks!

See also: https://stackoverflow.com/questions/22604564/create-pandas-dataframe-from-a-string — PatrickT, Jan 04 '22 at 23:42

score 4 · Accepted Answer · answered Oct 23 '13 at 08:59

You can use the csv module from the standard library.

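from csv import reader

# newline="" leaves line breaks inside quoted fields for the csv module to handle
with open("C:/text.txt", "r", newline="") as f:
    csv_reader = reader(f, quotechar="\"")
    for row in csv_reader:
        print(row)  # each row is a list of strings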

['391788', 'HP Deskjet 3050 scanner always seems to break', "<p>I'm running a Windows 7 64 blah blah blah........ake this work permanently?</p>\n\n<p>Update: It might have something to do with my computer. It seems to work much better on another computer, windows 7 laptop. Not sure exactly what the deal is, but I'm still looking into it...</p>\n", 'windows-7 printer hp']

length of output = 4

But this just returns strings. It doesn't infer the type of each column the way that read.table does. — c-urchin, Jun 30 '14 at 16:07
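
On the type-inference point raised in this comment: the csv module does leave every field as a string, so any conversion has to be done by hand. A minimal sketch of one way to do that, assuming a try-int-then-float rule is good enough; the `convert` helper and the in-memory `sample` record are hypothetical, made up for illustration, and this is much cruder than what read.table does.

```python
import csv
import io

# Hypothetical sample: one record whose third field spans several lines.
sample = (
    '391788,"HP Deskjet 3050 scanner",'
    '"<p>line one</p>\n\n<p>line two, with a comma</p>\n",'
    '"windows-7 printer hp"\n'
)


def convert(value):
    """Try int, then float; fall back to the original string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


# csv.reader keeps the quoted multi-line field together as one value.
rows = [
    [convert(field) for field in row]
    for row in csv.reader(io.StringIO(sample))
]

for row in rows:
    print([type(field).__name__ for field in row])
```

Note that this converts field by field rather than deciding one type per column, so a column that mixes numbers and text ends up with mixed types.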

David · Answer 2 · 2013-10-23T15:33:52.947

The pandas module also offers many R-like functions and data structures, including read_csv. The advantage here is that the data will be read in as a pandas DataFrame, which is a little easier to manipulate than a standard Python list or dict (especially if you're accustomed to R). Here is an example:

>>> from pandas import read_csv
>>> ugly = read_csv("ugly.csv",header=None)
>>> ugly
        0                                              1  \
0  391788  HP Deskjet 3050 scanner always seems to break   

                                                   2                     3  
0  <p>I'm running a Windows 7 64 blah blah blah.....  windows-7 printer hp
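
The question reads the file in pieces with nrows=chunksize. A rough sketch of how that might look with read_csv, using its chunksize parameter to get an iterator of DataFrames (read_csv also has an nrows parameter if only the first N records are needed). The three-record `sample` below is made up for illustration and is read from memory rather than from ugly.csv.

```python
import io

import pandas as pd

# Hypothetical sample: three records, the first with a multi-line quoted field.
sample = (
    '391788,"HP Deskjet 3050 scanner",'
    '"<p>first</p>\n\n<p>second, with a comma</p>\n","windows-7 printer hp"\n'
    '391789,"Another title","<p>body</p>","tag-a tag-b"\n'
    '391790,"Third title","<p>body, again</p>","tag-c"\n'
)

# chunksize=2 yields DataFrames of at most 2 records each.
chunks = list(pd.read_csv(io.StringIO(sample), header=None, chunksize=2))

for chunk in chunks:
    print(chunk.shape)
    print(chunk.dtypes)  # column 0 is inferred as an integer type
```

Unlike the csv module, the column types are inferred here, so the first column comes back as integers and the rest as strings (object dtype).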
