How to preview a part of a large pandas DataFrame, in iPython notebook?

Question

I am just getting started with pandas in the IPython Notebook and encountering the following problem: When a DataFrame read from a CSV file is small, the IPython Notebook displays it in a nice table view. When the DataFrame is large, something like this is ouput:

In [27]:

evaluation = readCSV("evaluation_MO_without_VNS_quality.csv").filter(["solver", "instance", "runtime", "objective"])

In [37]:

evaluation

Out[37]:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 333 entries, 0 to 332
Data columns:
solver       333  non-null values
instance     333  non-null values
runtime      333  non-null values
objective    333  non-null values
dtypes: int64(1), object(3)

I would like to see a small portion of the data frame as a table just to make sure it is in the right format. What options do I have?

You could also [increase the max_rows](http://stackoverflow.com/questions/14449367/listing-elements-vs-dataframe-description-when-happens-when/14449538#14449538), to display the entire DataFrame. — Andy Hayden, Feb 21 '13 at 16:00
`evaluation.head()` will show the first 5 rows. You can pass it a number to see more or less. — Thomas K, Apr 03 '13 at 12:22
@ThomasK which library is evaluation.head() part of? I cannot find any info regarding this function online (Python noob here) — alwaysaskingquestions, Oct 14 '16 at 17:25
`head` is a method of pandas DataFrames. Docs here: http://pandas.pydata.org/pandas-docs/stable/10min.html#viewing-data — Thomas K, Oct 16 '16 at 08:56

score 72 · Answer 1 · edited Mar 18 '20 at 13:38

72

df.head(5) # will print out the first 5 rows
df.tail(5) # will print out the 5 last rows

edited Mar 18 '20 at 13:38

Nicolas Gervais

33,817
13
115
143

answered Sep 21 '14 at 13:09

tagoma

3,896
4
38
57

score 44 · Accepted Answer · edited Oct 07 '17 at 07:36

In this case, where the DataFrame is long but not too wide, you can simply slice it:

>>> df = pd.DataFrame({"A": range(1000), "B": range(1000)})
>>> df
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 999
Data columns:
A    1000  non-null values
B    1000  non-null values
dtypes: int64(2)
>>> df[:5]
   A  B
0  0  0
1  1  1
2  2  2
3  3  3
4  4  4

ix is deprecated.

If it's both wide and long, I tend to use .ix:

>>> df = pd.DataFrame({i: range(1000) for i in range(100)})
>>> df.ix[:5, :10]
   0   1   2   3   4   5   6   7   8   9   10
0   0   0   0   0   0   0   0   0   0   0   0
1   1   1   1   1   1   1   1   1   1   1   1
2   2   2   2   2   2   2   2   2   2   2   2
3   3   3   3   3   3   3   3   3   3   3   3
4   4   4   4   4   4   4   4   4   4   4   4
5   5   5   5   5   5   5   5   5   5   5   5

Can also use iloc (rather than ix), which plays nicer with int/float labeled data. — Andy Hayden, Aug 31 '13 at 22:24

score 15 · Answer 3 · answered Apr 03 '13 at 11:49

I write a method to show the four corners of the data and monkey-patch to dataframe to do so:

def _sw(df, up_rows=10, down_rows=5, left_cols=4, right_cols=3, return_df=False):
    ''' display df data at four corners
        A,B (up_pt)
        C,D (down_pt)
        parameters : up_rows=10, down_rows=5, left_cols=4, right_cols=3
        usage:
            df = pd.DataFrame(np.random.randn(20,10), columns=list('ABCDEFGHIJKLMN')[0:10])
            df.sw(5,2,3,2)
            df1 = df.set_index(['A','B'], drop=True, inplace=False)
            df1.sw(5,2,3,2)
    '''
    #pd.set_printoptions(max_columns = 80, max_rows = 40)
    ncol, nrow = len(df.columns), len(df)

    # handle columns
    if ncol <= (left_cols + right_cols) :
        up_pt = df.ix[0:up_rows, :]         # screen width can contain all columns
        down_pt = df.ix[-down_rows:, :]
    else:                                   # screen width can not contain all columns
        pt_a = df.ix[0:up_rows,  0:left_cols]
        pt_b = df.ix[0:up_rows,  -right_cols:]
        pt_c = df[-down_rows:].ix[:,0:left_cols]
        pt_d = df[-down_rows:].ix[:,-right_cols:]

        up_pt   = pt_a.join(pt_b, how='inner')
        down_pt = pt_c.join(pt_d, how='inner')
        up_pt.insert(left_cols, '..', '..')
        down_pt.insert(left_cols, '..', '..')

    overlap_qty = len(up_pt) + len(down_pt) - len(df)
    down_pt = down_pt.drop(down_pt.index[range(overlap_qty)]) # remove overlap rows

    dt_str_list = down_pt.to_string().split('\n') # transfer down_pt to string list

    # Display up part data
    print up_pt

    start_row = (1 if df.index.names[0] is None else 2) # start from 1 if without index

    # Display omit line if screen height is not enought to display all rows
    if overlap_qty < 0:
        print "." * len(dt_str_list[start_row])

    # Display down part data row by row
    for line in dt_str_list[start_row:]:
        print line

    # Display foot note
    print "\n"
    print "Index :",df.index.names
    print "Column:",",".join(list(df.columns.values))
    print "row: %d    col: %d"%(len(df), len(df.columns))
    print "\n"

    return (df if return_df else None)
DataFrame.sw = _sw  #add a method to DataFrame class

Here is the sample:

>>> df = pd.DataFrame(np.random.randn(20,10), columns=list('ABCDEFGHIJKLMN')[0:10])

>>> df.sw()
         A       B       C       D  ..       H       I       J
0  -0.8166  0.0102  0.0215 -0.0307  .. -0.0820  1.2727  0.6395
1   1.0659 -1.0102 -1.3960  0.4700  ..  1.0999  1.1222 -1.2476
2   0.4347  1.5423  0.5710 -0.5439  ..  0.2491 -0.0725  2.0645
3  -1.5952 -1.4959  2.2697 -1.1004  .. -1.9614  0.6488 -0.6190
4  -1.4426 -0.8622  0.0942 -0.1977  .. -0.7802 -1.1774  1.9682
5   1.2526 -0.2694  0.4841 -0.7568  ..  0.2481  0.3608 -0.7342
6   0.2108  2.5181  1.3631  0.4375  .. -0.1266  1.0572  0.3654
7  -1.0617 -0.4743 -1.7399 -1.4123  .. -1.0398 -1.4703 -0.9466
8  -0.5682 -1.3323 -0.6992  1.7737  ..  0.6152  0.9269  2.1854
9   0.2361  0.4873 -1.1278 -0.2251  ..  1.4232  2.1212  2.9180
10  2.0034  0.5454 -2.6337  0.1556  ..  0.0016 -1.6128 -0.8093
..............................................................
15  1.4091  0.3540 -1.3498 -1.0490  ..  0.9328  0.3668  1.3948
16  0.4528 -0.3183  0.4308 -0.1818  ..  0.1295  1.2268  0.1365
17 -0.7093  1.3991  0.9501  2.1227  .. -1.5296  1.1908  0.0318
18  1.7101  0.5962  0.8948  1.5606  .. -0.6862  0.9558 -0.5514
19  1.0329 -1.2308 -0.6896 -0.5112  ..  0.2719  1.1478 -0.1459


Index : [None]
Column: A,B,C,D,E,F,G,H,I,J
row: 20    col: 10


>>> df.sw(4,2,3,4)
        A       B       C  ..       G       H       I       J
0 -0.8166  0.0102  0.0215  ..  0.3671 -0.0820  1.2727  0.6395
1  1.0659 -1.0102 -1.3960  ..  1.0984  1.0999  1.1222 -1.2476
2  0.4347  1.5423  0.5710  ..  1.6675  0.2491 -0.0725  2.0645
3 -1.5952 -1.4959  2.2697  ..  0.4856 -1.9614  0.6488 -0.6190
4 -1.4426 -0.8622  0.0942  .. -0.0947 -0.7802 -1.1774  1.9682
..............................................................
18  1.7101  0.5962  0.8948  .. -0.8592 -0.6862  0.9558 -0.5514
19  1.0329 -1.2308 -0.6896  .. -0.3954  0.2719  1.1478 -0.1459


Index : [None]
Column: A,B,C,D,E,F,G,H,I,J
row: 20    col: 10

I've made a few modifications here: https://gist.github.com/anonymous/8564133 (fixed width of index in dataframe print-out and shorten really long lists of index/column) — Noah, Jan 22 '14 at 18:16

score 8 · Answer 4 · edited Aug 17 '18 at 03:05

8

This line will allow you to see all rows (up to the number that you set as 'max_rows') without any rows being hidden by the dots ('.....') that normally appear between head and tail in the print output.

pd.options.display.max_rows = 500

edited Aug 17 '18 at 03:05

Ezra Citron

101
2
6

answered Aug 08 '18 at 15:51

John Khalaf

81
1
1

ajp619 · Answer 5 · 2015-05-21T19:37:27.757

Here's a quick way to preview a large table without having it run too wide:

Display function:

# display large dataframes in an html iframe
def ldf_display(df, lines=500):
    txt = ("<iframe " +
           "srcdoc='" + df.head(lines).to_html() + "' " +
           "width=1000 height=500>" +
           "</iframe>")

    return IPython.display.HTML(txt)

Now just run this in any cell:

ldf_display(large_dataframe)

This will convert the dataframe to html then display it in an iframe. The advantage is that you can control the output size and have easily accessible scroll bars.

Worked for my purposes, maybe it will help someone else.

score 7 · Answer 6 · edited Aug 17 '18 at 04:25

7

To see the first n rows of DataFrame:

df.head(n) # (n=5 by default)

To see the last n rows:

df.tail(n)

edited Aug 17 '18 at 04:25

Ezra Citron

101
2
6

answered May 15 '17 at 13:48

Bang Shen

101
1
3

score 6 · Answer 7 · answered Aug 23 '18 at 05:48

6

I found the following approach to be the most effective for sampling a DataFrame:

print(df[A:B]) ## 'A' and 'B' are the first and last records in range

For example, print(df[10:15]) will print rows 10 through 15 - inclusive - from your data set.

answered Aug 23 '18 at 05:48

Paul Sochacki

433
6
8

1

this is the right answer – Imene KOLLI Sep 27 '21 at 23:05

score 3 · Answer 8 · edited Oct 07 '16 at 08:27

3

You can just use nrows. For instance

pd.read_csv('data.csv',nrows=6)

will show the first 6 rows from data.csv.

edited Oct 07 '16 at 08:27

user

5,370
8
47
75

answered Oct 07 '16 at 08:20

Bang Shen

101
1
3

score 1 · Answer 9 · answered Jan 24 '14 at 04:20

Update one to generate string instead, and accommodate to Pandas0.13+

def _sw2(df, up_rows=5, down_rows=3, left_cols=4, right_cols=2, return_df=False):
    """ return df data display string at four corners
        A,B (up_pt)
        C,D (down_pt)
        parameters : up_rows=10, down_rows=5, left_cols=4, right_cols=3
        usage:
            df = pd.DataFrame(np.random.randn(20,10), columns=list('ABCDEFGHIJKLMN')[0:10])
            df.sw(5,2,3,2)
            df1 = df.set_index(['A','B'], drop=True, inplace=False)
            df1.sw(5,2,3,2)
    """

    #pd.set_printoptions(max_columns = 80, max_rows = 40)
    nrow, ncol = df.shape #ncol, nrow = len(df.columns), len(df)

    # handle columns
    if ncol <= (left_cols + right_cols) :
        up_pt = df.ix[0:up_rows, :]         # screen width can contain all columns
        down_pt = df.ix[-down_rows:, :]
    else:                                   # screen width can not contain all columns
        pt_a = df.ix[0:up_rows,  0:left_cols]
        pt_b = df.ix[0:up_rows,  -right_cols:]
        pt_c = df[-down_rows:].ix[:,0:left_cols]
        pt_d = df[-down_rows:].ix[:,-right_cols:]

        up_pt   = pt_a.join(pt_b, how='inner')
        down_pt = pt_c.join(pt_d, how='inner')
        up_pt.insert(left_cols, '..', '..')
        down_pt.insert(left_cols, '..', '..')

    overlap_qty = len(up_pt) + len(down_pt) - len(df)
    down_pt = down_pt.drop(down_pt.index[range(overlap_qty)]) # remove overlap rows

    dt_str_list = down_pt.to_string().split('\n') # transfer down_pt to string list

    # Display up part data
    ds = up_pt.__str__()
    #get rid of ending part of Pandas0.13+ display string by finding the last 3 '\n', ugly though
    Display_str = ds[0:ds[0:ds[0:ds.rfind('\n')].rfind('\n')].rfind('\n')] #refer to http://stackoverflow.com/questions/4664850/find-all-occurrences-of-a-substring-in-python

    start_row = (1 if df.index.names[0] is None else 2) # start from 1 if without index

    # Display omit line if screen height is not enought to display all rows
    if overlap_qty < 0:
        Display_str += "\n"
        Display_str += "." * len(dt_str_list[start_row])
        Display_str += "\n"

    # Display down part data row by row
    for line in dt_str_list[start_row:]:
        Display_str += "\n"
        Display_str += line

    # Display foot note
    Display_str += "\n\n"
    Display_str += "Index : %s\n"%str(df.index.names)

    col_name_list = list(df.columns.values)
    if ncol < 10:
        col_name_str = ", ".join(col_name_list)
    else:
        col_name_str = ", ".join(col_name_list[0:7]) + ' ... ' + ", ".join(col_name_list[-2:])
    Display_str = Display_str + "Column: " + col_name_str + "\n"
    Display_str = Display_str + "row: %d   col: %d"%(nrow, ncol) + "    "


    dty_dict={} #simulate defaultdict
    for k,g in itertools.groupby(list(df.dtypes.values)): #http://stackoverflow.com/questions/13565248/grouping-the-same-recurring-items-that-occur-in-a-row-from-list/13565414#13565414
        try:
            dty_dict[k] = dty_dict[k] + len(list(g))
        except:
            dty_dict[k] = len(list(g))

    for key in dty_dict:
        Display_str += "{0}: {1}   ".format(key, dty_dict[key])

    Display_str += "\n\n"

    return (df if return_df else Display_str)

score 1 · Answer 10 · answered Sep 02 '17 at 04:42

In order to view only first few entries you can use, pandas head function which is used as

dataframe.head(any number)        // default is 5
dataframe.head(n=value)

or you can also you slicing for this purpose, which can also give the same result,

dataframe[:n]

In order to view the last few entries you can use pandas tail() in a similar way,

dataframe.tail(any number)        // default is 5
dataframe.tail(n=value)

score 1 · Answer 11 · answered Nov 16 '17 at 04:07

In Python pandas provide head() and tail() to print head and tail data respectively.

import pandas as pd
train = pd.read_csv('file_name')
train.head() # it will print 5 head row data as default value is 5
train.head(n) # it will print n head row data
train.tail() #it will print 5 tail row data as default value is 5
train.tail(n) #it will print n tail row data

How to preview a part of a large pandas DataFrame, in iPython notebook?

11 Answers11

ix is deprecated.

Linked