25

I love using the .head() and .tail() methods in pandas to display however many rows I need at the moment (sometimes I want fewer, sometimes I want more!). But is there a way to do the same with the columns of a DataFrame?

Yes, I know that I can change the display options, as in: pd.set_option('display.max_columns', 20)

But that is too clunky to keep changing on the fly, and anyway, it would only replace the .head() functionality, not the .tail() functionality.

I also know that this could be done with positional indexing: yourDF.iloc[:, :20] to emulate .head(20) and yourDF.iloc[:, -20:] to emulate .tail(20).
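For reference, a minimal sketch of those two slices on a small made-up frame. The frame and its column names are assumptions chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x6 frame with columns a..f, used only for illustration
df = pd.DataFrame(np.arange(18).reshape(3, 6), columns=list("abcdef"))

first_two = df.iloc[:, :2]   # all rows, first two columns (like head for columns)
last_two = df.iloc[:, -2:]   # all rows, last two columns (like tail for columns)

print(list(first_two.columns))
print(list(last_two.columns))
```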

It may look like a short amount of code, but honestly it's neither as intuitive nor as quick to type as .head().

Does such a command exist? I couldn't find one!

MMelnicki

5 Answers

26

No, pandas does not supply such methods, but it is easy to add them yourself:

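import numpy as np
import pandas as pd

def front(self, n):
    # First n columns, analogous to head(n) for rows
    return self.iloc[:, :n]

def back(self, n):
    # Last n columns, analogous to tail(n) for rows
    return self.iloc[:, -n:]

# Attach the helpers to DataFrame so every instance gains them
pd.DataFrame.front = front
pd.DataFrame.back = back

df = pd.DataFrame(np.random.randint(10, size=(4, 10)))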

Now every DataFrame has these methods:

In [272]: df.front(4)
Out[272]: 
   0  1  2  3
0  2  5  2  8
1  9  9  1  3
2  7  0  7  4
3  8  3  9  2

In [273]: df.back(3)
Out[273]: 
   7  8  9
0  3  2  7
1  9  9  4
2  5  7  1
3  3  2  5

In [274]: df.front(4).back(2)
Out[274]: 
   2  3
0  2  8
1  1  3
2  7  4
3  9  2
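One edge case in the back method above: because -0 equals 0, back(0) slices from column 0 and returns every column rather than none. Below is a minimal sketch of a guarded variant. The name back_safe is hypothetical and is not part of pandas or of the code above:

```python
import numpy as np
import pandas as pd

def back(self, n):
    # Original helper: last n columns
    return self.iloc[:, -n:]

def back_safe(self, n):
    # Hypothetical variant: return an empty column selection for n <= 0
    if n <= 0:
        return self.iloc[:, :0]
    return self.iloc[:, -n:]

df = pd.DataFrame(np.arange(10).reshape(2, 5))

print(back(df, 0).shape)       # all five columns come back
print(back_safe(df, 0).shape)  # no columns
print(list(back_safe(df, 2).columns))
```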

If you put the code in a utility module, say, utils_pandas.py, then you can activate it with an import statement:

import utils_pandas
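A sketch of what utils_pandas.py might contain. The module has to import pandas itself, because each module has its own namespace and does not see a pd alias defined in the console. The default of n=5 is an assumption added here to mirror .head(); it is not in the code above:

```python
# utils_pandas.py (hypothetical file contents)
import pandas as pd  # imported here so that pd is defined inside this module

def front(self, n=5):
    """Return the first n columns (n=5 default is an assumption)."""
    return self.iloc[:, :n]

def back(self, n=5):
    """Return the last n columns (n=5 default is an assumption)."""
    return self.iloc[:, -n:]

# Runs once at import time and attaches the helpers to every DataFrame
pd.DataFrame.front = front
pd.DataFrame.back = back
```

After import utils_pandas, calls such as df.front(3) or df.back() become available in the importing session.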
unutbu
  • nice! thanks, @unutbu! I'm VERY new to python & pandas (& programming altogether!), so I still haven't wrapped my head around UDF's. This will be a great first attempt. – MMelnicki Jun 03 '15 at 21:55
  • I tried putting a script with the function definitions in a folder listed in my sys.path, opened a new (IPython) console, did `import pandas as pd`, and then `import utils_pandas`, but it says: NameError: name 'pd' is not defined. Any idea why the module doesn't recognize my `pd` alias? – MMelnicki Jun 03 '15 at 21:57
  • Every module has its own "global" namespace. So putting `import pandas as pd` in one module (or script or in the console) does not define `pd` in another module. So be sure to put `import pandas as pd` in `utils_pandas.py`. – unutbu Jun 03 '15 at 23:36
3

Closest emulation, which you could put in a function:

number_of_columns = 5 # eg.
head_cols = df[df.columns[:number_of_columns]]
tail_cols = df[df.columns[-number_of_columns:]]
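Wrapped in functions, this could look like the sketch below. The function names and the n=5 default are assumptions for illustration. Note that this approach selects by label, so a frame with duplicate column names may return more than n columns, whereas iloc selects strictly by position:

```python
import pandas as pd

def head_cols(df, n=5):
    # Select the first n column labels, then index the frame by those labels
    return df[df.columns[:n]]

def tail_cols(df, n=5):
    # Select the last n column labels, then index the frame by those labels
    return df[df.columns[-n:]]

# Hypothetical frame with unique column names a..f
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=list("abcdef"))

print(list(head_cols(df, 2).columns))
print(list(tail_cols(df, 2).columns))
```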
vk1011
2

Transpose the DataFrame, call head, and transpose back:

df.T.head().T

This avoids index slicing and custom methods.
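One caveat with the transpose round trip, sketched on a made-up frame: when the columns have mixed dtypes, transposing produces object-dtype data, and the result keeps that dtype after transposing back. Positional slicing with iloc preserves the original dtypes. The frame below is an assumption for illustration:

```python
import pandas as pd

# Hypothetical frame with mixed column dtypes
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"], "c": [1.5, 2.5]})

round_trip = df.T.head(2).T   # first two columns via transpose
sliced = df.iloc[:, :2]       # first two columns via positional slicing

print(round_trip.dtypes)
print(sliced.dtypes)
```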

0

You could just use df.col.head(n) for what you are trying to do. See the example below:

df = pd.DataFrame({'a': list(range(101)),
                   'b': list(range(101))})
df.a.head(4)

Out[37]:
0    0
1    1
2    2
3    3
Name: a, dtype: int64
nitin
  • This is not what I am trying to do. The maxymoo & unutbu answers demonstrate (& solve!) the problem very elegantly! – MMelnicki Jun 03 '15 at 21:29
-1

You can just put a number inside the parentheses to show the first or last n rows of your DataFrame.

df.head(10)

You can even pass a number lower than the default (if you want to).

df.head(2)
mim