62

In questions and answers, users very often post an example DataFrame which their question/answer works with:

In []: x
Out[]: 
   bar  foo
0    4    1
1    5    2
2    6    3

It'd be really useful to be able to get this DataFrame into my Python interpreter so I can start debugging the question, or testing the answer.

How can I do this?

cs95
LondonRob

3 Answers

72

Pandas is written by people that really know what people want to do.

Since version 0.13 there's a function pd.read_clipboard which is absurdly effective at making this "just work".

Copy the part of the output in the question that starts with bar foo (i.e. the DataFrame itself) to your clipboard, then run this in a Python interpreter:

In [53]: import pandas as pd
In [54]: df = pd.read_clipboard()

In [55]: df
Out[55]: 
   bar  foo
0    4    1
1    5    2
2    6    3

Caveats

  • Don't include the IPython In or Out prompts, or it won't work
  • If you have a named index, you currently need to add engine='python' (see this issue on GitHub). The 'c' engine is currently broken when the index is named.
  • It's not brilliant at MultiIndexes:

Try this:

                      0         1         2
level1 level2                              
foo    a       0.518444  0.239354  0.364764
       b       0.377863  0.912586  0.760612
bar    a       0.086825  0.118280  0.592211

which doesn't work at all, or this:

              0         1         2
foo a  0.859630  0.399901  0.052504
    b  0.231838  0.863228  0.017451
bar a  0.422231  0.307960  0.801993

Which works, but returns something totally incorrect!
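The first caveat (stripping the In/Out prompts) can also be handled in code rather than by careful selection. The following is only a sketch: frame_from_ipython_output is a hypothetical helper name, not part of pandas, and it parses a pasted string with read_csv and a whitespace separator instead of reading the clipboard.

```python
import re
from io import StringIO

import pandas as pd

# Matches IPython prompt lines such as "In [53]: ..." or "Out[55]:"
PROMPT = re.compile(r"\s*(In|Out)\s*\[\d*\]:")


def frame_from_ipython_output(text):
    """Hypothetical helper: drop prompt and blank lines, then parse the rest."""
    kept = [
        line
        for line in text.splitlines()
        if line.strip() and not PROMPT.match(line)
    ]
    # Each run of whitespace is treated as one delimiter
    return pd.read_csv(StringIO("\n".join(kept)), sep=r"\s+")


sample = """In []: x
Out[]: 
   bar  foo
0    4    1
1    5    2
2    6    3
"""

df = frame_from_ipython_output(sample)
print(df)
```

This does nothing for the MultiIndex cases, which still need manual cleanup.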

LondonRob
  • Note that the dtype will be `object` i.e. `str` so you may need to do `df = df.convert_objects(convert_numeric=True)` – EdChum Jul 24 '15 at 12:53
  • That doesn't seem to be the case with this particular example (the one in the OP). It's a good one to watch out for though. – LondonRob Jul 24 '15 at 12:55
  • Hmm, I've definitely observed that before but you're correct this does work correctly – EdChum Jul 24 '15 at 12:58
  • I am wondering how you can copy and paste the input and output number in your code like you are showing here with `In [55]` and `Out[55]`. I'm using Jupyter Notebook and can't seem to find a way to do that. – Bowen Liu Mar 25 '21 at 15:53
20

pd.read_clipboard() is nifty. However, if you're writing code in a script or a notebook (and you want your code to work in the future) it's not a great fit. Here's an alternative way to copy/paste the output of a dataframe into a new dataframe object that ensures that df will outlive the contents of your clipboard:

# py3 only, see below for py2
import pandas as pd
from io import StringIO

d = '''0   1   2   3   4
A   Y   N   N   Y
B   N   Y   N   N
C   N   N   N   N
D   Y   Y   N   Y
E   N   Y   Y   Y
F   Y   Y   N   Y
G   Y   N   N   Y'''

df = pd.read_csv(StringIO(d), sep=r'\s+')  # raw string avoids an invalid-escape warning

A few notes:

  • The triple-quoted string preserves the newlines in the output.
  • StringIO wraps the output in a file-like object, which read_csv requires.
  • Setting sep to \s+ makes it so that each contiguous block of whitespace is treated as a single delimiter.
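As a sketch of the same approach applied to the DataFrame from the question: when the header row has one field fewer than the data rows, read_csv uses the first column as the index, so the printed row labels come back as the index. The variable names here are illustrative only.

```python
from io import StringIO

import pandas as pd

# Printed DataFrame pasted from the question, without the In/Out prompts
text = """   bar  foo
0    4    1
1    5    2
2    6    3"""

# The header has two fields and each row has three, so the first
# column (0, 1, 2) is taken as the index.
df = pd.read_csv(StringIO(text), sep=r"\s+")
print(df)
```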

Update

The code above is Python 3 only. If you're stuck on Python 2, replace the import line:

from io import StringIO

with:

from StringIO import StringIO

If you have an old version of pandas (v0.24 or older) there's an easy way to write a Py2/Py3 compatible version of the above code:

import pandas as pd

d = ...
df = pd.read_csv(pd.compat.StringIO(d), sep=r'\s+')

Newer versions of pandas (0.25.0 and later) have dropped Python 2 support and no longer provide pd.compat.StringIO.

tel
  • and better is use `pd.compat.StringIO`, then is no necessary `from io import StringIO` – jezrael Dec 09 '18 at 13:13
  • In Python 3 it makes absolutely no difference (`pd.compat.StringIO is io.StringIO` is `True`), and is just a matter of style (import statement vs longer argument). However, I just checked, and it turns out that using `io.StringIO` in Python 2 leads to a nasty `bytes` vs `unicode` issue. I'd say that's definitely a good enough reason to prefer one over the other, so I'll change my answer to use `pd.compat.StringIO`. – tel Dec 09 '18 at 13:32
  • You are missing import: according to your Python version, i.e. `from StringIO import StringIO` for Python 2 and `from io import StringIO` for Python 3 as stated in https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html. With pandas 0.25.0 import will fail because of the https://github.com/pydata/pandas-datareader/issues/655. Solution is to rollback to previous pandas version with: `pip3 install --upgrade pandas==0.24.2` – Hrvoje Nov 26 '19 at 07:11
  • @Harvey Apparently, `pandas >= 0.25.0` has dropped the `compat` module, along with Python 2 support. I've updated my answer to reflect this – tel Dec 07 '19 at 00:52
1

If you are copying from a CSV file that has standard entries like these:

2016,10,M,0600,0610,13,1020,24
2016,3,F,0300,0330,21,6312,1
2015,4,M,0800,0830,8,7112,30
2015,10,M,0800,0810,19,0125,1
2016,8,M,1500,1510,21,0910,2
2015,10,F,0800,0810,3,8413,5

import pandas as pd
df = pd.read_clipboard(sep=",", header=None)
# rename returns a new DataFrame, so assign the result back
df = df.rename(columns={i: "Name%d" % i for i in range(8)})

will give you a properly defined pandas DataFrame.
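One thing to watch for with data like this: fields such as 0600 are parsed as integers by default, which drops the leading zero. A minimal sketch of one way around that, assuming the Name3, Name4 and Name6 columns are the ones that should stay as text (the column names are placeholders, and the data is parsed from a string rather than the clipboard so the example is reproducible):

```python
from io import StringIO

import pandas as pd

csv_text = """2016,10,M,0600,0610,13,1020,24
2016,3,F,0300,0330,21,6312,1
2015,4,M,0800,0830,8,7112,30"""

# Placeholder column names Name0 .. Name7
names = ["Name%d" % i for i in range(8)]

# dtype=str on selected columns keeps values like "0600" intact
df = pd.read_csv(
    StringIO(csv_text),
    header=None,
    names=names,
    dtype={"Name3": str, "Name4": str, "Name6": str},
)
print(df)
```

The same header, names and dtype arguments can be passed to pd.read_clipboard, which forwards keyword arguments to read_csv.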


Hrvoje