
Lately I keep asking Pandas questions that depend on the data I'm using. So far it takes me quite a while to create a data frame similar to my data (a reproducible data frame) that SO users can easily copy to their machine.

I would prefer a convenient way to print my small DF within my question, so that other users could easily collect it and recreate it with minimal effort.

In R, I usually run the dput function on a small sample of my data in the console and then paste the output into my question (example): Getting the error "level sets of factors are different" when running a for loop

I've seen this explanation, but I don't think it's suitable for printing a sample of data for other SO users: Python's equivalent for R's dput() function

Is there an equivalent method in Pandas for doing that?

Thanks in advance!

Yehoshaphat Schellekens

1 Answer


If binary data is OK for you, you can use the pickle library. It can usually serialize and deserialize arbitrary objects (on condition that their class definition is available, which is true for dataframes if pandas is installed).
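A minimal sketch of the pickle route, assuming a small illustrative dataframe (the column names and values here are made up for the example). The result is raw bytes, so it suits passing data between your own scripts more than pasting into a question, and pickled data should only be loaded from sources you trust:

```python
import pickle

import pandas as pd

# Illustrative dataframe; the column names and values are assumptions.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 3]})

# Serialize the dataframe to a bytes object.
payload = pickle.dumps(df)

# Deserialize it back; pandas must be installed where this runs.
restored = pickle.loads(payload)

# Check that the round trip preserved the data.
print(restored.equals(df))
```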

If you need a human-readable format, you can create a Python dictionary from your dataframe with df_dict = df.to_dict(), and print this dictionary (to look at it and maybe copy-paste), or dump it to a JSON string.

When you want to convert a dict back to pandas, use df = pd.DataFrame.from_dict(df_dict).

A minimal example of decoding and encoding:

import pandas as pd
df = pd.DataFrame.from_dict({'a': {0: 1, 1: 2}, 'b': {0: 3, 1: 3}})
print(df.to_dict())

which prints {'a': {0: 1, 1: 2}, 'b': {0: 3, 1: 3}}, a dictionary that can be copied and pasted.
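For completeness, here is a hedged sketch of the full round trip a reader of your question would do: paste the printed dictionary into their session and rebuild the dataframe. It also shows the JSON variant. Using orient='list' for the JSON step is my own choice for this example; it keeps the output compact but drops the index labels, so it only fits dataframes with a default index.

```python
import json

import pandas as pd

# Illustrative dataframe; in practice this would be a small sample of your data.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 3]})

# Asker's side: produce a copy-able dictionary.
df_dict = df.to_dict()

# Reader's side: paste the dictionary literal and rebuild the dataframe.
pasted = {'a': {0: 1, 1: 2}, 'b': {0: 3, 1: 3}}
rebuilt = pd.DataFrame.from_dict(pasted)
print(rebuilt.equals(df))

# JSON variant: orient='list' maps each column name to a plain list of values
# (the index is not stored, which is an assumption that it is the default one).
json_str = json.dumps(df.to_dict(orient='list'))
from_json = pd.DataFrame(json.loads(json_str))
print(from_json.equals(df))
```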

David Dale
  • 2
    Thanks, this is perfect! – Yehoshaphat Schellekens Nov 23 '17 at 08:52
  • 2
    This was helpful, thanks for sharing. But I'm discovering that there's a limit to the size of the dictionary. I'm using mtcars, which is only 32x11, and I had to cut off a couple of columns to make it work. Does that sound right to you? – hachiko Apr 29 '22 at 17:25