How to read a pretty-printed dataframe into a Pandas dataframe?

Question

# necessary imports
from tabulate import tabulate
import pandas as pd

I have a dataframe:

df = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                   'B': ['B0', 'B1', 'B2', 'B3'],
                   'C': ['C0', 'C1', 'C2', 'C3'],
                   'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])

Using this, I pretty print it:

prettyprint=tabulate(df, headers='keys', tablefmt='psql')
print(prettyprint)

Result:

+----+-----+-----+-----+-----+
|    | A   | B   | C   | D   |
|----+-----+-----+-----+-----|
|  0 | A0  | B0  | C0  | D0  |
|  1 | A1  | B1  | C1  | D1  |
|  2 | A2  | B2  | C2  | D2  |
|  3 | A3  | B3  | C3  | D3  |
+----+-----+-----+-----+-----+

Saving it to a text file:

with open("PrettyPrintOutput.txt", "w") as text_file:
    text_file.write(prettyprint)

How can I read PrettyPrintOutput.txt back into a dataframe without doing a lot of text processing manually?

Maybe you can look into pickling it instead of writing to a text file? — user32882, Aug 23 '20 at 13:31
Yeah, that's also good for general use. One of the main reasons I'm looking for a solution like the above is that I often see posts on SO with dataframes given in a similar manner and find it hard to reproduce them. — zabop, Aug 23 '20 at 13:33
IMO, the pretty printed versions of DataFrames are a nuisance (Better to just plain `print` without the decorators, or use the `to_string()` method so someone can reproduce with StringIO). For pretty print, I wind up copying them, removing the lines and then find and replacing '|' with ''. Otherwise you end up with all kinds of whitespace issues on string columns/column headers. Sure you can strip it, but it winds up being more code — ALollz, Aug 23 '20 at 16:04
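The `to_string()` route suggested in the comment above can be sketched as follows. This is only an illustration under the assumption that no cell or header contains whitespace; the variable names `plain` and `roundtrip` are made up for the example.

```python
import io
import pandas as pd

df = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                   'B': ['B0', 'B1', 'B2', 'B3'],
                   'C': ['C0', 'C1', 'C2', 'C3'],
                   'D': ['D0', 'D1', 'D2', 'D3']},
                  index=[0, 1, 2, 3])

# Undecorated text, i.e. what a plain print(df) would show in a post.
plain = df.to_string()

# Split on runs of whitespace. The header row has one field fewer than the
# data rows, so pandas uses the first column as the index.
roundtrip = pd.read_csv(io.StringIO(plain), sep=r'\s+')
print(roundtrip)
```

Splitting on whitespace breaks as soon as a string value contains a space, so this only covers simple frames like the one in the question.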

score 2 · Answer 1 · answered Aug 23 '20 at 15:25

One solution is to use clever keyword arguments in pd.read_csv / pd.read_clipboard:

    df = pd.read_csv(r'PrettyPrintOutput.txt', sep='|', comment='+', skiprows=[2], index_col=1)
    df = df[[col for col in df.columns if 'Unnamed' not in col]]

I declare '+' as the comment character, so the border lines that begin with '+' are not imported. This does not catch the third line (the '|----+' separator, which begins with '|'), so it has to be excluded with skiprows.

The second line of code is needed because every row starts and ends with '|', so splitting on '|' produces extra empty columns whose names begin with 'Unnamed'. If you know the column names in advance, use the usecols keyword to be explicit.

Output:

       A      B      C      D   
                                
0      A0     B0     C0     D0  
1      A1     B1     C1     D1  
2      A2     B2     C2     D2  
3      A3     B3     C3     D3

It also works with pd.read_clipboard, because the functions accept the same keyword arguments.

`object` columns and column headers are problematic with this approach. You'll need to strip all of them. — ALollz, Aug 23 '20 at 16:05
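To address the whitespace caveat raised in the comment above, here is a minimal sketch that strips the padding after parsing. The helper name `read_psql_table` is hypothetical, the table is inlined through `io.StringIO` instead of being read from PrettyPrintOutput.txt so the example runs on its own, and it assumes a numeric index and cells that contain neither '|' nor '+' (a '+' inside a cell would be treated as the start of a comment).

```python
import io
import pandas as pd

# The psql-style table from the question, inlined so the sketch is self-contained.
pretty = """\
+----+-----+-----+-----+-----+
|    | A   | B   | C   | D   |
|----+-----+-----+-----+-----|
|  0 | A0  | B0  | C0  | D0  |
|  1 | A1  | B1  | C1  | D1  |
|  2 | A2  | B2  | C2  | D2  |
|  3 | A3  | B3  | C3  | D3  |
+----+-----+-----+-----+-----+
"""


def read_psql_table(source):
    """Hypothetical helper: parse a tabulate 'psql' table into a DataFrame."""
    # Same keywords as in the answer; dtype=str keeps every cell as text for now.
    raw = pd.read_csv(source, sep='|', comment='+', skiprows=[2], dtype=str)
    # Drop the empty columns created by the leading and trailing '|'.
    keep = [col for col in raw.columns if not col.startswith('Unnamed')]
    # Strip the padding from every cell and from the column headers.
    df = raw[keep].apply(lambda col: col.str.strip())
    df.columns = df.columns.str.strip()
    # The first remaining column (blank header) holds the original index.
    df = df.set_index(df.columns[0])
    df.index = pd.to_numeric(df.index)  # assumes a numeric index, as in the question
    df.index.name = None
    return df


restored = read_psql_table(io.StringIO(pretty))
print(restored)
```

Passing the file name instead of the `StringIO` object should work the same way, since `pd.read_csv` accepts both paths and file-like objects.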
