6

I want to store a pandas.DataFrame to a text file that has the columns aligned using whitespace characters. If this is my sample DataFrame:

In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df = pd.DataFrame(np.linspace(0,1,9).reshape(3,3))
In [4]: df
Out[4]: 
       0      1      2
0  0.000  0.125  0.250
1  0.375  0.500  0.625
2  0.750  0.875  1.000
[3 rows x 3 columns]

I want to do something like this:

In [5]: df.to_csv('test.txt', sep='?')

to get this:

In [6]: more test.txt
  0     1     2
0 0.0   0.125 0.25
1 0.375 0.5   0.625
2 0.75  0.875 1.0

What separator should I use? I want to know if there is a way to do this without using the \t character. It looks nice

    0       1       2
0   0.0     0.125   0.25
1   0.375   0.5     0.625
2   0.75    0.875   1.0

but then my text files have tab characters which create other problems.

If I use sep=' ' I get this which is obviously wrong.

 0 1 2
0 0.0 0.125 0.25
1 0.375 0.5 0.625
2 0.75 0.875 1.0

I know pandas can read in files like this so I figure there is a way to write out files like this.

Steven C. Howell
  • 16,902
  • 15
  • 72
  • 97

3 Answers3

4

How about this

import numpy as np
import pandas as pd
import csv

df = pd.DataFrame(np.linspace(0,1,9).reshape(3,3))
df.to_csv('test.txt', 
          float_format='%10.3f', sep=" ", 
          quoting=csv.QUOTE_NONE, escapechar=" ")

It produces:

 0 1 2
0           0.000           0.125           0.250
1           0.375           0.500           0.625
2           0.750           0.875           1.000

Number of spaces can be ofc customized by the number of digits of the 'longest' number.

grechut
  • 2,897
  • 1
  • 19
  • 18
  • 1
    This is good but I see two problems. It does not line up the column labels with the columns (perhaps this could be overcome by making the column labels floats). Also if different columns have different number of significant figures, it will be complicated to handle. – Steven C. Howell Mar 25 '15 at 15:41
0

Try using hollerith. pypi github. It's a (very) new python library that deals with fixed width formatting - and open to contribution. Unfortunately trying to use built-in python string formatting (or the C printf for that matter!) for fixed width doesn't work well for some widths when there are large integers and certain double precision floats.

0

One way is to use the tabulate package.

 % ipython
Python 3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0]
IPython 8.9.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: 
import numpy as np
import pandas as pd
df = pd.DataFrame(np.linspace(0,1,9).reshape(3,3))
df
Out[1]: 
       0      1      2
0  0.000  0.125  0.250
1  0.375  0.500  0.625
2  0.750  0.875  1.000

In [2]: 
from tabulate import tabulate
content = tabulate(
  df.values.tolist(), list(df.columns), showindex=list(df.index), tablefmt="plain"
)

In [3]: 
print(content)
        0      1      2
 0  0      0.125  0.25
 1  0.375  0.5    0.625
 2  0.75   0.875  1

If you do not want to print the row index, do not pass the showindex option.

In [4]: 
content2 = tabulate(
  df.values.tolist(), list(df.columns), tablefmt="plain"
)
print(content2)
    0      1      2
0      0.125  0.25
0.375  0.5    0.625
0.75   0.875  1
Kamaraju Kusumanchi
  • 1,809
  • 19
  • 12