
I am trying to find the best way to write formatted data to a .csv or even a .txt file using pandas. However, the output is not what I want. I want each data value to be aligned under its header, because I will be outputting more than 30 columns. With the code below, the data itself is written correctly, but the values are not aligned with the headers. Any help would be appreciated.

Here is some sample code I have written to test this out:

import pandas as pd
import numpy as np

data={'dpr_NS_corZFac': [np.nan, np.nan, 35.736231803894043, 36.331412792205811, 
               35.694644451141357, 36.576189994812012, 37.236752510070801, 
               38.173699378967285, 38.808069229125977, 36.761274337768555, 
               30.194313526153564],
    'dpr_HS_corZFac': [np.nan, 38.550984859466553, 37.893826961517334, 40.246520042419434, 
             39.204437732696533, 37.227160930633545, 37.364296913146973, 
             40.320019721984863, 39.04454231262207, 33.014707565307617, 
             27.193448543548584] }

# Create a Pandas dataframe from some data.
df = pd.DataFrame(data, columns=['dpr_NS_corZFac','dpr_HS_corZFac'])

df.to_csv('/home/cpabla/data/pandastext.txt', header=True,
          index=None, sep="\t", mode='w',na_rep='99.99', float_format='%.2f')

Output in the Python console:

print(df)
    dpr_NS_corZFac  dpr_HS_corZFac
0              NaN             NaN
1              NaN       38.550985
2        35.736232       37.893827
3        36.331413       40.246520
4        35.694644       39.204438
5        36.576190       37.227161
6        37.236753       37.364297
7        38.173699       40.320020
8        38.808069       39.044542
9        36.761274       33.014708
10       30.194314       27.193449

Output to text file:

dpr_NS_corZFac  dpr_HS_corZFac
99.99   99.99
99.99   38.55
35.74   37.89
36.33   40.25
35.69   39.20
36.58   37.23
37.24   37.36
38.17   40.32
38.81   39.04
36.76   33.01
30.19   27.19

Essentially, I want the text file to look exactly like the console output above.

Charanjit Pabla
  • You're trying to write a fixed width csv – Zeugma Feb 14 '17 at 19:26
  • Tabs won't perfectly align the data, as you observe. You need to write a fixed-width file, which pandas does not support. To work around this, you need to convert all of your values to appropriately and uniformly sized strings. – Paul H Feb 14 '17 at 19:26
  • Ooooooo I would go with this solution: http://stackoverflow.com/a/35974742/5014455 – juanpa.arrivillaga Feb 14 '17 at 19:29
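Paul H's suggestion (convert every value to a uniformly sized string) can be sketched as follows. This is a minimal sketch, not a pandas API: `to_fixed_width` is a hypothetical helper that assumes all columns are numeric, right-aligns each column to its widest entry (header included), and uses a shortened copy of the question's data plus a placeholder relative path `pandastext.txt`.

```python
import numpy as np
import pandas as pd

def to_fixed_width(frame, na_rep='99.99', precision=2, gap=2):
    """Hypothetical helper: render `frame` as right-aligned fixed-width text."""
    # Turn every cell into a string with the same number of decimals,
    # substituting `na_rep` for missing values (assumes numeric columns).
    cells = frame.apply(
        lambda col: col.map(lambda v: na_rep if pd.isna(v) else f'{v:.{precision}f}'))
    headers = [str(name) for name in frame.columns]
    # Each column is as wide as its longest entry, header included.
    widths = [max(len(h), int(cells[c].str.len().max()))
              for h, c in zip(headers, frame.columns)]
    sep = ' ' * gap
    lines = [sep.join(h.rjust(w) for h, w in zip(headers, widths))]
    for row in cells.values.tolist():
        lines.append(sep.join(v.rjust(w) for v, w in zip(row, widths)))
    return '\n'.join(lines) + '\n'

# Shortened version of the question's data.
df = pd.DataFrame({
    'dpr_NS_corZFac': [np.nan, np.nan, 35.736231803894043, 36.331412792205811],
    'dpr_HS_corZFac': [np.nan, 38.550984859466553, 37.893826961517334, 40.246520042419434],
})

text = to_fixed_width(df)
with open('pandastext.txt', 'w') as f:  # placeholder path
    f.write(text)
```

Because the widths are computed from the data, headers and values line up no matter how many columns there are; the trade-off is that a column's width changes whenever its longest entry changes.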

1 Answer


If you want the same formatted output of a DataFrame that you get in the console, you could write the string returned by df.__repr__() to your text file.

with open('/home/cpabla/data/pandastext.txt', 'w') as fi:
    fi.write(df.__repr__())

Giving a text file like

    dpr_NS_corZFac  dpr_HS_corZFac
0              NaN             NaN
1              NaN       38.550985
2        35.736232       37.893827
3        36.331413       40.246520
4        35.694644       39.204438
5        36.576190       37.227161
6        37.236753       37.364297
7        38.173699       40.320020
8        38.808069       39.044542
9        36.761274       33.014708
10       30.194314       27.193449

However, this would require transforming your DataFrame beforehand to meet your text-file specification (for example, rounding values or replacing NaN), and possibly adjusting pandas' display settings if your DataFrame is large enough to be truncated or wrapped.
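A minimal sketch of that coercion, assuming the target format from the question and comments (two decimals, `-99.99` for missing values). The shortened data and the relative file name `pandastext.txt` are placeholders; `round`, `fillna`, and `repr` are standard pandas/Python calls.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'dpr_NS_corZFac': [np.nan, np.nan, 35.736231803894043, 36.331412792205811],
    'dpr_HS_corZFac': [np.nan, 38.550984859466553, 37.893826961517334, 40.246520042419434],
})

# Coerce the frame to the text-file specification first:
# two decimals, and a sentinel value instead of NaN.
formatted = df.round(2).fillna(-99.99)

# repr() returns the same string the console would print.
with open('pandastext.txt', 'w') as fi:  # placeholder path
    fi.write(repr(formatted))

with open('pandastext.txt') as fi:
    text = fi.read()
```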

miradulo
  • upvoting for the neat hack – Zeugma Feb 14 '17 at 19:32
  • @Boud Whenever I need a quick text file with Python-looking output, this hack is handy :) – miradulo Feb 14 '17 at 19:33
  • Hmm... so this works, but now the missing values appear as NaN. Is there a way to replace them with something like -99.99? – Charanjit Pabla Feb 14 '17 at 19:35
  • @CharanjitPabla As you may imagine, this is really not a solid answer but a hack - if you want to output with some format, you'll have to adjust the representation of your DataFrame beforehand. So replace `NaN` with `-99.99` in this case. – miradulo Feb 14 '17 at 19:36
  • @Mitch Thank you for your help. – Charanjit Pabla Feb 14 '17 at 19:52
  • @CharanjitPabla You are welcome! – miradulo Feb 14 '17 at 19:55
  • @Mitch A quick follow-up question: using the method you present here, is there a limit on the number of columns printed on one line? For example, when I print 7 columns, the 6th and 7th columns are printed below columns 1 and 2. I want all columns to stay on the same line, however many there are. Hopefully you understand what I am saying. – Charanjit Pabla Feb 14 '17 at 20:16
  • @CharanjitPabla this is exactly what I meant in my answer by "adjusting your representation settings". What you get with `__repr__()` is what you get in the console. You can adjust your console output in [Options and Settings](http://pandas.pydata.org/pandas-docs/stable/options.html) and consequently maybe get the text file output you want, but this is a slippery slope - if you need a scaleable solution for a large DataFrame, you probably want to use a more stable solution. – miradulo Feb 14 '17 at 20:18
  • @Mitch I added the line pd.set_option('expand_frame_repr', False) and that worked around the problem. I'm not sure where it will fail next, but hopefully it works for what I am doing. – Charanjit Pabla Feb 14 '17 at 20:32
  • @CharanjitPabla Good luck! – miradulo Feb 14 '17 at 20:32
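The `expand_frame_repr` workaround can also be scoped with `pd.option_context`, which restores the previous settings when the `with` block exits. This is a sketch under an assumption: with 30+ columns, the default `display.max_columns` limit may also elide columns with `...`, so it additionally lifts the column and row limits. The 30-column frame and the file name `pandastext_wide.txt` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: 3 rows x 30 columns, like the question's 30+ columns.
wide = pd.DataFrame(np.arange(90).reshape(3, 30) * 1.5,
                    columns=[f'col{i}' for i in range(30)])

# Temporarily disable wrapping and column/row truncation so repr()
# puts every column on one line; the settings revert on exit.
with pd.option_context('display.expand_frame_repr', False,
                       'display.max_columns', None,
                       'display.max_rows', None):
    text = repr(wide)

with open('pandastext_wide.txt', 'w') as fi:  # placeholder path
    fi.write(text)
```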