Converting a dat file to excel with Pandas -- numbers formatting as strings

Question

I'm trying to convert .dat files into Excel (actually I would prefer to copy the contents into an excel file) but the numbers are pasting as text. I would like them to be floats.

I think one of the issues I'm having is that the .dat file is in a funky format.

import pandas as pd

df = pd.read_csv('C:/Users/me/Desktop/specimen/spec01/specimen.dat', sep='\t')
df.to_excel('C:/Users/me/Desktop/Rebar/specimen/specimen.xlsx')

Hi Abb, you haven't received responses because you do not have a good reproducible example. Please see https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples. If you update your Q, I'd be happy to help. In particular, having the input dat file is necessary. — Rich Andrews, Mar 28 '19 at 22:39
Thanks for the tip Rich, I've uploaded an example dat file. If it makes it easier I only need the headers and data -- nothing above it. My ultimate goal is to take a bunch of dat files and dump them as separate worksheets in a single excel file. — Abb, Mar 29 '19 at 02:50
ensure that there are no special character or spaces in your columns...this https://stackoverflow.com/questions/33257344/how-to-remove-special-characers-from-a-column-of-dataframe-using-module-re and http://www.datasciencemadesimple.com/strip-space-column-pandas-dataframe-leading-trailing-2/ could help — user2543622, Mar 29 '19 at 02:52

score 0 · Accepted Answer · answered Mar 29 '19 at 17:16

The referenced file has an overly ornate header which must be treated. Presuming you are receiving this file repeatedly, from a source that cannot be altered, such as a server or experiment apparatus, you are left to programmatically handle the following:

MTS793|MPT|ENU|1|2|.|/|:|1|0|0|A

Operator Information                        Time:   15.133789   Sec 8/2/2018 12:18:30 PM
Is the extensometer pin pulled? ughiopuh;u
Operator Information End

Data Acquisition            Tensile Test Data           Time:   117.81934   Sec 8/2/2018 12:20:13 PM
Time    Axial Displacement  Axial Strain    Axial Force
Sec in  in/in   lbf

This must be removed to a proper set of headers to successfully use pandas to import the file.

Time;Axial Displacement;Axial Strain;Axial Force
Sec in;in/in;lbf

are likely the headers desired. This gets you into column multiindex world which is getting advanced.

Here is a functioning answer:

import openpyxl
import pandas as pd
import sys
if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

csvdata = StringIO("""MTS793|MPT|ENU|1|2|.|/|:|1|0|0|A

Operator Information                        Time:   15.133789   Sec 8/2/2018 12:18:30 PM
Is the extensometer pin pulled? ughiopuh;u
Operator Information End

Data Acquisition            Tensile Test Data           Time:   117.81934   Sec 8/2/2018 12:20:13 PM
Time    Axial Displacement  Axial Strain    Axial Force
Sec in  in/in   lbf
15.372559   -0.00026854035  -0.00013428145  -3.7030973
15.472656   2.2532094e-08   -8.3925901e-05  11.109222
15.572754   -0.00026854035  -0.00011749626  3.7030623
15.672852   2.2532094e-08   -0.00011749626  -1.7583034e-05
15.772949   0.00026858543   -0.00010071108  7.4061422""")

df = pd.read_csv(csvdata, sep="\t", header=[5,6])

# Show how pandas parses the file headers
print(df.head(6))

# The columns are referenceable as a multiindex
level0 = df.columns.get_level_values(0)
level1 = df.columns.get_level_values(1)

# Set index from multiindex columns!
# But we are not going to assign this result, because 
# there is a better trick
print(df.set_index(('Time', 'Sec')))

flattened_cols = (["{} ({})".format(x,y) for x,y in zip(level0, level1)])
df.columns = flattened_cols

# The better trick
df.set_index(flattened_cols[0], inplace=True)

# show results
print(df.head(6))

# save in desired file format
df.to_excel('specimen.xlsx')

The result in memory

            Axial Displacement (in)  Axial Strain (in/in)  Axial Force (lbf)
Time (Sec)                                                                  
15.372559             -2.685404e-04             -0.000134          -3.703097
15.472656              2.253209e-08             -0.000084          11.109222
15.572754             -2.685404e-04             -0.000117           3.703062
15.672852              2.253209e-08             -0.000117          -0.000018
15.772949              2.685854e-04             -0.000101           7.406142

The result in xls

I deserve a straight up 'accepted answer' for this one! Best of luck.

You are very welcome, enjoy your work with Pandas. Also for the benefit of Stack Overflow's community, you may wish to rename your question, "Specify header rows and read_csv() data into column multiindex and save to_excel()" or something that captures the useful tricks shown. — Rich Andrews, Mar 29 '19 at 17:49

Converting a dat file to excel with Pandas -- numbers formatting as strings

1 Answers1