3

I read rpt data to pandas by using:

import pandas as pd
df = pd.read_fwf("2014-1.rpt", skiprows=[1], nrows=150)

I actually follow the anwser here However, for some columns, seperation is not accurate. It is sample of what I get:

Country   Order Date Device   Category
UK        2014-01-03 Desktop  Shoes
IT        2014-01-03 Desktop  Shoes
FR        2014-01-04 Desktop  Dress
FR        2014-01-04 Tablet   Dress
US        2014-01-05 Desktop  Bags
US        2014-01-06 Desktop  Bags
UK        2014-01-07 Tablet   Dress

For instance it reads Order Date and Device columns as a single column. Actually, it is just an example, there are many columns like this. How to solve it? Do you have any idea? Actually these columns with problems might have fixed widths

edyvedy13
  • 2,156
  • 4
  • 17
  • 39
  • 1
    Have you checked the `read_fwf` documentation, especially the colspecs and delimiters parameters? You may need to enter the `colspecs` manually, if python is not picking up on them correctly. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_fwf.html – wkzhu Dec 06 '17 at 23:41
  • I tried to read it with widths I should use column names for colspecs right ? – edyvedy13 Dec 06 '17 at 23:43
  • colspecs should be something like [(0,6), (8,15), ...] dependent on the lengths of your columns, one pair of integers for each column. – wkzhu Dec 06 '17 at 23:47
  • oh ok let me try – edyvedy13 Dec 06 '17 at 23:47
  • So I have to do it for 128 columns, there is no easy way right ? – edyvedy13 Dec 06 '17 at 23:49
  • Maybe see if you can convert to CSV (see https://stackoverflow.com/questions/10538162/convert-ssms-rpt-output-file-to-txt-csv) and then use `pd.read_csv`... – wkzhu Dec 07 '17 at 00:09

2 Answers2

4

This question is old, but here is an answer. You can read it as a csv using pandas. I have used this for a variety of rpt files and it has worked.

import pandas as pd
df = pd.read_csv("2014-1.rpt", skiprows=[1], nrows=150)
GK89
  • 646
  • 5
  • 29
0

To read a SQL Server rpt data export file this might work:

import pandas as pd

myfile = 'myfile.rpt'  # <--- edit this

# get column specification using the second line with all the dashes
count = 1
for x in open(myfile, encoding='utf8'):
    cols = x.rstrip()  # remove newline character
    count += 1
    if count > 2:
        break

# build colspecs list
colspecs = []
idx = 0
for c in cols.split(' '):
    n = len(c)
    colspecs.append((idx, idx + n))
    idx += 1 + n

df = pd.read_fwf(myfile, colspecs=colspecs, encoding='utf8', skiprows=[1])
e1i45
  • 1,519
  • 1
  • 14
  • 20