
I'm coming to Python from a SAS background.

I've imported a SAS version 5 transport file (XPT) into Python using:

import pandas as pd
df = pd.read_sas(r'C:\mypath\myxpt.xpt')

The file is a simple SAS transport file, converted from a SAS dataset created with the following:

DATA myxpt;
  DO i = 1 TO 10;
    y = "XXX";
    OUTPUT;
  END;
RUN;

The file imports correctly and I can view the contents using:

print(df)

[Screenshot: output of print(df)]

However, when I view the file using the variable explorer, all character columns are shown as blank.

[Screenshot: the dataframe viewed through the variable explorer]

I've also tried importing the data as a SAS dataset rather than a transport file, but I get the same problem.

When I create a dataframe containing character columns directly in Python, it displays correctly in the variable explorer.

Any suggestions as to what's going wrong?

Thanks in advance.

Easynow
  • Column Y is a column of binary strings. I believe you have to decode it first. The variable explorer cannot guess the correct encoding and apparently does not show binary strings. If you do not know the encoding, you will have to guess. Try `df['utf8']=df.Y.str.decode('utf8')` and see if the info in the variable explorer makes any sense. This https://stackoverflow.com/questions/17615414/how-to-convert-binary-string-to-normal-string-in-python3 might help. – Frâncio Rodrigues Nov 19 '18 at 14:12
    That works perfectly, thanks for the quick response. Building on your answer, I see that I can specify `encoding='utf8'` when importing the file, which also resolves the issue. – Easynow Nov 19 '18 at 14:25
  • Great! I will write a complete answer with what you have done too. – Frâncio Rodrigues Nov 19 '18 at 15:14

1 Answer


Column Y holds binary strings (Python `bytes` objects) rather than text, so you have to decode it first. The variable explorer cannot guess the correct encoding and apparently does not display binary strings. If you do not know the encoding, you will have to guess: try `df['utf8'] = df.Y.str.decode('utf8')` and see whether the result makes sense.
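As a rough sketch of that decoding step, the snippet below builds a small stand-in dataframe (the real one would come from `pd.read_sas`) and decodes every column that contains only `bytes` values. The helper name `decode_bytes_columns` and the sample data are made up for illustration, and `utf8` is only an assumed encoding.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by pd.read_sas without an
# encoding: the character column holds bytes objects instead of str.
df = pd.DataFrame({"i": [1.0, 2.0, 3.0], "Y": [b"XXX", b"XXX", b"XXX"]})


def decode_bytes_columns(frame, encoding="utf8"):
    """Return a copy in which every all-bytes object column is decoded to str."""
    out = frame.copy()
    for col in out.columns:
        if out[col].dtype != object:
            continue  # numeric columns cannot hold bytes
        values = out[col].dropna()
        if len(values) > 0 and values.map(lambda v: isinstance(v, bytes)).all():
            out[col] = out[col].str.decode(encoding)
    return out


decoded = decode_bytes_columns(df)
print(decoded["Y"].tolist())
```

The original frame is left untouched, so you can retry with a different encoding if the decoded text looks wrong.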

As you have noted, it is possible to specify the encoding in the import function:

df = pd.read_sas(r'C:\mypath\myxpt.xpt', encoding='utf8')

As a side note, you should always know which encodings are in use, and preferably state them explicitly, to avoid major headaches.

For a list of all available encodings and their aliases, check here.
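To illustrate the kind of headache meant here, this small standard-library sketch decodes the same text with the right and the wrong encoding. The sample word and the choice of `latin-1` are assumptions picked for the demonstration, not something taken from the SAS file.

```python
text = "café"

# Bytes written by a hypothetical legacy tool that uses Latin-1.
raw_latin1 = text.encode("latin-1")

# Guessing UTF-8 for Latin-1 bytes fails outright on the non-ASCII byte.
try:
    raw_latin1.decode("utf8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The opposite mistake does not fail: it silently produces garbled text.
raw_utf8 = text.encode("utf8")
mojibake = raw_utf8.decode("latin-1")

print(utf8_ok)
print(mojibake)
print(raw_latin1.decode("latin-1") == text)
```

The second case is the more dangerous one, because nothing raises an error and the garbled text only shows up later.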

Frâncio Rodrigues