I am new to Python. I have SAS data imported using pandas. The data comes in the format below.

CLASCODE    CLASDESC 
b'CT'       b'CTS-item' 
b'RI'       b'Request for information' 

I want to remove the b prefix and the surrounding quotes from the data using pandas or NumPy. Please help.

Bill DeRose
  • These are just the representation of raw byte strings, there is no "b" character in those strings. – juanpa.arrivillaga Dec 05 '18 at 03:50
  • [`pandas.read_sas`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sas.html) might be what you're looking for. A related post may be helpful: https://stackoverflow.com/questions/38930583/how-to-get-text-from-btext-in-the-pandas-object-type-after-using-read-sas. – Bill DeRose Dec 05 '18 at 04:01
  • Thanks Bill, it is working fine now using `encoding="utf-8"`. – Swapnil Dagar Dec 05 '18 at 04:44
  • Bill, I have another file to load. I followed the same process, but it gives me the error below. – Swapnil Dagar Dec 05 '18 at 05:45
  • `UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 0: invalid start byte`. I used this command to read the file: `d=pd.read_sas('nameaddr.sas7bdat',encoding="utf-8")` – Swapnil Dagar Dec 05 '18 at 05:47
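
The error in the last comment can be reproduced without the SAS file. The sketch below only looks at the single byte named in the message. It assumes, without any confirmation from the thread, that the second file was written with a Windows-1252 (`cp1252`) encoding, where `0x94` is a right double quotation mark.

```python
# The byte reported in the error message
raw = b"\x94"

# 0x94 is a continuation byte in UTF-8, so it cannot start a character
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Assumption: the file uses cp1252; this is a guess, not something the thread confirms
as_cp1252 = raw.decode("cp1252")

print(utf8_ok)                 # False
print(as_cp1252 == "\u201d")   # True
```

If that assumption holds, passing `encoding="cp1252"` to `pd.read_sas` instead of `"utf-8"` may be worth trying; the actual encoding of `nameaddr.sas7bdat` is not stated anywhere in the thread.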

1 Answer

The better fix is to specify the proper encoding when you read the file. However, if you have to work with the data as it currently is, here's how you can remove the b prefix and the quotes.

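import pandas as pd

# Assuming the data is stored as text in a Series
s = pd.Series("CLASCODE CLASDESC b'CT' b'CTS-item' b'RI' b'Request for information'")
# Strip only the b'...' wrapper, so other "b" characters and quotes in the text survive
s = s.str.replace(r"b'([^']*)'", r"\1", regex=True)
# CLASCODE CLASDESC CT CTS-item RI Request for information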
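
If the columns hold actual `bytes` objects rather than text (which is what the `b'...'` display in the question suggests), decoding them is likely cleaner than string replacement. A minimal sketch, assuming a hypothetical DataFrame `df` built by hand to mimic the question's data and assuming the values are valid UTF-8:

```python
import pandas as pd

# Hypothetical frame mimicking what read_sas returns when no encoding is given
df = pd.DataFrame({
    "CLASCODE": [b"CT", b"RI"],
    "CLASDESC": [b"CTS-item", b"Request for information"],
})

# Decode every object column; assumes each such column holds only bytes
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].str.decode("utf-8")

print(df["CLASDESC"].tolist())
# ['CTS-item', 'Request for information']
```

A column that mixes bytes with ordinary strings would come back with missing values for the non-bytes entries, so this only fits data that is bytes throughout.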
meW