Error: bytes-like object is required, not 'str' on pd.read_csv

Question

I have the following code to read a txt file as CSV:

validationColumns='column1|column2'
headerColumns = validationColumns.split("|")
rawData = subprocess.check_output( 'sshpass -p \'pass\' ssh -o StrictHostKeyChecking=no user tail -n +1 /var/prod/archive/new.txt | awk \'NR==1, NR==35\'', shell=True).decode('utf-8')
df = pd.read_csv(io.BytesIO(rawData), encoding='utf8', sep='|', usecols=headerColumns, quotechar="~")

But I get the error "bytes-like object is required, not 'str'". Can anyone please help me with it?

score 1 · Accepted Answer · answered May 13 '21 at 15:12

rawData isn't a bytes object. check_output returns one, but you immediately decoded it into a str object:

rawData = subprocess.check_output(...).decode('utf-8')

Just omit the .decode('utf-8') call so that rawData stays a bytes object, which is what io.BytesIO expects.
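
A minimal sketch of the difference, assuming made-up sample data in place of the SSH output (the `raw` bytes and the third column are hypothetical, chosen only for illustration). io.BytesIO accepts bytes only, while io.StringIO is its counterpart for str:

```python
import io

import pandas as pd

# Hypothetical stand-in for the bytes that check_output would return.
raw = b"column1|column2|column3\n~a|b~|1|x\n~c~|2|y\n"

header_columns = "column1|column2".split("|")

# Bytes go through io.BytesIO; read_csv decodes them itself.
df_from_bytes = pd.read_csv(io.BytesIO(raw), encoding="utf8", sep="|",
                            usecols=header_columns, quotechar="~")

# If the data has already been decoded, io.StringIO wraps the str instead.
df_from_text = pd.read_csv(io.StringIO(raw.decode("utf-8")), sep="|",
                           usecols=header_columns, quotechar="~")

# Passing a str to io.BytesIO reproduces the error from the question.
try:
    io.BytesIO(raw.decode("utf-8"))
except TypeError as exc:
    error_message = str(exc)
```

Both calls should produce the same DataFrame; which wrapper to use depends only on whether the data is still bytes or already a str.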

chepner

tripleee · Answer 2 · 2021-05-13T15:29:17.663

Once you call .decode('utf-8') on the result of check_output, rawData is already text (a str); there is no reason and no way to decode it again, and io.BytesIO cannot wrap it.
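
A quick way to see which type you are holding at each step, sketched with a hypothetical child command (the current Python interpreter printing one line) instead of the SSH call. check_output returns bytes unless text=True is passed, and a str produced by .decode() has no .decode() method of its own:

```python
import subprocess
import sys

# Hypothetical child command: the current interpreter printing one line.
cmd = [sys.executable, "-c", "print('column1|column2')"]

raw = subprocess.check_output(cmd)               # bytes by default
decoded = raw.decode("utf-8")                    # str after decoding
as_text = subprocess.check_output(cmd, text=True)  # str when text=True is set

# A str cannot be decoded a second time: it has no decode() method.
can_decode_again = hasattr(decoded, "decode")

print(type(raw).__name__, type(decoded).__name__, type(as_text).__name__)
```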

To debug code like this, remove code step by step until the problem goes away; it is also worth reviewing the guidance on providing a minimal reproducible example.

As an aside, you really want to avoid the shell=True and the pipe to Awk here. Python can perfectly well perform the same task, but if you run SSH anyway, you might as well refactor to use just Awk on the remote host.

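rawData = subprocess.check_output(
    ['sshpass', '-p', 'pass',
     'ssh', '-o', 'StrictHostKeyChecking=no', 'user',
     # ssh joins these arguments with spaces for the remote shell,
     # so the Awk script is written without whitespace.
     # Lines 1-35 match the tail -n +1 | awk pipeline in the question.
     'awk', 'NR==1,NR==35', '/var/prod/archive/new.txt'])

The same Awk script would be simpler still with sed: sed -n '1,35p'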
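
For the other option mentioned above, letting Python select the lines itself, a minimal sketch could look like this. It assumes the whole file has already arrived as bytes; the `raw` value here is made-up sample data, not the real file:

```python
import io
from itertools import islice

# Hypothetical stand-in for the remote file contents, as bytes (100 lines).
raw = b"".join(b"line%d\n" % i for i in range(1, 101))

# Keep only the first 35 lines, like awk 'NR==1,NR==35'.
# Iterating over a BytesIO yields one line at a time, newline included.
first_35 = b"".join(islice(io.BytesIO(raw), 35))
```

The trade-off is that the whole file still travels over SSH before it is trimmed, which is why filtering on the remote host can be preferable for large files. pandas can also stop early by itself through the nrows parameter of read_csv, which counts data rows after the header.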
