
EDIT: The problem and its solution are explained at https://github.com/wireservice/csvkit/issues/898 (a fix involves setting the environment variable PYTHONIOENCODING).


Today I learned I need to use the -e ENCODING option when passing my input data to csvcut (the data is ASCII with some characters > 0x7f, i.e. "extended ASCII"). So, for example,

csvcut -v -e latin-1 -c AGE,NAME input.txt 

works as expected, however

cat input.txt | csvcut -v -e latin-1 -c AGE,NAME

does not; it fails with the error

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 22: invalid continuation byte

In other words, csvcut appears to ignore the -e option in the second case.

The file input.txt is:

NAME,AGE
Jimmie, 10
Ñandor, 357

Ñ is the single byte 0xd1 in latin-1.
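The error can be reproduced without csvkit. The following is a minimal Python sketch, assuming the file is latin-1 with Windows-style CRLF line endings (an assumption on my part, chosen because it puts Ñ at the byte position 22 reported in the traceback). The helper name try_decode is made up for illustration.

```python
# Sample file contents from the question, encoded as latin-1.
# CRLF line endings are an assumption; they place the 0xd1 byte at offset 22.
raw = "NAME,AGE\r\nJimmie, 10\r\n\u00d1andor, 357\r\n".encode("latin-1")


def try_decode(data, encoding):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# Locate the byte that encodes the letter N-with-tilde.
offset = raw.index(b"\xd1")

# Decoding as UTF-8 fails: 0xd1 announces a two-byte sequence, but the
# next byte (the letter "a") is not a valid continuation byte.
utf8_text, utf8_error = try_decode(raw, "utf-8")

# Decoding as latin-1 succeeds, because every byte maps to one character.
latin1_text, latin1_error = try_decode(raw, "latin-1")

print(offset)
print(utf8_error)
print(ascii(latin1_text.splitlines()[2]))
```

So the traceback is what a UTF-8 decoder produces for this input, which is consistent with the -e latin-1 setting not being applied to the piped data.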

Why would the results be different through the pipe? Is there a fix?
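For context on the PYTHONIOENCODING fix mentioned in the edit note at the top: a Python program that reads standard input as text decodes it with an encoding chosen at interpreter startup, and PYTHONIOENCODING overrides that choice. The sketch below only illustrates that mechanism with a plain child Python process, not csvkit itself; CHILD and pipe_to_child are hypothetical names used for the demonstration.

```python
import os
import subprocess
import sys

# Child program: report the encoding of its stdin and show what it decoded.
# ascii() keeps the child's output pure ASCII regardless of terminal settings.
CHILD = "import sys; print(sys.stdin.encoding); print(ascii(sys.stdin.read()))"


def pipe_to_child(data, io_encoding):
    """Pipe raw bytes to a child Python process with PYTHONIOENCODING set."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=data,
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").splitlines()


# Send one latin-1 encoded line through the pipe.
child_encoding, child_text = pipe_to_child(
    "\u00d1andor, 357\n".encode("latin-1"), "latin-1"
)

print(child_encoding)  # encoding the child used for its stdin
print(child_text)      # ASCII-escaped repr of the text the child decoded
```

On the command line, the equivalent would presumably be something like `cat input.txt | PYTHONIOENCODING=latin-1 csvcut -v -c AGE,NAME`; I have not verified that exact invocation here, so see the linked issue for the details.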

Ed Beighe
  • At the start of the question you state _"The workaround involves setting the environment variable PYTHONIOENCODING"_, but isn't that actually the _solution_ rather than just a _workaround_? Or does setting PYTHONIOENCODING simply not work when you pipe the input? – skomisa May 06 '23 at 22:42
  • Yeah, maybe a bad choice of words on my part. – Ed Beighe May 08 '23 at 04:46
  • OK, but to be clear: [1] Does setting PYTHONIOENCODING resolve the issue, or does its value get ignored? [2] If it does resolve the issue, is the point of your question simply to find a solution that does not involve setting PYTHONIOENCODING? – skomisa May 08 '23 at 05:04
  • [1] Yes, as near as I can tell, it does solve the problem, and [2] well, I originally asked the question without any understanding that PYTHONIOENCODING was a thing or even that csvkit uses Python. To my way of thinking, the end user (me in this case) shouldn't need to know such details. – Ed Beighe May 10 '23 at 21:06
  • Understood and agreed. But for what it's worth, setting PYTHONIOENCODING is unfortunately the only resolution that I can find. – skomisa May 10 '23 at 21:10

0 Answers