This is a silly example of the problem: I am building a string that contains a bash variable and running it with python -c "$string", but some characters, such as 🇯🇵, are breaking my program.

Running this:

#!/bin/bash
set -x
var='some 🇯🇵 thing'
file_name="$(python3 -c '#!/usr/bin/env python3
print(r"'"$var"'")')";

# Outputs
$ bash test.sh
+ var='some 🇯🇵 thing'
++ python3 -c '#!/usr/bin/env python3
print(r"some 🇯🇵 thing")'
Unable to decode the command from the command line:
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 36-43: surrogates not allowed
+ file_name=

How can I run this bash string as a Python script without it throwing the UnicodeEncodeError?

Replacing print(r"some 🇯🇵 thing") with somevar = r"some 🇯🇵 thing", or adding # -*- coding: utf-16be -*-, does not change the error:

$ bash test.sh
+ var='some 🇯🇵 thing'
++ python3 -c '#!/usr/bin/env python3
# -*- coding: utf-16be -*-
somevar = r"some 🇯🇵 thing"'
Unable to decode the command from the command line:
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 67-74: surrogates not allowed
+ file_name=
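
One possible reading of the two error ranges, offered as a sketch and not as a confirmed diagnosis: if the interpreter cannot decode the command-line bytes with its locale encoding, each undecodable byte is kept as a lone surrogate (the `surrogateescape` error handler), and re-encoding that text as UTF-8 then fails. The snippet assumes the problematic characters are the eight bytes `f0 9f 87 af f0 9f 87 b5` visible in the hexdump under Update 2, and it simulates a non-UTF-8 locale with the `ascii` codec. `reencode_error` is an illustrative helper name, and none of this has been verified on msys2.

```python
# Assumption: the emoji bytes are the ones shown in the hexdump (Update 2).
EMOJI_BYTES = bytes.fromhex("f09f87aff09f87b5")


def reencode_error(prefix: str, suffix: str) -> UnicodeEncodeError:
    """Decode the command as ASCII with surrogateescape, then re-encode as UTF-8.

    Returns the UnicodeEncodeError raised by the re-encoding step.
    """
    raw = prefix.encode("ascii") + EMOJI_BYTES + suffix.encode("ascii")
    # Every byte >= 0x80 becomes one lone surrogate (U+DC80..U+DCFF).
    text = raw.decode("ascii", errors="surrogateescape")
    try:
        text.encode("utf-8")
    except UnicodeEncodeError as exc:
        return exc
    raise RuntimeError("no error raised; the simulation does not apply")


# The two command strings from the question, split around the emoji.
first = reencode_error('#!/usr/bin/env python3\nprint(r"some ', ' thing")')
second = reencode_error(
    '#!/usr/bin/env python3\n# -*- coding: utf-16be -*-\nsomevar = r"some ',
    ' thing"',
)
print(first)
print(second)
```

Under these assumptions the simulated errors report the ranges 36-43 and 67-74, the same as the two messages above. That suggests the eight bytes were escaped one by one before the `-c` string was encoded, which would also explain why the coding comment has no effect.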

I use:

$ python3 --version
Python 3.9.6

$ bash --version
GNU bash, version 5.1.8(1)-release (x86_64-pc-msys)
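
Since the behaviour may depend on the locale, it could also help to record which encodings the interpreter picked in this environment. A small diagnostic sketch using only standard-library calls; `describe_encodings` is an illustrative name, and the values it reports on msys2 are not known here.

```python
import locale
import sys


def describe_encodings() -> dict:
    """Collect the encodings Python uses for arguments, file names and output."""
    return {
        # Encoding and error handler used for command-line arguments and paths.
        "filesystem": sys.getfilesystemencoding(),
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        # Encoding derived from the current locale settings.
        "preferred": locale.getpreferredencoding(False),
        # Encoding of standard output (may be None if stdout is replaced).
        "stdout": getattr(sys.stdout, "encoding", None),
        # 1 if UTF-8 mode is enabled, 0 otherwise.
        "utf8_mode": sys.flags.utf8_mode,
    }


for name, value in describe_encodings().items():
    print(f"{name}: {value}")
```

Because the listing is pure ASCII, it can be piped into `python3` in the same way as the working examples in the Update section.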

Update

Writing the contents to a file before running fixes the problem:

#!/bin/bash
set -x
var='some 🇯🇵 thing'
printf '%s' '#!/usr/bin/env python3
print(r"'"$var"'")' > /tmp/t.py
file_name="$(python3 /tmp/t.py)";

# Outputs
$ bash test.sh
+ var='some 🇯🇵 thing'
+ printf %s '#!/usr/bin/env python3
print(r"some 🇯🇵 thing")'
++ python3 /tmp/t.py
+ file_name='some 🇯🇵 thing'

Also, piping the string into python fixes the problem:

#!/bin/bash
set -x
var='some 🇯🇵 thing'
file_name="$(printf '%s' '#!/usr/bin/env python3
print(r"'"$var"'")' | python3)";

# Outputs
$ bash test.sh
+ var='some 🇯🇵 thing'
++ printf %s '#!/usr/bin/env python3
print(r"some 🇯🇵 thing")'
++ python3
+ file_name='some 🇯🇵 thing'

Now my question is: can I run the code without creating a file or piping it, i.e., directly from bash as a string with python -c?
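
One approach that might avoid both the file and the pipe, offered as a sketch that has not been tried on msys2: keep the `-c` source pure ASCII and pass the value as a separate argument, so the code string itself never contains the problematic bytes. The snippet drives the interpreter through `subprocess` only to stay self-contained. `CODE` and `run_with_argument` are illustrative names, and recovering the bytes with `os.fsencode` assumes the argument was decoded with the `surrogateescape` handler.

```python
import subprocess
import sys

# ASCII-only program: the data arrives in sys.argv[1], not in the source text.
# os.fsencode turns the decoded argument back into its original bytes.
CODE = "import os, sys; sys.stdout.buffer.write(os.fsencode(sys.argv[1]))"


def run_with_argument(value: str) -> str:
    """Run CODE with `python -c`, passing value as a separate argument."""
    result = subprocess.run(
        [sys.executable, "-c", CODE, value],
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8")


value = "some \U0001F1EF\U0001F1F5 thing"
# Compare instead of printing the emoji, so the console encoding is irrelevant.
print(run_with_argument(value) == value)
```

In the bash script this would correspond to something like `python3 -c "$code" "$var"`, with `$var` no longer spliced into the quoted source. Whether the msys2 build of Python decodes the argument in a way that `os.fsencode` can reverse is not confirmed here.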

Update 2

The output of running echo python3 -c ... as asked in comments:

#!/bin/bash
set -x
var='some 🇯🇵 thing'
echo python3 -c '#!/usr/bin/env python3
somevar = r"'"$var"'")';

# Output
$ bash test.sh
+ var='some 🇯🇵 thing'
+ echo python3 -c '#!/usr/bin/env python3
somevar = r"some 🇯🇵 thing")'
python3 -c #!/usr/bin/env python3
somevar = r"some 🇯🇵 thing")

The output of running echo python3 -c ... | hexdump as asked in comments:

#!/bin/bash
set -x
var='some 🇯🇵 thing'
echo python3 -c '#!/usr/bin/env python3
somevar = r"'"$var"'")' | hexdump;

# Outputs
+ var='some 🇯🇵 thing'
+ echo python3 -c '#!/usr/bin/env python3
somevar = r"some 🇯🇵 thing")'
+ hexdump
0000000 7970 6874 6e6f 2033 632d 2320 2f21 7375
0000010 2f72 6962 2f6e 6e65 2076 7970 6874 6e6f
0000020 0a33 6f73 656d 6176 2072 203d 2272 6f73
0000030 656d f020 879f f0af 879f 20b5 6874 6e69
0000040 2267 0a29
0000044
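
To make the dump easier to read, the following sketch converts it back to bytes. It assumes `hexdump` printed its default format of 16-bit words in little-endian order, so the two bytes of each word are swapped; `words_to_bytes` is an illustrative helper name.

```python
HEXDUMP = """\
0000000 7970 6874 6e6f 2033 632d 2320 2f21 7375
0000010 2f72 6962 2f6e 6e65 2076 7970 6874 6e6f
0000020 0a33 6f73 656d 6176 2072 203d 2272 6f73
0000030 656d f020 879f f0af 879f 20b5 6874 6e69
0000040 2267 0a29
0000044
"""


def words_to_bytes(dump: str) -> bytes:
    """Rebuild the byte stream from default hexdump output (16-bit LE words)."""
    out = bytearray()
    for line in dump.splitlines():
        fields = line.split()
        # fields[0] is the offset column; the rest are 16-bit words.
        for word in fields[1:]:
            out += int(word, 16).to_bytes(2, "little")
    return bytes(out)


data = words_to_bytes(HEXDUMP)
start = data.index(b"\xf0")          # first non-ASCII byte
emoji = data[start:start + 8]
print(emoji.hex(" "))
print([f"U+{ord(ch):04X}" for ch in emoji.decode("utf-8")])
```

Under that assumption the dump holds 68 bytes, and the eight bytes between `some ` and ` thing` are valid UTF-8 for two regional-indicator code points, U+1F1EF and U+1F1F5. So the shell appears to hand over well-formed UTF-8, and the failure would have to occur when Python decodes its arguments.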

Related:

  1. Python 3: os.walk() file paths UnicodeEncodeError: 'utf-8' codec can't encode: surrogates not allowed
  2. How can I convert surrogate pairs to normal string in Python?
Evandro Coan
  • Annotate the encoding for Python? That said, can you reproduce this with a file as input? – Ulrich Eckhardt Dec 05 '21 at 20:47
  • @UlrichEckhardt, I updated the question. Piping the string into Python or writing it to a file first works without errors. My question now is whether it can be done with `python -c "$string"`. – Evandro Coan Dec 05 '21 at 20:50
  • What is the encoding of the files you are using? Maybe using `print(repr("your string"))` would also give some insight how the string is interpreted here. – Ulrich Eckhardt Dec 05 '21 at 20:53
  • @UlrichEckhardt, running `print(repr(r"'"$var"'"))` just prints `'some 🇯🇵 thing'` in the console (when writing the string to a file before running it). – Evandro Coan Dec 05 '21 at 20:58
  • I have no real clue what may have caused this, apart from some encoding inconsistencies. Two more things I'd look at: Instead of `python -c ...`, I'd run `echo python -c ...` to find out how the commandline is finally assembled by the shell. Further, piping the output into `hd` or `hexdump` to inspect the bytewise output. However, some programs change behaviour, depending on whether they are writing to a TTY or a pipe, so this may stop the issue from happening. – Ulrich Eckhardt Dec 05 '21 at 21:05
  • @UlrichEckhardt I updated the question with the results. – Evandro Coan Dec 05 '21 at 21:19
  • Which OS are you on? – Philippe Dec 06 '21 at 19:33
  • Windows 10 with `msys2`. I just tested with Ubuntu 20, and it worked. It looks like an issue with `msys2` bash and python3. – Evandro Coan Dec 06 '21 at 20:32
  • Try `LANG=C.UTF-8 bash test.sh`. – hidekuro Jul 25 '22 at 02:55

0 Answers