Problems decoding old-school BBS ANSI with telnetlib

Question

I was trying to use telnetlib to help me play TradeWars on an old-school BBS.

As I recall (this was 30+ years ago), BBSes used some kind of extended ASCII and/or something called ANSI. It made for colorful text and some simple graphics, like edges, corners, etc.

A generic telnet terminal, like GNU telnet, cannot render these sites correctly. SyncTerm (ancient software) kind of runs on my Mac and does display the text and graphics properly.

My problem is that telnetlib's read_until() returns bytes which I am unable to decode into something readable.

A fragment of my login reading/writing:

# tt is an already-connected telnetlib.Telnet instance.
# The 3 is the read_until() timeout in seconds; it belongs inside read_until(), not print().
print(tt.read_until("Show today's log?".encode('ascii'), 3))
tt.write('\r\n'.encode('ascii'))
print(tt.read_until('[Pause]'.encode('ascii'), 3))
tt.write('\r\n'.encode('ascii'))
print(tt.read_until('Password?'.encode('ascii'), 3))
tt.write('NOTMYPASSWORD\r\n'.encode('ascii'))
zaa = tt.read_until('[Pause]'.encode('ascii'))
print(zaa)
tt.write('\r\n'.encode('ascii'))

read_until() is giving me bytes.

print(type(zaa))
zaa.decode('utf-8')

<class 'bytes'>
' ****\r\x1b[0m\n\r\n\x1b[1;33mYou have been on today.\r\x1b[0m\n\x1b[32mSearching for messages received since your last time on\x1b[1;33m:\r\x1b[0m\n\x1b[32mNo messages received.\r\x1b[0m\n\x1b[35m[Pause]'

But I don't know how to decode them into something nicer to read, or at least just strip out all the awful color-control codes or whatever they are.

Any advice on how to parse this nicely?

Thanks

==========

Here is what I've come up with for now, but I am still wondering if there are better solutions.

import re

def de_ansi(somebytes):
    #https://stackoverflow.com/questions/13506033/filtering-out-ansi-escape-sequences
    #https://stackoverflow.com/questions/14693701/how-can-i-remove-the-ansi-escape-sequences-from-a-string-in-python
    # Matches CSI sequences such as \x1b[0m or \x1b[1;33m
    ansi_escape = re.compile(r'\x1B\[[0-?]*[ -/]*[@-~]')
    text = ansi_escape.sub('', somebytes.decode('utf-8'))
    return text.splitlines()

This doesn't work because some of the bytes that come up are not valid UTF-8:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xde in position 265: invalid continuation byte

If I decode as Latin-1 no errors are thrown, but the output is awful:

['ú.   Üßßß Ûßßß  ßÛß        ÍÞÍÞðÞÍÞÍÞÍÞðÞ', 'ÞÍÞ  ..ÜßßÜÛÜ      ßÛÛÛÜÜ   ÛÛÛÍÞÍÞÍÞúÞÍÍÍÞÍÞ', 'ÞðÞ   Visit : telnet://mtlgeek.synchro.netÛÛÛÛ ÜÛÜ      ßÛÛÛÜÜ ÛßÍÍðÍËÞÍÞÍËÍÞÍÞ', 'ÞÍ¹   ÜßÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÛÜßßßÜÜÍÞÍÞÊÞÍÞÍ¹ðÞÍÞ', 'ÞÍÞÍÞÉÞÍðúðÍÞ»ÞðÞÍÞÉÞÍÞúÞðÍÍÍðÞÍÞðÞÍÞÍÞðÞÍÍÍÞÍÞÍÞúÞÍðÍÞÍÞÍÞÍÞðÞÍÞÍÞÍÞÍÎÍÞÍÞÍÞúATrade Wars 2002 Win32 module now loading.', '', 'Mearratwe', 'tcfho SMearra', 'twetcfho SMea', 'rratwetcfho ', 'SMartech SoftwareMartech SoftwareMartech Software', 'Martech SoftwareMartech SoftwareShocfe', 'ttwraarMeSh', 'ocfettwraarM', 'e             ', '  psrtensepsrt', 'ensepsrtensepresen', 'tspresentspresentspresentspresentspresentse', 'snetrspseenrtp', 's              ', '  úúúúúúú', '.ßÜ°°°°±°°±±°  ²±°±°°±°   °    °    °±±°±±°±', '
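One way to see why the Latin-1 output looks wrong is to decode a handful of the bytes under several candidate code pages and compare. The following is only a sketch: the sample bytes are read off the Latin-1 output above (Û, ß, Ü, Þ, Í), the list of candidate codecs is an assumption, and `decode_with` is a hypothetical helper name.

```python
def decode_with(data, codecs=("latin-1", "cp850", "cp437")):
    """Return {codec name: decoded text} for each candidate codec."""
    # errors="replace" keeps the comparison going if a codec has no
    # mapping for some byte.
    return {name: data.decode(name, errors="replace") for name in codecs}

# Byte values behind the characters Û ß Ü Þ Í in the Latin-1 output.
sample = bytes([0xDB, 0xDF, 0xDC, 0xDE, 0xCD])

for name, text in decode_with(sample).items():
    print(f"{name:8} {text!r}")
```

Under cp437 these bytes decode to block and box-drawing characters rather than accented letters, which is what a BBS login screen drawn with "edges and corners" would be expected to contain.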

There are two separate problems here. One is getting Python to process ANSI x3.64 display codes, and the other is to emulate whichever legacy 8-bit character set the remote system was hardcoded to use. — tripleee, Apr 14 '19 at 14:56
I have marked a couple of duplicates. The second blindly assumes your input is in code page 850 but it should at least give you an idea of how to solve the mapping problem (maybe try cp437 if 850 is wrong). — tripleee, Apr 14 '19 at 15:02
Maybe see also https://tripleee.github.io/8bit/ if you are not able to guess the correct encoding. A few bytes for which you know or can guess what character they represent is usually enough to identify one or two good character set candidates. — tripleee, Apr 14 '19 at 15:08
@tripleee cp850 was an improvement, cp437 converted the characters best. Thanks. — user3556757, Apr 15 '19 at 07:33
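Putting the comments together with the `de_ansi` attempt from the question, a minimal sketch might strip the escape sequences while the data is still bytes and then decode with cp437. The `encoding` parameter and its default are assumptions based on the last comment; a different BBS may use another code page, and the regex only covers CSI sequences (those starting with ESC `[`).

```python
import re

# Same pattern as in the question, compiled as a bytes pattern so it can
# run before any decoding happens.
ANSI_ESCAPE = re.compile(rb'\x1B\[[0-?]*[ -/]*[@-~]')

def de_ansi(somebytes, encoding="cp437"):
    """Strip CSI escape sequences, then decode and split into lines."""
    stripped = ANSI_ESCAPE.sub(b"", somebytes)
    # cp437 is assumed here; pass another codec name if the art looks wrong.
    return stripped.decode(encoding, errors="replace").splitlines()

raw = b'\x1b[1;33mYou have been on today.\r\x1b[0m\n\x1b[35m[Pause]'
print(de_ansi(raw))
```

Stripping before decoding means the escape-sequence regex never has to deal with whatever the 8-bit art bytes turn into, and removing the `\x1b[0m` that sits between `\r` and `\n` lets `splitlines()` treat the pair as a single line break.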
