
In order to better understand how websockets are used beyond the basic hello-world, I set myself the task of getting some data from a page using websockets and JSON (because the source code of gitxiv is readily available, I chose to look at http://gitxiv.com/day/2015/12/31).

Connecting to this websocket via Python seems to be straightforward:

from websocket import create_connection
import websocket

# Log every frame that is sent and received
websocket.enableTrace(True)
ws = create_connection("ws://gitxiv.com/sockjs/212/2aczpiim/websocket")
result = ws.recv()
print("Received '%s'" % result)
result = ws.recv()
print("Received '%s'" % result)

I'm not entirely clear about the variables in the ws:// url, like '212'. Running this code seems to reliably connect (although it is always possible that failing to have the right variables in there causes the server to refuse to cooperate later?)

Now if I watch the communication between Firefox and the gitxiv page, I see that following connection of the websocket the server sends

o
a["{\"server_id\":\"0\"}"]

The above script gets the same response, so it seems that the connection is successfully made.
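
For reference, the `o` and `a[...]` lines match SockJS framing (the URL path contains `sockjs`), where the first character marks the frame type: `o` for open, `h` for heartbeat, `a` for an array of string messages, and `c` for close. A minimal sketch of decoding such frames follows. `parse_sockjs_frame` is a hypothetical helper name, and decoding each inner string as JSON is an assumption based on the frames shown here, not something SockJS itself requires.

```python
import json


def parse_sockjs_frame(frame):
    """Split a SockJS text frame into (kind, payload).

    Hypothetical helper for illustration; assumes each message inside
    an 'a' frame is itself a JSON document, as in the frames above.
    """
    kind, body = frame[:1], frame[1:]
    if kind == "o":
        return ("open", None)
    if kind == "h":
        return ("heartbeat", None)
    if kind == "a":
        # The body is a JSON array of strings; each string is decoded again.
        return ("messages", [json.loads(m) for m in json.loads(body)])
    if kind == "c":
        # The body is a JSON array holding a close code and a reason.
        return ("close", json.loads(body))
    raise ValueError("unrecognised SockJS frame: %r" % frame)


print(parse_sockjs_frame("o"))
print(parse_sockjs_frame(r'a["{\"server_id\":\"0\"}"]'))
# ('open', None)
# ('messages', [{'server_id': '0'}])
```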

However, this is where I stumble. The next step in the communication is that my browser sends quite a lot of information to the web service, such as the line:

"["{\"msg\":\"connect\",\"version\":\"1\",\"support\":[\"1\",\"pre2\",\"pre1\"]}"]"

Sending these lines directly using ws.send() results in 'broken framing'. Sending just:

controlstr = '{"msg":"connect","version":"1","support":["1","pre2","pre1"]}'
ws.send(controlstr)

results in something being sent that looks like:

send: '\x81\xbd\xef\x17F8\x945+K\x885|\x1a\x8cx(V\x8at2\x1a\xc350]\x9dd/W\x815|\x1a\xde5j\x1a\x9cb6H\x80e2\x1a\xd5Ld\t\xcd;dH\x9drt\x1a\xc356J\x8a&de\x92'

I get a different error:

'a["{\\"msg\\":\\"error\\",\\"reason\\":\\"Bad request\\"}"]'

It seems, therefore, that there is something wrong in the way that I am sending this JSON message to the websocket. Does anybody know what format it expects, and how to achieve it using websocket/websocket-client? Any clarification/suggestions would be most welcome.

The JSON messages I am looking to send are those that Firefox's Websocket developer tool reports (screenshot: Firefox Web Developer Tool report).

Soz
  • the 212 is just part of the URL and thus kinda arbitrary. From where do you get that send string? Also I would rather generate the data structure and then transform it to JSON via json.dumps. I think these messages are from the meteor framework, so you could look there for a protocol. But you should verify they are really from meteor. – syntonym Jun 06 '16 at 15:12
  • I got the send string simply by using the Developer Tools websocket extension to find out what traffic is sent by Firefox, following the philosophy that saying the same things ought to lead to the same result. The 212 does vary (it gets a new url each time), but I'm not sure what effect that has. Sounds quite possible re meteor - thank you! – Soz Jun 06 '16 at 15:30
  • Ah sorry I meant the '\x81\xbd\xef\x17F8...' Also it seems like you really need to send a JSON list consisting of a single string (which is then again valid JSON). Be sure to properly escape the quotation marks. – syntonym Jun 06 '16 at 15:36
  • That strange \x81 etc is what Python 'says' it's printing out (the line websocket.enableTrace(True) causes it to give detail of each message sent and received). ws.send() apparently does this conversion internally. – Soz Jun 06 '16 at 15:48
  • It seems like the websocket client does some binary stuff with the payload and that's what is printed. Unless you want to dive into websocket specifics I don't think that output is helpful. Werner implemented the "array with a single string which is valid json" below which looks promising. – syntonym Jun 06 '16 at 15:56
  • Werner's approach does indeed work. You are quite right, that output is probably just misleading. – Soz Jun 06 '16 at 16:04
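
The `\x81\xbd...` bytes discussed in the comments above are consistent with an ordinary masked WebSocket text frame as described in RFC 6455: one byte for FIN and opcode, one byte for the mask bit and payload length, four bytes of masking key, then the payload XORed with that key. The sketch below undoes the masking for the traced bytes from the question. `unmask_client_frame` is a hypothetical helper that only handles short, unfragmented text frames, so it is an illustration and not a general frame parser.

```python
# Bytes copied from the "send:" trace line in the question.
TRACE_FRAME = (
    b'\x81\xbd\xef\x17F8\x945+K\x885|\x1a\x8cx(V\x8at2\x1a\xc350]\x9dd/W'
    b'\x815|\x1a\xde5j\x1a\x9cb6H\x80e2\x1a\xd5Ld\t\xcd;dH\x9drt\x1a\xc356J'
    b'\x8a&de\x92'
)


def unmask_client_frame(frame):
    """Recover the text payload of a short masked WebSocket text frame.

    Hypothetical helper: assumes a single text frame (opcode 0x1) with
    the mask bit set and a payload of at most 125 bytes.
    """
    opcode = frame[0] & 0x0F
    masked = bool(frame[1] & 0x80)
    length = frame[1] & 0x7F
    if opcode != 0x1 or not masked or length > 125:
        raise ValueError("only short masked text frames are handled here")
    mask = frame[2:6]
    payload = frame[6:6 + length]
    # Each payload byte is XORed with the masking key, cycling every 4 bytes.
    return bytes(b ^ mask[i % 4] for i, b in enumerate(payload)).decode("utf-8")


print(unmask_client_frame(TRACE_FRAME))
# {"msg":"connect","version":"1","support":["1","pre2","pre1"]}
```

So the trace shows that the bare JSON object was transmitted intact; the unreadable bytes are just the wire encoding, not a corrupted message.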

1 Answer


If you look closely at what's sent through the browser, notice that it's:

["{\"msg\":\"connect\"}"]

This looks an awful lot like an array of JSON strings. Indeed, if you try to replicate it:

ws.send(json.dumps([json.dumps({'msg': 'connect', 'version': '1', 'support': ['1', 'pre2', 'pre1']})]))

You'll see that you get connected. Here's my entire code:

import json
import pprint
import websocket
from websocket import create_connection

websocket.enableTrace(True)
ws = create_connection('ws://gitxiv.com/sockjs/212/2aczpiim/websocket')

result = ws.recv()
print('Result: {}'.format(result))

result = ws.recv()
print('Result: {}'.format(result))

ws.send(json.dumps([json.dumps({'msg': 'connect', 'version': '1', 'support': ['1', 'pre2', 'pre1']})]))
result = ws.recv()
print('Result: {}'.format(result))
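
The double encoding used above can be checked offline, without a connection. The sketch below wraps it in a helper and prints the wire string for comparison with what the browser sends. `encode_sockjs_message` is a hypothetical name, and the compact `separators` setting is only there to match the browser's spacing; it is an assumption that the server does not care about whitespace either way.

```python
import json


def encode_sockjs_message(message):
    """Encode one message as a JSON array holding a single JSON string.

    Hypothetical helper illustrating the nesting used in the answer.
    """
    inner = json.dumps(message, separators=(",", ":"))  # object -> JSON text
    return json.dumps([inner])  # JSON text -> string inside a JSON array


wire = encode_sockjs_message({"msg": "connect"})
print(wire)
# ["{\"msg\":\"connect\"}"]

# Decoding reverses both layers and gives back the original object.
print(json.loads(json.loads(wire)[0]))
# {'msg': 'connect'}
```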
Wayne Werner
  • I had tried one json.dumps without effect. It did not occur to me to nest them, as in json.dumps([json.dumps({stuff})]). That's a wonderful and mysterious solution! (Do you happen to know if this is a common recipe?) – Soz Jun 06 '16 at 15:58
  • @Soz Personally I have not seen this before. Intuitively I would have assumed that actual JSON in an array makes more sense (e.g. [{msg1}, {msg2}]), but maybe the developer had their reasons. – syntonym Jun 06 '16 at 20:29
  • Honestly, it seems a wee bit insane to me. The *only* reason that I can come up with is that they have a web client that could potentially send multiple messages at a time, and they specify that each message must be JSON encoded. It's really pretty weird though, because I would have just specified that the messages are a JSON encoded array of messages, e.g. `json.dumps([{'msg': 'connect'}, {'msg': 'frobnosticate'}])`, rather than a JSON encoded list of JSON encoded objects. I'm not sure I can come up with a *good* reason (though I can invent a few other reasons that "make sense") – Wayne Werner Jun 07 '16 at 16:13
  • Seems bloody insane to me – Michael Paccione Jul 08 '22 at 03:58