I want to use boto3 to run a command on an ECS Fargate container which generates a lot of binary output, and stream that output into a file on my local machine.
My attempt is based on the recommendation here, and looks like this:
import json
import uuid
import boto3
import construct as c
import websocket
# Define Structs
AgentMessageHeader = c.Struct(
"HeaderLength" / c.Int32ub,
"MessageType" / c.PaddedString(32, "ascii"),
)
AgentMessagePayload = c.Struct(
"PayloadLength" / c.Int32ub,
# This only works with my test command. It won't work with my real command that returns binary data
"Payload" / c.PaddedString(c.this.PayloadLength, "ascii"),
)
# Define initial payload
init_payload = {
"MessageSchemaVersion": "1.0",
"RequestId": str(uuid.uuid4()),
"TokenValue": session["tokenValue"],
}
# Define the container you want to talk to
cluster = "..."
task = "..."
container = "..."
# Send command with large response (large enough to span multiple messages)
result = client.execute_command(
cluster=cluster,
task=task,
container=container,
# This is a sample command that returns text. My real command returns hundreds of megabytes of binary data
command="python -c 'for i in range(1000):\n print(i)'",
interactive=True,
)
# Get session info
session = result["session"]
# Create websocket connection
connection = websocket.create_connection(session["streamUrl"])
try:
# Send initial response
connection.send(json.dumps(init_payload))
while True:
# Receive data
response = connection.recv()
# Decode data
message = AgentMessageHeader.parse(response)
payload_message = AgentMessagePayload.parse(response[message.HeaderLength:])
if 'channel_closed' in message.MessageType:
raise Exception('Channel closed before command output was received')
# Print data
print("Header:", message.MessageType)
print("Payload Length:", payload_message.PayloadLength)
print("Payload Message:", payload_message.Payload)
finally:
connection.close()
This almost works, but has a problem - I can't tell when I should stop reading.
If you read the final message from aws, and call connection.recv()
again, aws seems to loop around and send you the initial data - the same data you would have received the first time you called connection.recv()
.
One semi-hackey way to try to deal with this is by adding an end marker to the command. Sort of like:
result = client.execute_command(
...
command="""bash -c "python -c 'for i in range(1000):\n print(i)'; echo -n "=== END MARKER ===""""",
)
This idea works, but to be used properly, becomes really difficult to use. There's always a chance that the end marker text gets split up between two messages, and dealing with that becomes a pain, since you can no longer write a payload immediately to disk until you verify that the end of the payload, along with the beginning of the next payload, isn't your end marker.
Another hackey way is to checksum the first payload, and every subsequent payload, comparing the checksum of each payload to the checksum of the first payload. That will tell you if you've looped around. Unfortunately, this also has a chance of having a collision, if the binary data in 2 messages just happens to repeat, although the chances of that in practice would probably be slim.
Is there a simpler way to determine when to stop reading? Or better yet, a simpler way to have boto3 give me a stream of binary data from the command I ran?