The data stored on S3 corresponds to the structure known as StreamerMessage, which reflects the data from the nearcore node but is not equivalent to it.
I started documenting the StreamerMessage some time ago, but I haven't had the opportunity to finish it yet. You might find it interesting: StreamerMessage Documentation
To put it simply, the purpose of the StreamerMessage is to represent the data from the node in a form tailored for indexer developers. To make developers' work easier, we have introduced several additional structures, such as:
- IndexerExecutionOutcomeWithReceipt
- IndexerExecutionOutcomeWithOptionalReceipt
- IndexerChunk
- and so on.
These structures do not have an exact counterpart in the nearcore primitives.
The answer to your question depends on how you define "raw data."
Regarding blocks, transactions, and receipts, these entities can still be extracted from the StreamerMessage. However, if you require a comprehensive list of the "transformations," you would need to examine the source code of the Indexer Framework, which can be found here.
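As a rough illustration of that extraction, here is a minimal Python sketch. It assumes the message has already been parsed into a dict and that the JSON uses the field names `block`, `shards`, `chunk`, `transactions`, `transaction`, and `receipts`. Those names are assumptions here and should be verified against the Indexer Framework source. `extract_entities` is a hypothetical helper, not part of any NEAR library.

```python
from typing import Any


def extract_entities(streamer_message: dict[str, Any]) -> dict[str, Any]:
    """Collect the block, transactions, and receipts from a StreamerMessage-like dict.

    Field names are assumed, not taken from an official schema.
    """
    transactions: list[Any] = []
    receipts: list[Any] = []

    for shard in streamer_message.get("shards", []):
        chunk = shard.get("chunk")
        if chunk is None:
            # Assumption: a shard may carry no chunk for a given block.
            continue
        # Assumption: each entry wraps the transaction together with its outcome.
        transactions.extend(tx["transaction"] for tx in chunk.get("transactions", []))
        receipts.extend(chunk.get("receipts", []))

    return {
        "block": streamer_message["block"],
        "transactions": transactions,
        "receipts": receipts,
    }
```

The helper only reads the chunk-level data. Anything stored in the indexer-specific structures listed above would need its own handling.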
The data is stored in S3 by the Lake Indexer, with the only transformation being the splitting of the entire StreamerMessage into files named block.json and shard_N.json.
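Since the split is the only transformation, reversing it should give back the original message. Below is a minimal Python sketch of that reassembly, assuming the files for a single block have already been downloaded into one local directory and that the StreamerMessage JSON has the top-level fields `block` and `shards`. `load_streamer_message` and the directory layout are hypothetical, for illustration only.

```python
import json
import re
from pathlib import Path
from typing import Any

# Matches file names of the form shard_N.json and captures N.
SHARD_FILE = re.compile(r"^shard_(\d+)\.json$")


def load_streamer_message(block_dir: Path) -> dict[str, Any]:
    """Rebuild a StreamerMessage-like dict from block.json and shard_N.json files."""
    block = json.loads((block_dir / "block.json").read_text())

    numbered_shards = []
    for path in block_dir.iterdir():
        match = SHARD_FILE.match(path.name)
        if match:
            numbered_shards.append((int(match.group(1)), json.loads(path.read_text())))

    # Sort numerically so that shard_10.json comes after shard_2.json.
    numbered_shards.sort(key=lambda pair: pair[0])

    # Assumption: the message consists of the block plus the ordered list of shards.
    return {"block": block, "shards": [shard for _, shard in numbered_shards]}
```

Calling `load_streamer_message(Path("some_block_dir"))` on such a directory returns one dict per block, which can then be processed like a message received from the indexer.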
P.S. It seems you're diving into an interesting topic that might be useful to other developers. Feel free to open new questions so that there is more information about it on SO.