Here's the scenario:
- I'm parsing a log file and turning each individual line (string) from the log file into a hierarchical structure. I want to be able to categorize each message as belonging to a particular event on a particular day.
- I'm trying to send this data structure to the front-end of my Django app and display this hierarchical structure, and prevent the front-end from having to handle all these computations.
- On the front end I would like to be able to search by key words and display results that do this. Either I can search the data structure the back-end sends over, or I can search the rendered DOM.
I have the following data:
Day 1
Event 1
message
message
message
Event 2
message
message
message
Event 3
message
message
message
Day 2
Event 1
message
message
message
Event 2
message
message
message
...
The data
One event in the log file would look something like this:
2019-08-05 09:18:45 -- INFO -- all buttons -- THOR: All button were pressed.
2019-08-05 09:18:48 -- WARNING -- THOR1: The system failed to connect. Is the asset online? If so, did the password change?
2019-08-05 09:18:51 -- WARNING -- THOR2: The system failed to connect. Is the asset online? If so, did the password change?
2019-08-05 09:18:51 -- WARNING -- THOR3: Looks like it's online, but the system was unable to log in.
2019-08-05 09:18:51 -- WARNING -- THOR4: Looks like it's online, but the system was unable to log in.
2019-08-05 09:18:51 -- WARNING -- THOR5: Looks like it's online, but the system was unable to log in.
2019-08-05 09:18:52 -- WARNING -- THOR6: Looks like it's online, but the system was unable to log in.
My current approach
So far I have the data stored (extremely crudely) as
Day 1 Event 1 Event 2 Day 2 Event 1
| | | | |
| | | | |
| _______| | | ________|
| | | | |
[ [ [message, message, message], [message, message]], [ [message, message], ... ], ... ]
Maybe a better way?
{
'08/05/2019': {
'09:18': [message, message, message],
'10:30': [message, message, message, message],
'14:40': [message]
}
'08/03/2019': {
'06:40': [message, message],
'17:25': [message, message]
}
}
Conclusion
I need to preserve order so I can show these in chronological order on the front end, but would a dict be more efficient for something like this? This might be viable since Python dicts now maintain insertion order.
Which data structure would be more efficient for storing and searching? I should note that I'll probably be dealing with around 60,000 messages.