
Here's the scenario:

  1. I'm parsing a log file and turning each line (a string) into a hierarchical structure, so that each message is categorized as belonging to a particular event on a particular day.
  2. I want to send this data structure to the front end of my Django app and display the hierarchy there, so that the front end does not have to do any of this computation.
  3. On the front end I would like to search by keyword and display the matching results. I can either search the data structure the back end sends over, or search the rendered DOM.

I have the following data:

Day 1
   Event 1
      message
      message
      message
   Event 2
      message
      message
      message
   Event 3
      message
      message
      message
Day 2
   Event 1
      message
      message
      message
   Event 2
      message
      message
      message

...

The data

One event in the log file would look something like this:

2019-08-05 09:18:45 -- INFO -- all buttons -- THOR: All button were pressed.
2019-08-05 09:18:48 -- WARNING -- THOR1: The system failed to connect. Is the asset online? If so, did the password change?
2019-08-05 09:18:51 -- WARNING -- THOR2: The system failed to connect. Is the asset online? If so, did the password change?
2019-08-05 09:18:51 -- WARNING -- THOR3: Looks like it's online, but the system was unable to log in.
2019-08-05 09:18:51 -- WARNING -- THOR4: Looks like it's online, but the system was unable to log in.
2019-08-05 09:18:51 -- WARNING -- THOR5: Looks like it's online, but the system was unable to log in.
2019-08-05 09:18:52 -- WARNING -- THOR6: Looks like it's online, but the system was unable to log in.
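
For illustration, here is a minimal sketch of how lines in this format could be grouped by day and event. It assumes that the fields are separated by " -- ", that the first field is always a timestamp, and that an "event" can be identified by the hour and minute of the timestamp (the post does not say how events are actually delimited). The names parse_line and group_lines are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Three of the sample lines from the log excerpt above.
SAMPLE_LINES = [
    "2019-08-05 09:18:45 -- INFO -- all buttons -- THOR: All button were pressed.",
    "2019-08-05 09:18:48 -- WARNING -- THOR1: The system failed to connect. Is the asset online? If so, did the password change?",
    "2019-08-05 09:18:51 -- WARNING -- THOR3: Looks like it's online, but the system was unable to log in.",
]


def parse_line(line):
    """Split one log line into (timestamp, level, message text)."""
    # maxsplit=2 keeps any further " -- " separators inside the message text.
    timestamp_text, level, message = line.strip().split(" -- ", 2)
    timestamp = datetime.strptime(timestamp_text, "%Y-%m-%d %H:%M:%S")
    return timestamp, level, message


def group_lines(lines):
    """Return {day: {event: [message, ...]}} in the order lines are read."""
    grouped = defaultdict(lambda: defaultdict(list))
    for line in lines:
        timestamp, level, message = parse_line(line)
        day = timestamp.strftime("%m/%d/%Y")
        # Assumption: messages logged within the same minute form one event.
        event = timestamp.strftime("%H:%M")
        grouped[day][event].append({"level": level, "text": message})
    # Convert to plain dicts so that later lookups cannot create empty keys.
    return {day: dict(events) for day, events in grouped.items()}


log = group_lines(SAMPLE_LINES)
```

Because the log is written chronologically, inserting in reading order leaves the days and events in chronological order as well.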

My current approach

So far I have the data stored (extremely crudely) as

Day 1    Event 1               Event 2              Day 2     Event 1
  |         |                    |                    |          |
  |         |                    |                    |          |
  |  _______|                    |                    |  ________|
  | |                            |                    | |
[ [ [message, message, message], [message, message]], [ [message, message], ... ], ... ]

Maybe a better way?

{

   '08/05/2019': {
      '09:18': [message, message, message],
      '10:30': [message, message, message, message],
      '14:40': [message]
   },

   '08/03/2019': {
      '06:40': [message, message],
      '17:25': [message, message]
   }

}

Conclusion

I need to preserve order so I can show these in chronological order on the front end, but would a dict be more efficient for something like this? It might be viable, since Python dicts are guaranteed to maintain insertion order as of Python 3.7.

Which data structure would be more efficient for storing and searching? I should note that I'll probably be dealing with around 60,000 messages.
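
To make the trade-off concrete, here is a rough sketch of a keyword search over the nested dict. The message strings are shortened, hypothetical stand-ins for real log lines, and the function name search is an assumption. Note that a dict only gives constant-time lookup by key (a date or a time); a keyword search still has to scan every message, whichever container holds them.

```python
from datetime import datetime

# Hypothetical data in the nested-dict layout proposed above.
log = {
    "08/05/2019": {
        "09:18": ["THOR1: The system failed to connect.", "THOR3: unable to log in."],
        "10:30": ["THOR: All buttons were pressed."],
    },
    "08/03/2019": {
        "06:40": ["THOR2: The system failed to connect."],
    },
}


def search(log, keyword):
    """Return (day, event, message) tuples containing keyword, oldest first."""
    needle = keyword.lower()
    hits = []
    # Parse the date keys so that sorting is chronological, not alphabetical.
    for day in sorted(log, key=lambda d: datetime.strptime(d, "%m/%d/%Y")):
        # Zero-padded "HH:MM" strings already sort chronologically.
        for event in sorted(log[day]):
            for message in log[day][event]:
                if needle in message.lower():
                    hits.append((day, event, message))
    return hits


failed = search(log, "failed")
```

Looking up one day directly, as in log["08/05/2019"], stays a single dict access, so filtering by date is cheap even when keyword search is not.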

Hunter
  • If you're searching often, then by far a `dict`. Is there a reason you think a `list` will be better? If you need something more than just insertion order you can also use an `OrderedDict`. – Error - Syntactical Remorse Aug 06 '19 at 13:17
  • @Error-SyntacticalRemorse I only think the `list` will be better because it maintains order on every level, but yes `dict` is way better for searching. I suppose I need to personally decide which one I want to compromise on. – Hunter Aug 06 '19 at 13:20
  • `O(n)`, or `O(log n)` if sorted, is a nasty bullet to bite compared to `O(1)`. I don't know what you mean by "maintains order on every level" though (as dicts keep insertion order, as you noted). If you want the ability to sort, then use an `OrderedDict`. – Error - Syntactical Remorse Aug 06 '19 at 13:23
  • @Error-SyntacticalRemorse You're right. At first I was having trouble deciding on what I should do, but I think after typing this post I understood the problem more and it now seems obvious which one I should choose. – Hunter Aug 06 '19 at 13:25
  • You have to define how and what (and where !) you are "searching". What is your front-end ? angular or react app ? Plain django templates ? – bruno desthuilliers Aug 06 '19 at 13:41
  • I would be inserting the data as nested lists into a plain Django template, and then I would want to be able to filter the results by date and keywords using JavaScript DOM manipulation, or maybe an AJAX call. – Hunter Aug 06 '19 at 13:43
  • 60K entries (whether in a dict or list or whatever) is a lot. You may want to consider using a proper model to store your data and search it (or using Redis if you don't really care about persisting the data) and adding pagination so you don't have to render that much data. – bruno desthuilliers Aug 06 '19 at 13:46
  • Yeah I'll definitely use pagination, and I hadn't thought about storing it in Redis but that might be super useful since I would only have to parse the data once and then just render updates – Hunter Aug 06 '19 at 13:53
  • In your data example, you did not include the event. An event will need to be added to the tree structure for grouping. Are you calling the time the event? – Golden Lion Jan 30 '21 at 13:09
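
As a rough illustration of the pagination idea raised in the comments, the sketch below slices a flat list of records into fixed-size pages in plain Python. The names paginate and page_count and the default page size of 50 are assumptions chosen for the example; in a Django project the framework's own pagination support would be the more usual choice.

```python
def paginate(records, page, per_page=50):
    """Return the slice of records shown on the given 1-based page."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    return records[start:start + per_page]


def page_count(records, per_page=50):
    """Return how many pages are needed (at least one, even when empty)."""
    # Ceiling division without importing math.
    return max(1, -(-len(records) // per_page))


# Stand-in for parsed messages: 120 integers instead of real log records.
records = list(range(120))
last_page = paginate(records, page_count(records))
```

With roughly 60,000 messages and 50 per page, this would give 1,200 pages, so only a small slice ever needs to be rendered at once.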

1 Answer

import pandas as pd

mydict = {
    '08/05/2019': {
        '09:18': ["message1", "message2", "message3"],
        '10:30': ["message4", "message5", "message6", "message7"],
        '14:40': ["message8"]
    },
    '08/03/2019': {
        '06:40': ["message9", "message10"],
        '17:25': ["message11", "message12"]
    }
}

# The DataFrame has one column per date; transpose so that each row is one day.
df = pd.DataFrame(mydict).T
print(df.head())
print(df.columns)

columns = df.columns

for key, item in df.iterrows():
    # One record per (date, event time) pair; combinations that do not exist hold NaN.
    events = [{'date': key, 'event': column, 'messages': item[column]} for column in columns]
    print(events)

Output of the loop:

[{'date': '08/05/2019', 'event': '09:18', 'messages': ['message1', 'message2', 'message3']}, {'date': '08/05/2019', 'event': '10:30', 'messages': ['message4', 'message5', 'message6', 'message7']}, {'date': '08/05/2019', 'event': '14:40', 'messages': ['message8']}, {'date': '08/05/2019', 'event': '06:40', 'messages': nan}, {'date': '08/05/2019', 'event': '17:25', 'messages': nan}]
[{'date': '08/03/2019', 'event': '09:18', 'messages': nan}, {'date': '08/03/2019', 'event': '10:30', 'messages': nan}, {'date': '08/03/2019', 'event': '14:40', 'messages': nan}, {'date': '08/03/2019', 'event': '06:40', 'messages': ['message9', 'message10']}, {'date': '08/03/2019', 'event': '17:25', 'messages': ['message11', 'message12']}]
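
The nan entries appear because the DataFrame aligns every event time against every date. If those placeholders are not wanted, a plain-Python sketch along the following lines (an alternative, not part of the code above) emits only the date and event pairs that actually exist. The name records is an assumption.

```python
mydict = {
    '08/05/2019': {
        '09:18': ["message1", "message2", "message3"],
        '10:30': ["message4", "message5", "message6", "message7"],
        '14:40': ["message8"],
    },
    '08/03/2019': {
        '06:40': ["message9", "message10"],
        '17:25': ["message11", "message12"],
    },
}

# Walk the nested dict directly; insertion order of days and events is kept.
records = [
    {'date': date, 'event': event, 'messages': messages}
    for date, events in mydict.items()
    for event, messages in events.items()
]

for record in records:
    print(record)
```

This produces five records instead of ten, one for each event that has messages.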
Golden Lion