
This is more of a design problem: I don't know how to achieve this in Akka.

User Story
- I need to parse big files (more than 10 million lines) that look like this:

2013-05-09 11:09:01 Local4.Debug    172.2.10.111    %MMT-7-715036: Group = 199.19.248.164, IP = 199.19.248.164, Sending keep-alive of type DPD R-U-THERE (seq number 0x7db7a2f3)
2013-05-09 11:09:01 Local4.Debug    172.2.10.111    %MMT-7-715046: Group = 199.19.248.164, IP = 199.19.248.164, constructing blank hash payload
2013-05-09 11:09:01 Local4.Debug    172.2.10.111    %MMT-7-715046: Group = 199.19.248.164, IP = 199.19.248.164, constructing qm hash payload
2013-05-09 11:09:01 Local4.Debug    172.2.10.111    %ASA-7-713236: IP = 199.19.248.164, IKE_DECODE SENDING Message (msgid=61216d3e) with payloads : HDR + HASH (8) + NOTIFY (11) + NONE (0) total length : 84
2013-05-09 11:09:01 Local4.Debug    172.22.10.111   %MMT-7-713236: IP = 199.19.248.164, IKE_DECODE RECEIVED Message (msgid=867466fe) with payloads : HDR + HASH (8) + NOTIFY (11) + NONE (0) total length : 84
- For each line, I need to generate an Event that will be sent to a server.
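Turning one line into one Event is a pure function, which is what makes it easy to spread over many router workers. Below is a minimal plain-Java sketch (no Akka), assuming the whitespace-separated layout of the sample lines: date and time, facility, host, then `%TAG-SEVERITY-ID:` and free text. `LogEvent`, `LogLineParser` and all field names are hypothetical, since the question does not say what an Event contains.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Event type: fields are guesses based on the sample log lines.
final class LogEvent {
    final String timestamp; // "2013-05-09 11:09:01"
    final String facility;  // "Local4.Debug"
    final String host;      // "172.2.10.111"
    final String tag;       // "MMT" or "ASA"
    final int severity;     // 7
    final String messageId; // "715036"
    final String message;   // everything after "%TAG-SEV-ID:"

    LogEvent(String timestamp, String facility, String host, String tag,
             int severity, String messageId, String message) {
        this.timestamp = timestamp;
        this.facility = facility;
        this.host = host;
        this.tag = tag;
        this.severity = severity;
        this.messageId = messageId;
        this.message = message;
    }
}

final class LogLineParser {
    // "date time", facility, host, then "%TAG-SEV-ID: message", separated by whitespace.
    // A compiled Pattern is immutable, so many workers can share it.
    private static final Pattern LINE = Pattern.compile(
            "^(\\S+ \\S+)\\s+(\\S+)\\s+(\\S+)\\s+%([A-Za-z0-9]+)-(\\d+)-(\\d+):\\s*(.*)$");

    // Returns empty for lines that do not match, so the caller can count or log them.
    static Optional<LogEvent> parse(String line) {
        Matcher m = LINE.matcher(line); // one Matcher per call: Matchers are not thread-safe
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new LogEvent(m.group(1), m.group(2), m.group(3), m.group(4),
                Integer.parseInt(m.group(5)), m.group(6), m.group(7)));
    }
}
```

Because `parse` has no shared mutable state, each routee can call it directly on the line it receives.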

Question
- How can I read this log file efficiently in the Akka model? I have read that reading a file sequentially is better because it means less movement of the disk head (fewer seeks).
- In that case, there could be one FileReaderActor per file that reads each line and sends it for processing to, say, an EventProcessorRouter. The router could have many actors, each working on a line from the file and creating an Event. There would be one Event per line.
- I was also thinking of sending Events in batches to reduce network traffic. In that case, where should I accumulate these Events? And how would I know when all Events have been generated from inputFile?
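One way to answer both batching questions is a single aggregator that receives every Event plus one final message from the reader stating how many lines it sent. Because router workers can finish out of order, the reader's "done" message may arrive before the last Events, so the aggregator should compare counts rather than stop as soon as that message arrives. Below is a minimal single-threaded sketch in plain Java standing in for such an actor; `EventBatcher`, `onEvent` and `onReaderDone` are hypothetical names, and it assumes workers emit exactly one message per line (a placeholder Event for unparseable lines) so the counts can match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical aggregator. Not thread-safe on purpose: it stands in for a single
// actor, which processes one message at a time.
final class EventBatcher<E> {
    private final int batchSize;
    private final Consumer<List<E>> sendBatch; // e.g. one network request per batch
    private List<E> buffer = new ArrayList<>();
    private long received = 0;
    private long expected = -1; // unknown until the reader reports end of file
    private boolean finished = false;

    EventBatcher(int batchSize, Consumer<List<E>> sendBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sendBatch = sendBatch;
    }

    // One call per Event, in whatever order the workers finish.
    void onEvent(E event) {
        buffer.add(event);
        received++;
        if (buffer.size() >= batchSize) {
            flush();
        }
        checkDone();
    }

    // Sent by the reader at end of file with the number of lines it dispatched.
    // Workers may still be busy, so this can arrive before the last Events.
    void onReaderDone(long totalLines) {
        expected = totalLines;
        checkDone();
    }

    boolean isFinished() {
        return finished;
    }

    private void checkDone() {
        if (!finished && expected >= 0 && received == expected) {
            flush(); // send the last, possibly partial, batch
            finished = true;
        }
    }

    private void flush() {
        if (!buffer.isEmpty()) {
            sendBatch.accept(buffer);
            buffer = new ArrayList<>(); // hand the old list off and start a fresh one
        }
    }
}
```

In an actor, `onEvent` and `onReaderDone` would become message handlers, so no locking is needed. If batches should not wait indefinitely for `batchSize` Events, a periodic timer message that calls `flush` is a common addition.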

Thanks

daydreamer

1 Answer


I think I know what you're asking. You're basically saying that if you read and process a file in the manner you describe, you risk building up a massive number of queued messages if processing takes significantly longer than reading. Also, if you are sending messages over the network, you ideally want to minimize the number of messages sent. If your lines don't take long to process, I wouldn't send them over the network to be processed at all.

Have you considered using futures instead? I don't know whether your case is as simple as the one in "Parallel File Processing: What are recommended ways?"; if it is, you should use streams. But I think the thing with actors is that, although they are good for throttling, their main purpose is to encapsulate state, and there isn't much state involved in processing a file. Maybe you would be better off with futures; I show an example of that in "Executing Dependent tasks in parallel in Java".

That said, you could use actors as you describe and have the processing actors communicate with the reader actor, telling it to stop reading for, say, a second as soon as the number of messages waiting to be processed exceeds 1,000,000 (or whatever threshold you decide on).
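The throttling idea above (pausing the reader once too many lines are waiting) can be approximated outside Akka with a bounded queue: instead of workers telling the reader to pause, the reader's `put` simply blocks while the queue is full. This is a minimal sketch using only `java.util.concurrent`, not Akka's own backpressure mechanism; `BoundedPipeline`, `maxPending` and the end-of-input sentinel are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Function;

final class BoundedPipeline {
    // Sentinel marking end of input; compared by identity, so no real line can equal it.
    private static final String END = new String("<end-of-input>");

    // Reads lines on the calling thread and turns them into events on `workers` threads.
    // At most `maxPending` lines wait in the queue; when it is full, put() blocks the
    // reader until workers catch up, so memory use stays bounded.
    // Returns the number of lines successfully turned into events.
    static <E> long run(BufferedReader in, int workers, int maxPending,
                        Function<String, E> toEvent, Consumer<E> sink)
            throws IOException, InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(maxPending);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicLong processed = new AtomicLong();
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        String line = queue.take();
                        if (line == END) {
                            return; // this worker is done
                        }
                        try {
                            sink.accept(toEvent.apply(line)); // sink must be thread-safe
                            processed.incrementAndGet();
                        } catch (RuntimeException e) {
                            // skip a bad line; a dead worker could leave the reader blocked forever
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            String line;
            while ((line = in.readLine()) != null) {
                queue.put(line); // blocks while the queue is full: this is the backpressure
            }
        } finally {
            for (int i = 0; i < workers; i++) {
                queue.put(END); // one sentinel per worker
            }
            pool.shutdown();
            pool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
        }
        return processed.get();
    }
}
```

With `maxPending` set to, for example, 1,000,000, at most that many lines are held in memory at once, and the reader resumes automatically as soon as workers free up space, with no fixed one-second pause.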

Derrops