Atom update hangs inside of Clojure watch call

Question

I've got a situation where I watch a specific directory for filesystem changes. If a certain file in that directory is changed, I re-read it, attach some existing cached information, and store it in an atom.

The relevant code looks like

(def posts (atom []))

(defn load-posts! []
  (swap!
   posts
   (fn [old]
     (vec
      (map #(let [raw (json/parse-string % (fn [k] (keyword (.toLowerCase k))))]
              (<snip some processing of raw, including getting some pieces from old>))
           (line-seq (io/reader "watched.json")))))))


;; elsewhere, inside of -main
(watch/start-watch
    [{:path "resources/"
      :event-types [:modify]
      :callback (fn [event filename]
                  (when (and (= :modify event) (= "watched.json" filename))
                    (println "Reloading watched.json ...")
                    (posts/load-posts!)))}
     ...])

This ends up working fine locally, but when I deploy it to my server, the swap! call hangs about half-way through.

I've tried debugging it via println, which told me:

The filesystem trigger is being fired.
swap! is not running the function more than once.
The watched file is being opened and parsed.
Some entries from the file are being processed, but that processing stops at entry 111 (which doesn't seem to be significantly different from any preceding entries).
The update does not complete, and the old value of that atom is therefore preserved.
No filesystem events are fired after this one hangs.

I suspect that this is either a memory issue somewhere, or possibly a bug in Clojure-Watch (or the underlying FS-watching library).

Any ideas how I might go about fixing it or diagnosing it further?

Seems likely that an error is occurring during processing that is getting swallowed. Turn the in-line function being mapped onto the line-seq into separate individual functions and test them on the relevant entry. — Jonah Benton, Sep 08 '16 at 15:35
... or put a try/catch around the call to `load-posts!` and see what's getting thrown. — Alex, Sep 08 '16 at 18:45

Inaimathi · Accepted Answer · 2016-09-11T14:26:18.127

The hang is caused by an error being thrown inside the function passed as the :callback to watch/start-watch.

The root cause in this case is that the modified file is being copied to the server by scp, which is not atomic. The first :modify event therefore fires before the copy is complete, so the callback reads a partially written file and the JSON parser throws.

This is exacerbated by the fact that watch/start-watch fails silently if its :callback throws any kind of error.
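A comment further down notes that start-watch returns a future that was never dereferenced. Assuming the callback runs on that future's thread (an assumption, not verified against the clojure-watch source), the silent failure can be reproduced with plain clojure.core. This is only a sketch of the mechanism, and the names state, worker and outcome are made up for illustration.

```clojure
;; Illustrative only: plain clojure.core, no clojure-watch involved.
(def state (atom [:old]))

;; The update function throws part-way through, much as a JSON parser
;; would on a half-copied file.
(def worker
  (future
    (swap! state (fn [_old] (throw (ex-info "parse error" {}))))))

;; Nothing is printed and nothing propagates until the future is dereferenced.
(def outcome
  (try
    @worker
    :completed
    (catch java.util.concurrent.ExecutionException e
      ;; The original exception is carried as the cause.
      (.getMessage (.getCause e)))))

(println outcome) ; message of the exception that was swallowed
(println @state)  ; the atom still holds its old value
```

Because swap! only installs the new value after the update function returns, an exception leaves the atom untouched, which matches the "old value is preserved" symptom in the question.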

The solutions here are:

Use rsync to copy files. It copies atomically, by writing to a temp file and then moving it into place, so it does not generate any :modify events on the target file, only on the related temp files. The target file itself only signals a :create event.
Wrap the :callback in a try/catch, and have the catch clause return the old value of the atom. load-posts! will then run multiple times during a copy, but the last run happens on file copy completion, and that one should finally do the right thing.

(I've done both, but either one alone would realistically have solved the problem.)
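Both fixes can be combined in one callback. The following is a minimal sketch, not the code actually deployed: make-callback and reload! are hypothetical names, and the commented-out wiring simply follows the shape of the start-watch call in the question.

```clojure
;; Sketch only. `make-callback` and `reload!` are hypothetical names.
(defn make-callback
  "Returns a callback of [event filename] that calls `reload!` for
  :create or :modify events on watched.json, logging instead of throwing."
  [reload!]
  (fn [event filename]
    (when (and (contains? #{:create :modify} event)
               (= "watched.json" filename))
      (try
        (reload!)
        (catch Exception e
          ;; Log and carry on, so a half-copied file cannot kill the watcher.
          (println "Reload failed:" (.getMessage e))
          nil)))))

;; Wiring, following the shape of the start-watch call in the question:
;; (watch/start-watch
;;   [{:path "resources/"
;;     :event-types [:create :modify]
;;     :callback (make-callback posts/load-posts!)}])
```

Listening for both event types means the same callback works whether the file arrives via rsync (:create) or is edited in place (:modify).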

A third option would be using an FS-watching library that reports errors, such as Hawk or dirwatch (or possibly hara.io.watch? I haven't used any of these, so I can't comment).

Diagnosing this involved wrapping the body of the update function passed to swap! (which runs inside the :callback) with

(try 
  <body> 
  (catch Exception e 
    (println "ERROR IN SWAP!" e) 
    old))

to see what was actually being thrown. Once that printed a JSON parsing error, it was pretty easy to form a theory of what was going wrong.
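Put together, the pattern looks roughly like the sketch below. To keep it runnable without Cheshire or a real file, EDN parsing of a string stands in for the JSON parsing of watched.json; load-posts-from! is a hypothetical name, not the function from the question.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.string :as str])

(def posts (atom [{:title "cached"}]))

;; Hypothetical stand-in for load-posts!: takes the file contents as a string
;; and parses EDN instead of JSON so the example has no external dependencies.
(defn load-posts-from!
  "Replaces the posts atom with one parsed entry per line of `text`.
  On any parse failure, logs it and keeps the old value."
  [text]
  (swap! posts
         (fn [old]
           (try
             ;; mapv is eager, so parse errors surface inside the try
             ;; rather than later, when a lazy seq would be realized.
             (mapv edn/read-string (str/split-lines text))
             (catch Exception e
               (println "ERROR IN SWAP!" (.getMessage e))
               old)))))

;; Truncated input, as seen while a copy is still in progress:
(load-posts-from! "{:title \"a\"}\n{:title \"b")
;; Complete input, as seen on the final event:
(load-posts-from! "{:title \"a\"}\n{:title \"b\"}")
```

The first call logs the error and leaves the cached value in place; the second replaces it. Note that the try has to enclose the point where the sequence is realized (vec in the question's code), otherwise a lazy map would throw outside the catch.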

edited Sep 11 '16 at 14:26

answered Sep 08 '16 at 19:12

A couple of comments on the answer: scp makes no attempt to be atomic, but rsync does. See http://rsync.samba.org/how-rsync-works.html and http://stackoverflow.com/questions/3769263/are-rsync-operations-atomic-at-file-level#6903839. So one can often watch for the "create" event, which is fired when rsync performs the mv of a temporary direntry to the permanent one. – Jonah Benton Sep 09 '16 at 02:12
The start-watch function returns a future, but it wasn't being dereferenced. There are two other active watch projects, hawk and dirwatch; both of those appear to .printStackTrace any exceptions instead. – Jonah Benton Sep 09 '16 at 02:35
@JonahB - I started with `rsync`, but switched to `scp` because I wasn't seeing `:modify` events with it. I'll test to see if the `:create` event gets picked up, and update my answer if it does (that way requires a lot less hacking around). – Inaimathi Sep 09 '16 at 03:38
Gotcha, it sounds like there is a workflow? That perhaps the processed posts and other files should be deleted on the receipt side after processing? – Jonah Benton Sep 09 '16 at 11:10
@JonahB - Kinda? It's a very minimal blog engine I wrote for myself. What I want is to be able to just copy `md`/`json` files up to my server and get them published without mucking about with restarting a server. The idea is that this manages cached posts by watching what files change. This entire thing could be avoided by using a static-file middleware for `http-kit`, I just wanted to write it myself. And yes, watching for `:create` signals from `rsync` works perfectly. – Inaimathi Sep 09 '16 at 20:57
