let addr: SocketAddr = self.listen_bind.parse().unwrap();
let mut listener = TcpListener::bind(&addr).await?;
info!("Nightfort listening on {}", addr);
loop {
    info!("debug1");
    match listener.accept().await {
        Ok((stream, addr)) => {
            info!("debug2");
            let watcher = self.watcher.clone();
            info!("debug3");
            tokio::spawn(async move {
                info!("debug4");
                if let Err(e) = Nightfort::process(watcher, stream, addr).await {
                    error!("Error on this ranger: {}, error: {:?}", addr, e);
                }
            });
        }
        Err(e) => error!("Socket conn error {}", e),
    }
    // let (stream, addr) = listener.accept().await?;
}

I spent two days troubleshooting this weird issue. The Rust process runs fine on my local macOS, on Linux, and in Docker on Linux, but it does not work on AWS Linux or on k8s on AWS. The main symptom I found: the process hangs on accept() even though a client thinks it has established a connection to the server and has started sending messages to it. ps shows that the server process is in the S state. The code was originally written in nightly Rust with alpha libs, so I thought there might be a bug in a dependency. I then updated my code and switched to stable Rust with the latest releases of the dependencies, but the issue is still there.

  • Have you tried handling connections with a no-op? We don't know the details of `Nightfort::process`; it might be running longer than expected on `aws linux`. In the end this can cause a deadlock in the tokio executor, please see: https://stackoverflow.com/questions/48735952/why-does-futureselect-choose-the-future-with-a-longer-sleep-period-first – Ömer Erden Feb 21 '20 at 08:29
  • @ÖmerErden Execution appears to reach only "debug1"; I never see debug2, debug3, debug4, or the socket error message when running on AWS. It works fine on my local computer. – Stefan Liu Feb 21 '20 at 09:19
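
The no-op suggestion in the first comment could be tried in its simplest form with a blocking, standard-library-only listener. The sketch below is not the original tokio code: it swaps in `std::net::TcpListener`, and the helper name `accept_one_noop` and the loopback address are assumptions made for illustration. It does show whether a bare `accept()` returns when a client connects. If it does, the hang more likely comes from the async runtime or the network path on AWS than from `Nightfort::process`.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Hypothetical helper: binds to an ephemeral loopback port, connects one
/// client to it, accepts that connection, and counts the bytes received.
/// No application logic runs between accept() and the read.
fn accept_one_noop() -> std::io::Result<(SocketAddr, usize)> {
    // Port 0 asks the OS for any free port; on a real host you would bind
    // to the same address the service normally listens on instead.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let local = listener.local_addr()?;

    // The client connects, sends a few bytes, and closes the connection
    // when `stream` is dropped at the end of the closure.
    let client = thread::spawn(move || -> std::io::Result<()> {
        let mut stream = TcpStream::connect(local)?;
        stream.write_all(b"ping")?;
        Ok(())
    });

    // Blocking accept: if this never returns, the problem is below the
    // application layer.
    let (mut stream, peer) = listener.accept()?;

    // Read until the client closes its side of the connection.
    let mut buf = Vec::new();
    let n = stream.read_to_end(&mut buf)?;

    client.join().expect("client thread panicked")?;
    Ok((peer, n))
}

fn main() -> std::io::Result<()> {
    let (peer, n) = accept_one_noop()?;
    println!("accepted {} and read {} bytes", peer, n);
    Ok(())
}
```

Running the same check against the real listen address from a remote client, rather than over loopback, would additionally exercise the AWS network path.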
