
I have the following Go function to upload a file to an SFTP server:

func uploadObjectToDestination(sshConfig SSHConnectionConfig, destinationPath string, srcFile io.Reader) {
    // Connect to destination host via SSH
    conn, err := ssh.Dial("tcp", sshConfig.sftpHost+sshConfig.sftpPort, sshConfig.authConfig)
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    // create new SFTP client
    client, err := sftp.NewClient(conn)
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    log.Printf("Opening file on destination server under path %s", destinationPath)
    // create destination file
    dstFile, err := client.OpenFile(destinationPath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC)
    if err != nil {
        log.Fatal(err)
    }
    defer dstFile.Close()

    log.Printf("Copying file to %s", destinationPath)
    // copy source file to destination file
    bytes, err := io.Copy(dstFile, srcFile)
    if err != nil {
        log.Fatal(err)
    }

    log.Printf("%s - Total %d bytes copied\n", dstFile.Name(), bytes)
}

The code above works in 95% of cases but fails for some files. The only thing the failing files have in common is their size (3-4 KB); the files that succeed are smaller (0.5-3 KB). In some cases, files of 2-3 KB fail as well.

I was able to reproduce the same issue with different SFTP servers.

When I replace the failing io.Copy call with a direct Write on the SFTP file (sftp.Write), I see the same behavior, except that no error is returned. Instead, 0 bytes are reported as copied, which amounts to the same failure as with io.Copy.

When using io.Copy, the error I receive is: Context cancelled, unexpected EOF.

The code runs in AWS Lambda, and there are no memory or time-limit issues.

John Rotenstein
Tamas Szasz
  • Take a look at this discussion: https://stackoverflow.com/questions/17714494/golang-http-request-results-in-eof-errors-when-making-multiple-requests-successi – Amit Baranes Oct 18 '19 at 11:04
  • I don't believe it's related. My function only sends a single file at a time, and there is a gap of 15 minutes between the sequences. – Tamas Szasz Oct 18 '19 at 12:04
  • Your code looks fine. The error unexpected EOF means one side or the other failed. For SFTP this probably means you have an unreliable Internet connection. For cases like this use a commercial/opensource product and compare behavior. – John Hanley Oct 18 '19 at 16:54
  • @JohnHanley I have tried different SFTP servers, including AWS Transfer running in the same region as the Lambda that runs the client. Also, I don't see how the Internet connection could be the reason when we are talking about a few kilobytes of data. My Lambda runs in a VPC with a properly configured NAT gateway, and we handle millions of requests per hour in the same VPC. – Tamas Szasz Oct 19 '19 at 07:53
  • And the Internet is a perfect medium that never has errors. That assumption causes more bugs than I can count. You must design for failure and handle errors. The fact that your data is 1KB means nothing in the scope of designing software error handling. – John Hanley Oct 19 '19 at 16:51
  • @JohnHanley Believe it or not, I have built fault-tolerant infrastructure, so I know the difference between "random" failures and consistent ones. I'm not saying my code can't handle failures better. On the other hand, if a process consistently fails in one specific case (for example, I can't upload even after 20 retries) but almost always succeeds in others, something must be wrong there. In fact, simply retrying the upload does not help. In the meantime, I will try taking my Lambda out of the VPC to see if the issue is related to that. – Tamas Szasz Oct 20 '19 at 17:23
  • I respect what you are saying, but none of that matters. Your code looks fine. Never assume anything works, prove that it works and then move to the next possible item. Repeat until you know why this fails. – John Hanley Oct 20 '19 at 18:46

1 Answer


After a few hours of digging, it turned out that my own code was the source of the issue. For future reference: another function, not included in the original question, downloads the object(s) from S3:

func getObjectFromS3(svc *s3.S3, bucket, key string) io.Reader {
    var timeout = time.Second * 30
    ctx := context.Background()
    var cancelFn func()
    ctx, cancelFn = context.WithTimeout(ctx, timeout)
    // This deferred cancel runs as soon as getObjectFromS3 returns,
    // i.e. before the caller has read o.Body.
    defer cancelFn()
    var input = &s3.GetObjectInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
    }
    o, err := svc.GetObjectWithContext(ctx, input)
    if err != nil {
        if aerr, ok := err.(awserr.Error); ok && aerr.Code() == request.CanceledErrorCode {
            log.Fatal("Download canceled due to timeout: ", err)
        } else {
            log.Fatal("Failed to download object: ", err)
        }
    }
    // o.Body is a stream that the caller reads later; it is not loaded into memory here
    return o.Body
}

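The getObjectFromS3 function above creates a context with a 30-second timeout, but defer cancelFn() cancels that context as soon as the function returns, before the caller has read o.Body. The body is streamed over HTTP and the same context also governs reading it, so the later io.Copy fails with "context canceled, unexpected EOF" and the object appears to have the wrong size. Small objects most likely succeeded because the HTTP client had already buffered their whole body while reading the response headers (Go's HTTP transport reads through a 4 KB buffer by default), which would fit the 3-4 KB threshold. Since I don't need a context here, I simply switched to GetObject(input), which fixed the issue: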

func getObjectFromS3(svc *s3.S3, bucket, key string) io.ReadCloser {
    var input = &s3.GetObjectInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
    }

    // No per-call context here, so nothing cancels the request when this
    // function returns and the caller can still read the body.
    o, err := svc.GetObject(input)
    if err != nil {
        if aerr, ok := err.(awserr.Error); ok {
            switch aerr.Code() {
            case s3.ErrCodeNoSuchKey:
                log.Fatal(s3.ErrCodeNoSuchKey, ": ", aerr.Error())
            default:
                log.Fatal(aerr.Error())
            }
        } else {
            // Not an awserr.Error, so there is no Code or Message to extract.
            log.Fatal(err.Error())
        }
    }

    // o.Body is streamed, not loaded into memory; the caller reads it
    // (e.g. via io.Copy) and should Close it when done.
    return o.Body
}
Tamas Szasz