
I am in the process of creating a script that copies my S3 data to my local machine. The data I am receiving is typically a Hive partition. I am receiving a No such file or directory error even though the file does exist. Can someone explain what I am doing wrong and how I should approach this differently? Here is the piece of code that the error references:

bucket = conn.get_bucket(bucket_name)
for sub in bucket.list(prefix='some_prefix'):
    matched = re.search(re.compile(read_key_pattern), sub.name)
    if matched:
        with open(sub.name, 'rb') as fin:
            reader = csv.reader(fin, delimiter='\x01')
            contents = [line for line in reader]
        with open('output.csv', 'wb') as fout:
            writer = csv.writer(fout, quotechar='', quoting=csv.QUOTE_NONE, escapechar='\\')
            writer.writerows(contents)

IOError: [Errno 2] No such file or directory: 'my_prefix/54c91e35-4dd0-4da6-a7b7-283dff0f4483-000000'

The file exists and that is the correct folder and file that I am trying to retrieve.

gold_cy
  • There doesn't seem to be an extension on that file name e.g. `.txt`? – roganjosh Feb 13 '17 at 16:59
  • Sure of your current directory? – Jean-François Fabre Feb 13 '17 at 17:01
  • Yes the file error is pointing to the correct file, as for the extension, it does not have one, at least not one I can see visibly. I downloaded the file locally and the code worked on it that way – gold_cy Feb 13 '17 at 17:04
  • There's limited scenarios for this error. "I downloaded the file locally and the code worked on it that way" so if you have it in the same directory as the script, it works? – roganjosh Feb 13 '17 at 17:08
  • That is correct. I am trying to understand why this error is telling me the file does not exist when the error message is pointing to the correct file path @roganjosh – gold_cy Feb 13 '17 at 17:09
  • In which case, `my_prefix/` is assuming that there's a subdirectory inside whatever directory you're running the code from called `my_prefix` and it's looking for the file in there. You haven't shown your directory structure but I assume that the issue is there. – roganjosh Feb 13 '17 at 17:14

1 Answer


Like @roganjosh said, it looks like you never downloaded the file after matching the key name. I've added comments below to show you how to process the file in memory in Python 2:

    import contextlib
    import csv
    import re
    from io import StringIO  # switch to BytesIO if you hit encoding errors

    bucket = conn.get_bucket(bucket_name)
    # compile the pattern once, outside the loop,
    # for slightly better performance
    matcher = re.compile(read_key_pattern)

    for sub in bucket.list(prefix='some_prefix'):
        # bucket.list returns an iterator over s3.Key objects,
        # so we can use `sub` directly as the Key object
        matched = matcher.search(sub.name)
        if matched:
            # download the file to an in-memory buffer
            with contextlib.closing(StringIO()) as fp:
                sub.get_contents_to_file(fp)
                fp.seek(0)
                # read straight from the memory buffer
                reader = csv.reader(fp, delimiter='\x01')
                contents = [line for line in reader]
            with open('output.csv', 'wb') as fout:
                writer = csv.writer(fout, quotechar='', quoting=csv.QUOTE_NONE, escapechar='\\')
                writer.writerows(contents)

For Python 3 you will need to change the `with` statement as discussed in the comments on the answer to this question.

2ps
  • I'm thinking that OP has supplied the relative path that is incorrect. We don't know what `my_prefix` is but I guess it's not valid either as a relative or absolute path. – roganjosh Feb 13 '17 at 17:54
  • 1
    It looks like (from the code) that `my_prefix` is the directory in S3 (i.e., the S3 key name is `my_prefix/SOME_GUID`). So he just forgot to download the file from S3 (an important step, no doubt). – 2ps Feb 13 '17 at 17:55
  • This is correct. I was assuming that I did not need to download first, that I would be able to just open them one by one using the `csv` module. Thanks! – gold_cy Feb 13 '17 at 18:26
  • Just wanted to add one comment. I had to change mine to `BytesIO` due to encoding. So if anyone arriving here in the future runs into `TypeError`, just change from `StringIO` to `BytesIO`. – gold_cy Feb 13 '17 at 21:06