
I am trying out a sample Common Crawl example based on https://engineeringblog.yelp.com/2015/03/analyzing-the-web-for-the-price-of-a-sandwich.html

I am running the command below on my local Windows PC, following the instructions in that post.

python mr_crawl_phonenumbers.py -r emr s3://aws-publicdatasets/common-crawl/crawl-data/CC-MAIN-2014-52/wet.paths.gz s3://yelp/business_data.txt

But I am getting the error below.

botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListBuckets operation: Access Denied

Could someone help me with this?

Shamnad P S
  • Where did you put your AWS credentials? – realharry Nov 27 '17 at 17:31
  • Sanity check: did you open an AWS account and set up the AWS CLI as described here: http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html – mootmoot Nov 27 '17 at 17:35
  • @realharry I have done this as per the guide. – Shamnad P S Nov 27 '17 at 17:42
  • @mootmoot I have done this based on the guide. – Shamnad P S Nov 27 '17 at 17:42
  • It seems AWS changed the bucket structure, so the document you refer to is outdated. The Common Crawl bucket is now called `s3://commoncrawl`. Try `aws s3 ls s3://commoncrawl` to check it out. Ref: https://aws.amazon.com/public-datasets/common-crawl/ – mootmoot Nov 27 '17 at 17:57
  • @mootmoot I tried this, but it is not working. – Shamnad P S Nov 28 '17 at 06:38
  • How about `aws s3 ls s3://commoncrawl --region us-east-1`? In addition, you must have a valid AWS access key: go to IAM and create one that has an S3 access policy attached. – mootmoot Nov 28 '17 at 08:10
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/159984/discussion-between-shamnad-p-s-and-mootmoot). – Shamnad P S Nov 28 '17 at 11:14
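
The checks suggested in the comments can also be scripted. The error message names the ListBuckets call, which is an account-level action and not a read on the Common Crawl bucket, so it may help to see which identity the credentials resolve to and which S3 calls that identity is allowed to make. Below is a minimal sketch, assuming `boto3` is installed and credentials are already configured. The helper names `split_s3_uri` and `diagnose` are made up for illustration, and the URI in the usage note simply combines the bucket name from the comments with the key from the question.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def diagnose(uri, region="us-east-1"):
    """Report the caller identity and whether it can list buckets and list `uri`."""
    # Imported lazily so split_s3_uri stays usable without boto3 installed.
    import boto3
    from botocore.exceptions import BotoCoreError, ClientError

    def describe(err):
        # ClientError carries the service error code (e.g. 'AccessDenied');
        # other botocore errors (e.g. missing credentials) only have a type name.
        if isinstance(err, ClientError):
            return err.response["Error"]["Code"]
        return type(err).__name__

    session = boto3.Session(region_name=region)
    s3 = session.client("s3")
    bucket, key = split_s3_uri(uri)

    checks = {
        # Which IAM user or role do the configured credentials belong to?
        "identity": lambda: session.client("sts").get_caller_identity()["Arn"],
        # The call that failed in the question (needs s3:ListAllMyBuckets).
        "list_buckets": lambda: (s3.list_buckets(), "ok")[1],
        # A read-only listing against the dataset bucket itself.
        "list_objects": lambda: (
            s3.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1),
            "ok",
        )[1],
    }

    results = {}
    for name, call in checks.items():
        try:
            results[name] = call()
        except (BotoCoreError, ClientError) as err:
            results[name] = describe(err)
    return results
```

Calling `diagnose("s3://commoncrawl/crawl-data/CC-MAIN-2014-52/wet.paths.gz")` returns a small dict. If `list_buckets` comes back as `AccessDenied` while `list_objects` is `ok`, the credentials are valid but the IAM user or role probably lacks the `s3:ListAllMyBuckets` permission.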
