
I host some RSS files in my S3 bucket. These files are currently public because I want to use them in my Flash Briefing Alexa skills.

A Flash Briefing skill doesn't have a Lambda function (unlike other kinds of skills). In a Flash Briefing skill I only refer to the XML file.

But since I set my files to "public", I get thousands of requests for these files. My skill doesn't get that many requests, so I'm sure that 99% of the GET requests are not from Alexa. Maybe they come from web crawlers or something like that. This is starting to incur costs.

Can I restrict access to the files to my Alexa skill only? Maybe via a bucket policy?

Kind regards, Stefan

Stefan Volkmer
  • You should probably start by checking your [S3 bucket logs](https://docs.aws.amazon.com/AmazonS3/latest/dev/ServerLogs.html) to see if you can identify the source of the traffic. The User Agent, Referer, Remote IP, and Requester (or absence of one) should be informative. Once the source is identified, *then* you are better informed on how you might want to mitigate this. – Michael - sqlbot Mar 04 '18 at 14:15
  • I already checked the logs, but couldn't find any helpful information. The IPs are sometimes identical, but most of the time different, e.g. 205.251.233.47, 72.21.217.31, 10.89.82.142, 10.81.172.112. It's always a REST.GET.OBJECT call, the Referer is "-", and the User-Agent is always "Apache-HttpClient/UNAVAILABLE". – Stefan Volkmer Mar 04 '18 at 17:58
  • What value did you choose for `Content update frequency` on your skill? I wonder if the Alexa infrastructure is trying to maintain a fresh copy, but being overly aggressive. How recently have you updated your files? Those first 2 IPs are listed as "Amazon" but not associated with a specific service (not EC2, CloudFront, or Route53 Checkers), one in us-east-1, the other in us-west-2, and the others, 10.x.x.x, are necessarily internal, suggesting perhaps access via a VPC endpoint. `Apache-HttpClient/UNAVAILABLE` is a generic Java UA string. Nothing here to rule out Alexa doing this, it seems. – Michael - sqlbot Mar 04 '18 at 20:01
  • Thanks for these helpful hints! Where did you see that the IPs are from Amazon? That would be interesting. My update frequency is "Hourly", but I thought the RSS file is always read in real time when the skill is called? Or not? I have a batch job (every 5 hours) that reads an original RSS file from a local provider. The job makes some adjustments/improvements for voice and creates an improved copy. In my skill I reference this copy, so it's possible that the file gets new content multiple times a day. Maybe I'll change to "Daily" and check the difference for the end user. – Stefan Volkmer Mar 04 '18 at 20:47
  • I was using my database of all AWS IP address ranges that I periodically update using the information at https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html. I would assume it would be fetched in real time, too, but it could go either way and doesn't seem to be explicitly documented... but since they ask about update intervals, that seems to imply they might be polling it. – Michael - sqlbot Mar 04 '18 at 23:27
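The IP lookup described in the comments above can be sketched roughly as follows. This is only an illustration, not the commenter's actual tooling: `classify_ip` and `SAMPLE_RANGES` are hypothetical names, and the sample prefixes are documentation addresses (RFC 5737), not real AWS ranges. The sample only mimics the `prefixes` / `ip_prefix` / `region` / `service` layout of the published `ip-ranges.json` file.

```python
import ipaddress

# RFC 1918 private ranges; a 10.x.x.x address in the logs falls in the first one.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# Hypothetical sample in the layout of ip-ranges.json.
# The prefixes are documentation ranges, NOT real AWS ranges.
SAMPLE_RANGES = {
    "prefixes": [
        {"ip_prefix": "203.0.113.0/24", "region": "us-east-1", "service": "AMAZON"},
        {"ip_prefix": "198.51.100.0/24", "region": "us-west-2", "service": "EC2"},
    ]
}


def classify_ip(ip, ranges):
    """Return labels describing where an address from the access log belongs."""
    addr = ipaddress.ip_address(ip)
    # Private addresses never appear in the published ranges.
    if any(addr in net for net in RFC1918):
        return ["private (RFC 1918)"]
    matches = [
        f'{entry["service"]} ({entry["region"]})'
        for entry in ranges.get("prefixes", [])
        if addr in ipaddress.ip_network(entry["ip_prefix"])
    ]
    return matches or ["no match"]


if __name__ == "__main__":
    for ip in ["10.89.82.142", "203.0.113.5", "192.0.2.1"]:
        print(ip, classify_ip(ip, SAMPLE_RANGES))
```

To check real log entries, the sample dictionary would be replaced by the parsed contents of https://ip-ranges.amazonaws.com/ip-ranges.json (for example via `urllib.request.urlopen` and `json.load`). An address can match several entries there, since service-specific prefixes are typically also listed under the general `AMAZON` service.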

0 Answers