
Can S3 bucket be specified at AMI invocation?
Reported by Brian Pratt | April 23rd, 2009 @ 03:37 PM
I already have an S3 bucket containing mass spec data; it would be cool if I could use that instead of the one VipDAC created for me on initial use. Maybe the AMI could be invoked as aws_usage=xxx,aws_secret=yyy,aws_bucket=mybukkit.
Again, really nice work here!
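(For illustration only, a minimal sketch of parsing that kind of comma-separated key=value string on the instance side; the parameter names aws_usage, aws_secret and aws_bucket are just the ones suggested above, not an existing VipDAC interface.)

  # Hedged sketch: split a comma-separated key=value user-data string into a hash.
  user_data = "aws_usage=xxx,aws_secret=yyy,aws_bucket=mybukkit"
  params = user_data.split(",").map { |pair| pair.split("=", 2) }.to_h
  # params => {"aws_usage"=>"xxx", "aws_secret"=>"yyy", "aws_bucket"=>"mybukkit"}
  bucket = params["aws_bucket"]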
Comments and changes to this ticket
-
jgeiger April 23rd, 2009 @ 03:50 PM
- Tag set to bucket-copy, bucket-name, s3
- State changed from new to open
Thanks for the suggestion. You can set a folder name by passing folder=name when you supply your keys. VipDAC will still enforce specific naming conditions on that bucket, though.
For example, if you pass folder=cheese, the bucket would be named cheese-accesskey-vipdac. This is done to prevent naming conflicts, since every bucket name on S3 must be globally unique.
The other issue is that if you supply your own bucket name, vipdac will pollute that bucket with all of its files.
It might be a better solution to allow you to send your data from the other bucket, via a copy. This doesn't exist yet, but I'll add it to the list of feature requests.
  def bucket_name
    # Prefix the bucket with the sanitized folder name when one was supplied.
    folder ? "#{folder}-#{access_key}-vipdac" : "#{access_key}-vipdac"
  end

  def folder
    # Strip anything that isn't a letter or digit from the supplied folder name.
    keys["folder"] ? keys["folder"].downcase.gsub(/[^a-z0-9]+/i, '') : nil
  end
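For illustration, a standalone sketch of how those two helpers behave (access_key and keys are stand-ins here, not the real VipDAC objects):

  # Hedged, self-contained version of the naming logic above.
  access_key = "AKIAEXAMPLE"               # placeholder access key
  keys = { "folder" => "My Cheese Data!" } # placeholder supplied keys

  folder = keys["folder"] ? keys["folder"].downcase.gsub(/[^a-z0-9]+/i, '') : nil
  bucket_name = folder ? "#{folder}-#{access_key}-vipdac" : "#{access_key}-vipdac"

  puts folder       # => "mycheesedata"
  puts bucket_name  # => "mycheesedata-AKIAEXAMPLE-vipdac"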
-
Brian Pratt April 23rd, 2009 @ 04:21 PM
Of course these can be pretty big files, so copying may be unattractive. Really, though, this would only be an issue for folks wanting to use multiple tool suites (VipDAC, TPP, ?) on the same data set for comparison purposes; it's too soon to say how common a use case that might be. It's probably the case that TPP can latch onto a VipDAC bucket without issues, so the point may be moot if one does things in the proper order...
-
jgeiger April 23rd, 2009 @ 04:25 PM
Copying between S3 buckets should be a very fast operation, much faster than uploading the file yourself from a browser.
Also, per Amazon, data transferred within an Amazon S3 location via a COPY request is free of charge.
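For reference, a hedged sketch of what such a server-side copy looks like with the aws-sdk-s3 gem (a generic example, not VipDAC's actual code; bucket and key names are placeholders):

  # Server-side copy between buckets: no download/upload through the client.
  require "aws-sdk-s3"

  s3 = Aws::S3::Client.new(region: "us-east-1")
  s3.copy_object(
    copy_source: "my-existing-bucket/massspec/run01.mzXML", # source bucket/key
    bucket:      "accesskey-vipdac",                        # destination bucket
    key:         "massspec/run01.mzXML"                     # destination key
  )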
-
Brian Pratt April 23rd, 2009 @ 04:35 PM
For sure, but even if there are no bandwidth charges, doesn't that result in a physical copy of the data? Just thinking of S3 storage fees.
-
jgeiger April 23rd, 2009 @ 04:41 PM
There are storage fees, but most likely they are going to be much smaller than the usage charge for the EC2 instances ($0.20/hr for a medium node vs. $0.17/GB/month for storage).
Since datafiles are separate from jobs, I will modify the application to pull the file down from your bucket directly instead of copying it, which makes a lot more sense. You will still be charged for storing the results files, but you can delete them as needed once you've downloaded them.
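A hedged sketch of the pull-it-down-directly approach with the aws-sdk-s3 gem (again a generic example, not VipDAC's actual code; paths and names are placeholders):

  # Stream the datafile from the user's own bucket straight to local disk,
  # so nothing extra is stored in the vipdac bucket.
  require "aws-sdk-s3"

  s3 = Aws::S3::Client.new(region: "us-east-1")
  s3.get_object(
    response_target: "/tmp/run01.mzXML",    # local path on the node
    bucket:          "my-existing-bucket",  # the user's existing bucket
    key:             "massspec/run01.mzXML" # the datafile to process
  )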