Backing up data to Amazon S3 or EBS
Amazon S3 is a very reliable data store that can be accessed in many different ways, and its pay-per-use model makes it very attractive for data backups. One can use the 's3fs' FUSE-based filesystem to mount an Amazon S3 bucket on a local folder and then use rsync for backup. At the time of this writing, the latest version of s3fs supported on CentOS did not preserve timestamps while copying files. This prohibits using s3fs from CentOS to back up data to S3 with rsync, since without timestamps all files get copied again on each run. Permissions, ownership, and similar metadata are also lost.
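If rsync over an s3fs mount that loses timestamps is unavoidable, a partial workaround (a sketch, not a recommendation; the paths are examples) is to compare files by size only, which avoids recopying unchanged files but misses any change that keeps the file size identical:

rsync -r --size-only /data/ /mnt/s3bucket/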
A better approach in this case is to use an EBS volume attached to an EC2 instance for rsync-based backup. Since EBS is not guaranteed to be reliable, snapshots of the EBS volume should be taken to S3 for important data. However, if the EBS data is itself a backup and a local copy exists, then backing up the EBS backup to S3 may be overkill, as it is unlikely that the local data, local backups, and the Amazon EBS backup would all be lost at the same time. Having snapshots, however, allows very fast restoration of the EBS volume without having to transfer the data over the Internet again. Choose appropriately based on data size, Internet cost, data requirements, etc.
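As a minimal sketch of this workflow (the source path, mount point, and volume ID below are placeholders; the snapshot step assumes the AWS command line tools are installed and configured, and can equally be done from the EC2 console):

# Sync local data to the EBS volume mounted at /mnt/ebs-backup
rsync -a --delete /data/ /mnt/ebs-backup/
# Snapshot the EBS volume to S3; vol-xxxxxxxx is a placeholder volume ID
aws ec2 create-snapshot --volume-id vol-xxxxxxxx --description "backup snapshot"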
Creating an EBS volume and attaching it to a VM can be done very easily from the Amazon EC2 console, so it is not explained here. After a block store is attached, it can be partitioned with fdisk and formatted with mkfs.ext<n>, etc. It can then be mounted and, if required, remounted using encfs.
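For example, a freshly attached volume could be prepared along these lines (the device name /dev/xvdf and the mount points are assumptions; check the EC2 console or dmesg output for the actual device):

fdisk /dev/xvdf                       # create a partition interactively
mkfs.ext4 /dev/xvdf1                  # format the new partition
mkdir -p /mnt/ebs
mount /dev/xvdf1 /mnt/ebs             # mount the formatted partition
# Optional: an encrypted view on top of the plain mount using encfs
mkdir -p /mnt/ebs/.encrypted /mnt/backup
encfs /mnt/ebs/.encrypted /mnt/backup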
Still, S3 buckets provide a very good option for downloading files using the S3 console whenever required, without needing an EC2 instance. The S3 console provides a full-fledged file manager for creating directories, uploading files, deleting items, etc. It also allows hosting files publicly for everyone to download. S3's HTTP-based APIs are also easy to program against. Hence, how to use s3fs for mounting an S3 bucket is explained below.
Information on s3fs is available on its official wiki at http://code.google.com/p/s3fs/wiki/FuseOverAmazon
Using s3fs
Formatting S3 bucket
To format an S3 bucket, use:
export AWS_ACCESS_KEY_ID=<put_access_key_here>
export AWS_SECRET_ACCESS_KEY=<put_secret_key_here>
export HISTFILE=/dev/null    # keep the key material out of shell history
s3fs -C -f <bucket_name>
Mounting S3 bucket
To mount an S3 bucket, use:
export AWS_ACCESS_KEY_ID=<put_access_key_here>
export AWS_SECRET_ACCESS_KEY=<put_secret_key_here>
export HISTFILE=/dev/null    # keep the key material out of shell history
s3fs -o bucket=<bucket_name> <mount_point>
Notes:
- Remember to unmount the bucket when done, as s3fs is not designed for concurrent mounting from many different locations; the user must ensure that only one machine is using s3fs on a given bucket at a time (see the example after these notes).
- The Amazon access and secret keys grant access to a large number of resources owned by the user. Hence, do not leave them in plain-text files, programs, configuration files, etc. in insecure locations.
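As a small example of the unmount step mentioned in the first note (assuming the bucket was mounted at /mnt/s3):

fusermount -u /mnt/s3    # unmount a FUSE filesystem as a normal user
# or, as root:
umount /mnt/s3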
Testing connection to S3 using Python
If s3fs is not working, one can test the connection to Amazon S3 using Python (with the boto library) in the following manner:
>>> import boto
>>> s3 = boto.connect_s3('<access_key>', '<secret_key>')
>>> bucket = s3.lookup('<bucket_name>')
>>> key = bucket.new_key('testkey')
>>> key.set_contents_from_string('This is a test')
Then access Amazon S3 using the web console and verify whether the given bucket contains an object named 'testkey' with the value 'This is a test'. This test was learned from http://stackoverflow.com/questions/10854095/boto-exception-s3responseerror-s3responseerror-403-forbidden
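Alternatively, the value can be read back in the same Python session; this sketch assumes the boto 2.x API used above:

>>> key = bucket.get_key('testkey')
>>> key.get_contents_as_string()
'This is a test'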