
How to replicate an Amazon S3 bucket using Boto3?

Amazon Simple Storage Service (Amazon S3) provides object storage through a web interface. It is a scalable, secure, and widely used AWS service, and Boto3 is the Python package that provides interfaces to AWS services. In this tutorial, we will see how you can replicate Amazon S3 buckets using Boto3. Let's start!

What is Amazon S3 Replication? 

Amazon S3 Replication is a way to replicate, or make copies of, S3 objects in the same Region or in different Regions. It is an elastic, low-cost, and fully managed feature.

You can replicate your Amazon S3 data across distant AWS Regions or across different accounts in the same AWS Region. This can help you minimize application latency by maintaining object copies in Regions that are closer to your users.

Why should you use replication?

Replication helps you secure your data: keeping multiple copies of your S3 objects pays off in the long term. You can also set multiple destination buckets across different Regions to ensure geographic separation. Replication can automatically put objects into S3 Glacier, S3 Glacier Deep Archive, or another storage class in the destination buckets. You can also keep object copies under different ownership using replication.
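For instance, a minimal, hypothetical boto3 call that sends every replicated object straight into S3 Glacier Deep Archive in the destination bucket might look like this (the bucket names and role ARN are placeholders):

import boto3

client = boto3.client('s3')

# Hypothetical rule: replicate all objects and store the copies in Deep Archive
client.put_bucket_replication(
    Bucket='source-bucket',
    ReplicationConfiguration={
        'Role': 'arn:aws:iam::123456789012:role/replication-role',
        'Rules': [{
            'Priority': 1,
            'Filter': {'Prefix': ''},            # empty prefix matches everything
            'Status': 'Enabled',
            'DeleteMarkerReplication': {'Status': 'Disabled'},
            'Destination': {
                'Bucket': 'arn:aws:s3:::destination-bucket',
                'StorageClass': 'DEEP_ARCHIVE',  # or GLACIER, STANDARD_IA, ...
            },
        }],
    },
)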

What are the types of replication?

  1. Cross-Region Replication: you store your S3 data in multiple Regions. This helps with compliance requirements, minimizing latency, and increasing operational efficiency.
  2. Same-Region Replication: you store your object copies in the same Region as the source bucket.

How to enable replication in AWS S3?

It is easy to replicate data using the AWS console: you just set up replication between the source and target buckets. But what if you have thousands of buckets? Replicating those manually would not be easy. In this blog, we will show you how to automate S3 replication using Boto3.

Here we will use a spreadsheet as input to create replication rules for multiple S3 buckets, and the AWS CLI to set up the needed permissions. The spreadsheet holds one row per source bucket, with columns for the source bucket name, target Region, target storage class, prefix filter, delete marker replication, and existing object replication, as the sketch below shows.
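The exact template is up to you; based on the columns the script below reads, a one-row CRR.xlsx could be generated like this (all values are illustrative):

import pandas as pd

# Illustrative row using the column names the replication script expects
df = pd.DataFrame([{
    'Source Bucket Name': 'my-app-logs',
    'Region': 'us-west-2',                  # Region for the target bucket
    'Target Storage Class': 'STANDARD_IA',
    'Prefix Filter': 'logs/',               # leave empty to replicate everything
    'DeleteMarkerReplication': 'Disabled',
    'Existing Object Replication': 'No',
}])
df.to_excel('CRR.xlsx', index=False)        # writing .xlsx requires openpyxl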

  • Permission configuration

S3 must have permission to replicate objects from the source bucket to the target bucket, so you first create an IAM role and attach a permissions policy to it. Here is the policy you can use for the role; a sketch of creating the role with the AWS CLI follows it.

{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Effect": "Allow",
         "Action": [
            "s3:GetObjectVersionForReplication",
            "s3:GetObjectVersionAcl"
         ],
         "Resource": [
            "arn:aws:s3:::source-bucket/*"
         ]
      },
      {
         "Effect": "Allow",
         "Action": [
            "s3:ListBucket",
            "s3:GetReplicationConfiguration"
         ],
         "Resource": [
            "arn:aws:s3:::source-bucket"
         ]
      },
      {
         "Effect": "Allow",
         "Action": [
            "s3:ReplicateObject",
            "s3:ReplicateDelete",
            "s3:ReplicateTags",
            "s3:GetObjectVersionTagging"
         ],
         "Resource": "arn:aws:s3:::destination-bucket/*"
      }
   ]
}
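The role must also trust the S3 service to assume it. Here is a sketch of creating the role with the AWS CLI, assuming you saved the permissions policy above as replication-policy.json and a trust policy like the following as trust-policy.json (the role name matches the ARN used in the script below):

{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Effect": "Allow",
         "Principal": {"Service": "s3.amazonaws.com"},
         "Action": "sts:AssumeRole"
      }
   ]
}

aws iam create-role --role-name replicationRole-10-17-2020 \
    --assume-role-policy-document file://trust-policy.json
aws iam put-role-policy --role-name replicationRole-10-17-2020 \
    --policy-name replication-permissions \
    --policy-document file://replication-policy.json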

  • Implementation

The Boto3 code reads the spreadsheet template and checks each source bucket. If there is no existing replication configuration for a source bucket, it creates a target bucket and then a replication rule.

The code will perform the following steps:

  1. Check each source bucket for an existing replication configuration
  2. Enable versioning on the source buckets if needed
  3. Create the target bucket using parameters from the spreadsheet
  4. Create the replication rule using parameters from the spreadsheet
  5. Tag buckets

Here is the complete code: 

# -*- coding: utf-8 -*-
import logging

import boto3
import botocore
import pandas as pd

logging.basicConfig(filename='example.log', filemode='w', level=logging.INFO)

# Read the spreadsheet template; one row per source bucket
file = 'CRR.xlsx'
df = pd.read_excel(file)

client = boto3.client('s3')

for i in df.index:
    bucket = df['Source Bucket Name'][i]
    region = df['Region'][i]
    storageClass = df['Target Storage Class'][i]
    prefix_filter = df['Prefix Filter'][i]
    if not isinstance(prefix_filter, str):
        # Empty spreadsheet cells come back as NaN; treat them as "no prefix"
        prefix_filter = ''
    deleteMarkerReplication = df['DeleteMarkerReplication'][i]
    existingObjectReplication = df['Existing Object Replication'][i]  # read, but not used below

    # Check the source bucket for an existing replication configuration;
    # get_bucket_replication raises a ClientError when none exists
    try:
        config = client.get_bucket_replication(Bucket=bucket)
    except botocore.exceptions.ClientError:
        logging.info(f'Bucket "{bucket}": replication config not enabled. Will enable '
                     'versioning on the source bucket and create a replication config')

        # Replication requires versioning on both the source and target buckets
        logging.info(f'Bucket "{bucket}": enabling versioning')
        client.put_bucket_versioning(
            Bucket=bucket,
            VersioningConfiguration={'Status': 'Enabled'},
        )

        # Create the target bucket in the Region given in the spreadsheet
        logging.info('Creating target bucket')
        client.create_bucket(
            Bucket=f'{bucket}-target',
            CreateBucketConfiguration={'LocationConstraint': region},
        )

        logging.info(f'Blocking public access for the target bucket "{bucket}-target"')
        client.put_public_access_block(
            Bucket=f'{bucket}-target',
            PublicAccessBlockConfiguration={
                'BlockPublicAcls': True,
                'IgnorePublicAcls': True,
                'BlockPublicPolicy': True,
                'RestrictPublicBuckets': True,
            },
        )

        logging.info(f'Enabling versioning for the target bucket "{bucket}-target"')
        client.put_bucket_versioning(
            Bucket=f'{bucket}-target',
            VersioningConfiguration={'Status': 'Enabled'},
        )

        # Create the replication rule using parameters from the spreadsheet
        logging.info(f'Inserting replication for the bucket "{bucket}"')
        client.put_bucket_replication(
            Bucket=bucket,
            # Modify the ARN below with your account ID and the role you created
            ReplicationConfiguration={
                'Role': 'arn:aws:iam::968283274184:role/replicationRole-10-17-2020',
                'Rules': [
                    {
                        'Priority': 1,
                        'Filter': {
                            'Prefix': prefix_filter,
                        },
                        'Destination': {
                            'Bucket': f'arn:aws:s3:::{bucket}-target',
                            'StorageClass': storageClass,
                        },
                        'Status': 'Enabled',
                        'DeleteMarkerReplication': {'Status': deleteMarkerReplication},
                    },
                ],
            },
        )
        print(bucket)
    else:
        logging.info(f'Bucket "{bucket}" already had replication')

    # Log the resulting replication configuration for every bucket
    config = client.get_bucket_replication(Bucket=bucket)
    logging.info(f'bucket config "{config}"')
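To run the script you need AWS credentials configured (for example via aws configure) and a few packages installed; the script filename below is a placeholder:

pip install boto3 pandas openpyxl    # openpyxl lets pandas read .xlsx files
python replicate_s3.py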

This can help you automate S3 replication and meet business and security requirements.
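Keep in mind that replication is asynchronous, so copies can take a while to appear. One way to confirm a rule is working is to check an object's replication status on the source bucket (the bucket and key below are placeholders):

import boto3

client = boto3.client('s3')

# On the source object, ReplicationStatus moves from PENDING to a completed
# state once the copy exists; the copy in the destination reports REPLICA.
resp = client.head_object(Bucket='my-app-logs', Key='logs/2020-10-17.txt')
print(resp.get('ReplicationStatus'))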


Written by DANN N
