Loading a CSV file from S3 to an Aurora RDS instance

For something that seems like it would be common, it was surprisingly hard to load a CSV file into a table on an Aurora RDS instance. Hoping this helps anyone else out there that is trying to do the same.

Create RDS IAM role

The first step is to create a new role for the RDS instance. Go to the IAM page and then the Roles tab. Create a new role, and select the Amazon RDS role

On the next screen, it's not necessary to attach any policies. We will be attach the S3 policies in the next step.

After creating the IAM role, it's time to attach the S3 policy. Go to the Roles tab, select your newly created role, and then Attach policy. The policy we need to add is the AmazonS3FullAccess policy. After attaching the policy, keep track of your ARN for the role.

Setup cluster parameter group

Modifying the cluster parameter group is something that was completely new to me. Before doing this step, when I tried the SQL command for loading the CSV file I was getting this error:

S3 API returned error: Both aurora_load_from_s3_role and aws_default_s3_role are not specified, please see documentation for more details  

So to set up the parameter group correctly, first grab the ARN value for the IAM role created in the previous step. For example arn:aws:iam::<number>:role/<role_name>

Next, go to the RDS console and click the Parameter Groups menu on the left.

It's not possible to edit the default parameter groups, so create a new parameter group. On the first screen, make sure to select "Cluster parameter group" in the drop-down

After creating the parameter group, select the parameter group and edit parameters. At the edit parameters screen, add the ARN value for the IAM role to the aurora_load_from_s3_role parameter

After saving the parameter group, it needs to be applied to the cluster. Go to the Clusters tab, select your Aurora cluster, and modify the cluster. At the modify cluster screen, change the parameter group to the newly created parameter group and check the "Apply immediately" box.

After applying the new parameter group, the RDS instance needs to be rebooted. Go to the Instances tab and reboot your RDS instance.

Add the role to your cluster

Go back to the RDS page and then the Clusters tab. Select your cluster, and then the Manage IAM Roles button at the top. Use the dialog to add the role we created in the earlier step to the cluster.

Before I did this step, I was getting this error:

S3 API returned error: Missing Credentials: Cannot instantiate S3 Client  

Load the dang data

After rebooting, you're almost set! You just need to make sure you execute the query correctly. The AWS documentation doesn't make it completely clear (at least for me), but this is the query that worked for me:

load data from s3 's3-<region>://<folder>/<filename>.csv'  
into table <tablename>  
fields terminated by ',';  

Hope this helps someone else. Please feel free to contact me with any feedback or questions.