Configuring AWS & Local Environment for MLOps Pipelines Project
Welcome to the first post in a series where I implement an end-to-end ML (Machine Learning) pipeline. To do this I will be using AWS (Amazon Web Services) and some of the tools Amazon offers, e.g. SageMaker. This post aims to get you set up with an environment that you can then use to implement your own pipeline.
The code for this project can be found in my GitHub (4igeek) repository.
What are we going to be building?
We are going to be working with the penguins dataset for this project, and we will be trying to predict each penguin's species.
This is what we are going to be building:
- Training Pipeline
- Inference Pipeline
- Deployment Pipeline
- Data/Model Monitoring
Before we get into the nuts and bolts of this, we need to create a new environment on our computer to work in. To do this, create a folder somewhere that you'll be able to locate easily and then run the following commands in your terminal (making sure your terminal is in that same folder).
python -m venv my_env
my_env\Scripts\activate
\\ The activation line above is for Windows. On macOS/Linux use: source my_env/bin/activate
\\ I called my venv "my_env" but you can call it whatever you like.
If you've done that successfully, you should see the environment name displayed at the start of your terminal prompt, just before the path you are currently in.
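While the environment is active, you may also want to install the Python packages this series relies on. The exact list lives in the project's GitHub repository; the command below is only a hedged guess based on the tools this post touches on (the SageMaker SDK, python-dotenv for the .env file we create later, and Comet ML), plus boto3 for general AWS access.
pip install sagemaker boto3 python-dotenv comet_ml
\\ Check the repository's requirements file for the definitive package list and versions.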
Now, you need to visit AWS and then log in using the “root” user account. Then once you’re inside we need to visit the IAM (Identity and Access Management) area because this is where we’re going to configure the users for our pipeline. We’re going to create a new user for our project because we don’t want to access our AWS account using the “root” user from our code (or terminal etc). Once we’ve created this user, we’ll then be using it to access our AWS account.
Once you’re logged in and have found the IAM Dashboard (you can search for “IAM” in the search bar at the top and IAM will show up) we’ll then need to click on “Users” to go through to the users area.
Once there, we're going to click on "Create user", which will take us through the process of creating a new user. For this example I'm going to create a user called "ml_ops", check the "Provide user access to the AWS Management Console" option, and then select the "I want to create an IAM user" radio button.
For this example we want to get a user set up quickly, so we're not going to bother with the "Users must create a new password at next login" option. If this were in production for a client, then you would need to go down that route; for now, we just want fast and easy access to our AWS account. Leaving everything else as default, we are now going to click the "Next" button at the bottom of the page.
This will take us to the next page where we need to set permissions for the new user. We are going to do this by attaching the policy directly.
We then need to attach the AdministratorAccess policy to the user.
This will give the user admin access. If this were for a client, we would consider creating a user that could only access what it needs, but for now we're just looking at getting set up quickly. Once that policy has been checked, click the "Next" button and that will take us to a summary page for the new user. As long as the steps have been followed so far, you should be good to click on the "Create user" button (which will create the new user for us).
When we arrive on the next page it is time to grab the password that was created for us (click the “Show” button to see the password and then store it somewhere safe for future use). Once all that is done then click on the “Return to user list” button which will take you back to the main “Users” page where we will see our new user.
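As an aside, if you already have the AWS CLI configured with credentials that are allowed to manage IAM, the same user could be created from the command line instead. This is only a sketch of the equivalent calls, not a step you need to run if you followed the console flow above:
aws iam create-user --user-name ml_ops
aws iam attach-user-policy --user-name ml_ops --policy-arn arn:aws:iam::aws:policy/AdministratorAccess
\\ Console access (username + password) would still need a login profile on top of this.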
Once we’ve done this we’re going to install the AWS CLI (Command Line Interface) onto our computer. When you click on the AWS CLI link you’ll be taken to a page where you can install the correct version for your operating system (AWS provides instructions for the various operating systems).
Once we've done that, we need to configure the CLI. The first thing we need to do is create an access key so that we can connect to AWS from our command line. To do this, go back to your list of users (in the IAM dashboard) and click on the user you just created. Once there, click on the "Security credentials" tab on that page.
Once you get inside of the “Security credentials” tab you’re going to see the access keys area (something like the 4th block down from the top) and from there we are going to create a new access key.
This will take you to a page where you can specify a use case for your access key. In this instance we are going to select the Command Line Interface (CLI) option (making sure you confirm that you understand the recommendation shown beneath it) and then click the "Next" button.
On the next page, add a short description for your access key and then click on the “Create access key” button.
Once you've done that, you will be shown your access key and secret access key. We're going to use these to configure the AWS CLI, which will allow us to connect to our AWS account from the command line.
Run the following command in your terminal:
aws configure
You will then be prompted to set the access key, the secret access key (the pair we generated a moment ago), the default region (you can see your current region in the region selector at the top right of the AWS console) and the default output format (which you can just hit Enter on because we don't care about this).
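Before creating any resources, a quick sanity check is to ask AWS who it thinks you are:
aws sts get-caller-identity
\\ The "Arn" in the response should end with the IAM user you just created, e.g. .../user/ml_ops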
Once you have done that we should now be able to communicate with our AWS account via our command line (using the correct user). To test if we have set everything up correctly we can run the following from the command line which will create a new bucket for us (make sure you replace [YOUR-BUCKET-NAME] with your desired bucket name, and that you have set the correct region for your project).
aws s3api create-bucket --bucket [YOUR-BUCKET-NAME] \
--create-bucket-configuration LocationConstraint="eu-west-1"
\\ Bucket names need to be globally unique! Also note that if your region is us-east-1, you should omit the --create-bucket-configuration flag entirely.
Once you have run that, provided you have set everything up correctly, the command will go off to AWS and create an S3 bucket for you. To confirm that the bucket has been created, run the following command.
aws s3 ls
If everything worked, you'll see a list of all the S3 buckets in your account.
Now what we want to do is add the data that we are going to be using in this project to our bucket. We are going to be using the penguins dataset (which is available for free on Kaggle). When you download the dataset, store it in the same folder where we set up our venv (see the start of this post), then in your terminal navigate to that folder and run the following command (making sure you set the correct name for your bucket):
aws s3 cp penguins.csv s3://[YOUR-BUCKET-NAME]/penguins/data/data.csv
Once you have run that, the CLI will print an upload confirmation in your terminal. To confirm that the object has landed on the AWS side of things, you can run the following:
aws s3api list-objects --bucket [YOUR-BUCKET-NAME]
You should then see the data that is stored in that bucket.
We can also visit the s3 area in AWS and navigate to that bucket and see the file that we just uploaded (but I’m not going to do that in this post).
Now we are going to need to set this bucket as a variable inside of an environment file. To do this, inside the project folder where you downloaded the dataset (and configured a venv), create a new file called ‘.env’ and then add the following to that file:
BUCKET=[YOUR BUCKET NAME]
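The pipeline code later in the series will read this value from the .env file, but you can also pull it into a bash session so you don't have to keep retyping the bucket name in CLI commands. This is a small sketch that assumes simple KEY=VALUE lines with no spaces (on Windows you would set the variable another way):
set -a; source .env; set +a
aws s3 ls s3://$BUCKET/penguins/data/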
The last step in this phase is to gain access to the ml.m5.xlarge instance type, which we will use to run some of the processes we implement throughout this series of posts. By default, most accounts aren't given a quota for that instance type automatically, so we will need to request one. To do this we need to visit Service Quotas > AWS services > Amazon SageMaker.
Once there, search for ml.m5.xlarge and locate “ml.m5.xlarge for endpoint usage” and select it. You’ll then see a yellow button appear giving you the ability to “request increase at account level” (or words to that effect).
I’ve already got 4 as my quota so I’m good to go (we need a minimum of 3 for this project). It may take up to 24 hours for these changes to take effect.
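If you'd rather check your current quota from the terminal than from the console, the Service Quotas service can be queried with the CLI. A rough sketch (depending on your account, you may need list-aws-default-service-quotas instead of list-service-quotas to see values that have never been changed):
aws service-quotas list-service-quotas --service-code sagemaker --query "Quotas[?contains(QuotaName, 'ml.m5.xlarge')].[QuotaName,Value]" --output table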
Configuring SageMaker
Once we've got to this point, we need to configure SageMaker. The first thing to do is grab our 12-digit account ID (we will need this to log in as our IAM user, along with the username and password that we generated earlier). Once you've done that, log out of AWS and then log in again using the IAM user credentials we just created.
Once you’re logged in you will be taken to the AWS console home where you can then use the search feature to search for “SageMaker”. When you click on SageMaker you will then be taken through to the SageMaker dashboard.
The next thing we are going to do is set up a SageMaker domain so once you’re in the SageMaker dashboard, look for an option (in the menu on the left-hand side) called “domains”.
When you get there, click on the yellow “Create domain” button and then follow the instructions for “quick set up”. Once you have finished, AWS will go off and create a domain for you to use.
\\ Note: it can take a little while (10-15 mins or so) to set up a domain.
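If you want to keep an eye on progress from the terminal, you can list your domains and watch the status; once it reads InService the domain is ready:
aws sagemaker list-domains
\\ Each entry includes a Status field (e.g. Pending, InService).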
The next thing we need to do is get our execution role, which is what SageMaker will use to run every SageMaker component in our pipeline. To find this information (assuming your domain has finished being set up), click on the domain and look for a box called "Authentication and permissions"; in that box you'll see a value called "Default execution role", which is what we are looking for.
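If the console layout has changed or you can't find that box, the same value can be read with the CLI. The domain ID below is a placeholder (it looks like d-xxxxxxxxxxxx and is shown in the Domains list):
aws sagemaker describe-domain --domain-id [YOUR-DOMAIN-ID] --query "DefaultUserSettings.ExecutionRole" --output text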
Once you've found the role, we need to add it to our .env file, as we did before with the bucket. So open your .env file and add the following to it:
ROLE=[YOUR EXECUTION ROLE]
Once we've added the role to the .env file, we then need to change the permissions for that role, because we are going to need access to a number of different AWS services, e.g. S3, Lambda, and SageMaker.
We do this by going into the IAM console and looking for the "Roles" option in the left-hand menu. Once there, find the execution role in the list of roles (feel free to use the search feature). Click on it, locate the "Permissions policies" box, and then click on the execution policy in that list (the one with the same ID as the default execution role that was created along with the domain).
When you get there, look for the “Permissions defined in this policy” box and then click on “Edit” to edit the permissions.
The next step, once you’ve arrived in the JSON editor for that policy, is to replace ALL the text in there (the JSON object) with the following:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "IAM0",
            "Effect": "Allow",
            "Action": [
                "iam:CreateServiceLinkedRole"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "iam:AWSServiceName": [
                        "autoscaling.amazonaws.com",
                        "ec2scheduled.amazonaws.com",
                        "elasticloadbalancing.amazonaws.com",
                        "spot.amazonaws.com",
                        "spotfleet.amazonaws.com",
                        "transitgateway.amazonaws.com"
                    ]
                }
            }
        },
        {
            "Sid": "IAM1",
            "Effect": "Allow",
            "Action": [
                "iam:CreateRole",
                "iam:DeleteRole",
                "iam:PassRole",
                "iam:AttachRolePolicy",
                "iam:DetachRolePolicy",
                "iam:CreatePolicy"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Lambda",
            "Effect": "Allow",
            "Action": [
                "lambda:CreateFunction",
                "lambda:DeleteFunction",
                "lambda:InvokeFunctionUrl",
                "lambda:InvokeFunction",
                "lambda:UpdateFunctionCode",
                "lambda:InvokeAsync",
                "lambda:AddPermission",
                "lambda:RemovePermission"
            ],
            "Resource": "*"
        },
        {
            "Sid": "SageMaker",
            "Effect": "Allow",
            "Action": [
                "sagemaker:UpdateDomain",
                "sagemaker:UpdateUserProfile"
            ],
            "Resource": "*"
        },
        {
            "Sid": "CloudWatch",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData",
                "cloudwatch:GetMetricData",
                "cloudwatch:DescribeAlarmsForMetric",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "logs:CreateLogGroup",
                "logs:DescribeLogStreams"
            ],
            "Resource": "*"
        },
        {
            "Sid": "ECR",
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage"
            ],
            "Resource": "*"
        },
        {
            "Sid": "S3",
            "Effect": "Allow",
            "Action": [
                "s3:CreateBucket",
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::*"
        },
        {
            "Sid": "EventBridge",
            "Effect": "Allow",
            "Action": [
                "events:PutRule",
                "events:PutTargets"
            ],
            "Resource": "*"
        }
    ]
}
Once you've done that, click "Next" to save the changes. You'll now have access to all of the services that we are going to be using in this series.
Now we need to go back to the "Roles" section and find our execution role again. Once you've found it, click on it and look for a tab called "Trust relationships".
Then click on "Edit trust policy" and replace the text in there with the following:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "sagemaker.amazonaws.com",
                    "events.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
Once you’ve done that you’ve now allowed your execution role to work with other AWS features. That is our AWS account all set up.
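If you'd prefer to script those two changes rather than click through the console, the same edits can be applied with the CLI by saving the two JSON documents above to files. This is only a hedged sketch; the account ID, policy name and role name below are placeholders you'd need to swap for your own, and the console route described above is the one I actually followed.
\\ Update the customer-managed execution policy (publishes a new default version)
aws iam create-policy-version --policy-arn arn:aws:iam::[YOUR-ACCOUNT-ID]:policy/[YOUR-EXECUTION-POLICY-NAME] --policy-document file://execution-policy.json --set-as-default
\\ Replace the role's trust policy
aws iam update-assume-role-policy --role-name [YOUR-EXECUTION-ROLE-NAME] --policy-document file://trust-policy.json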
Setting up Comet ML
The last thing we need to do is add some more information to our .env file. We are going to be using Comet ML, a cloud-based machine learning platform that allows us to automatically track and version experiments. Once you have created an account, use the Comet ML Quickstart Guide to find where you can get your API key for this project, and then add the following to the .env file that we have been using so far:
COMET_API_KEY=[YOUR COMET API KEY]
COMET_PROJECT_NAME=[YOUR COMET PROJECT NAME]
Your .env file should now look like this:
COMET_API_KEY=[YOUR COMET API KEY]
COMET_PROJECT_NAME=[YOUR COMET PROJECT NAME]
BUCKET=[YOUR BUCKET NAME]
ROLE=[YOUR EXECUTION ROLE]
This is the last step in getting our local environment and AWS configured. If you would like to access the code for this project, please visit my GitHub repository (ml-ops-pipeline), where you can clone the project and run it locally on your computer.