It is a well-known fact that machine learning (and #deep-learning in particular) needs lots of compute power. If the conditions are right (i.e. enough data is involved), a GPU-enabled machine might be needed to speed up training (and thus shorten the iteration cycle).
Owning a GPU, though, is quite expensive nowadays, so most researchers use one through cloud offerings (like AWS, Azure or Google Cloud). In general, this is cheaper than buying and setting up your own dedicated machine. But for it to actually be cheap, you need to employ the following usage pattern:
* do the ETL on the CPU (cheap) machine
* do the model development on the CPU (cheap) machine
* optimise and debug the model on the CPU (cheap) machine
* train the model on the GPU (expensive) machine
* analyse the results on the CPU (cheap) machine
In short, do the memory-intensive and conceptually demanding tasks on the CPU machine and reserve the GPU for the compute-intensive ones.
Google #colab is famous for the ease with which you can adopt this pattern, because changing the instance type that powers your #jupyter environment takes only a few clicks.
On AWS, though, this is much more involved. You need to log in to the console, navigate to the EC2 section, find the instance, stop it (if it is running), change its instance type, and start the machine back up. Even though this is more explicit and gives you better control over what you are doing, it quickly becomes cumbersome, and at some point you might resign yourself to staying on the GPU just to avoid going through these steps too often.
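For reference, the same stop / change type / start sequence can be written as plain AWS CLI calls. This is a minimal sketch, not an official tool: the function name `aws_switch_type_raw` and its argument order are made up for illustration, and it assumes an EBS-backed instance (the type can only be changed while the instance is stopped).

```shell
# Sketch: the manual sequence as raw AWS CLI calls.
# aws_switch_type_raw is a hypothetical name; the instance is assumed
# to be EBS-backed so that it can be stopped and started again.
aws_switch_type_raw() {
    local profile="$1" instance_id="$2" instance_type="$3"

    # 1. Stop the instance and block until it has fully stopped.
    aws ec2 stop-instances --profile "$profile" --instance-ids "$instance_id" || return 1
    aws ec2 wait instance-stopped --profile "$profile" --instance-ids "$instance_id" || return 1

    # 2. Change the instance type (only possible while stopped).
    aws ec2 modify-instance-attribute --profile "$profile" \
        --instance-id "$instance_id" \
        --instance-type "{\"Value\": \"${instance_type}\"}" || return 1

    # 3. Start it again and wait until it is running.
    aws ec2 start-instances --profile "$profile" --instance-ids "$instance_id" || return 1
    aws ec2 wait instance-running --profile "$profile" --instance-ids "$instance_id"
}
```

Seeing the five calls spelled out makes it clear why wrapping them in a single command is worth the effort.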
An alternative (of course) is to automate these, in the following fashion:
* download the change_ec2_instance_type scripts found on this official GitHub repo
* there are three scripts there:
    * awsdocs_general.sh
    * change_ec2_instance_type.sh
    * test_change_ec2_instance_type.sh
* put them in the same directory (for example `~/Applications/aws-cli-tools`)
* add the following snippet to your `~/.bash_aliases` file
source ~/Applications/aws-cli-tools/awsdocs_general.sh
source ~/Applications/aws-cli-tools/change_ec2_instance_type.sh

# Usage: aws_switch_type <profile> <machine Name tag> <instance type>
function aws_switch_type() {
    local PROFILE="${1}"
    local MACHINE="${2}"
    local INSTANCE_TYPE="${3}"
    local INSTANCE_ID

    echo "[+] Will change machine ${MACHINE} on the profile ${PROFILE} to instance type ${INSTANCE_TYPE}"

    # --output text returns the bare id (the default JSON output would wrap it in quotes)
    INSTANCE_ID=$(export AWS_PROFILE="${PROFILE}"; aws ec2 describe-instances --filters "Name=tag:Name,Values=${MACHINE}" --query "Reservations[0].Instances[0].InstanceId" --output text)
    if [ -z "${INSTANCE_ID}" ] || [ "${INSTANCE_ID}" = "None" ]; then
        echo "[-] No instance found with Name tag ${MACHINE}" >&2
        return 1
    fi
    echo "[+] Got instance id ${INSTANCE_ID} for machine ${MACHINE}"

    echo "[+] Changing to instance type ${INSTANCE_TYPE} and restarting machine."
    (export AWS_PROFILE="${PROFILE}"; change_ec2_instance_type -i "${INSTANCE_ID}" -t "${INSTANCE_TYPE}" -f -r -v)

    sleep 15
    # ssh into the machine
    # setup the tmux panes on the remote
    # start all the needed apps
    # jupyter
    # htop
    # git pull
}

alias aws_research_m4='aws_switch_type xetten.com research m4.large'
alias aws_research_p3='aws_switch_type xetten.com research p3.2xlarge'
Using the #aws context (profile) named `xetten.com`, the two aliases above change the instance type of the machine `research` to either `m4.large` or `p3.2xlarge`.
You can define one or more profiles in the `~/.aws/config` and `~/.aws/credentials` files, which allows you to maintain multiple setups.
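If you prefer not to edit those files by hand, the AWS CLI can write the non-secret settings for you. The sketch below is only an illustration: `aws_add_profile` and `aws_check_profile` are hypothetical helper names, and the region you pass is your own choice. The access keys themselves are best entered interactively with `aws configure --profile <name>`.

```shell
# Hypothetical helper: create (or update) the settings of a named profile.
# Writes the region and output format into ~/.aws/config.
aws_add_profile() {
    local profile="$1" region="$2"
    aws configure set region "$region" --profile "$profile" || return 1
    aws configure set output json --profile "$profile"
}

# Hypothetical helper: confirm that the profile's credentials work
# by asking AWS which account they belong to.
aws_check_profile() {
    aws sts get-caller-identity --profile "$1" --query Account --output text
}
```

For example, `aws_add_profile xetten.com eu-west-1` followed by `aws_check_profile xetten.com` (the region here is an assumption, use whichever one hosts your machine).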
Also note that you can further customise what happens after the instance type of the machine is changed and it is running. I've left some ideas in the comments.
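As one possible way to fill in the "ssh into the machine" idea, the sketch below waits until the instance is running, looks up its public DNS name, and attaches to a tmux session over SSH. Everything specific here is an assumption: the function name `aws_connect`, the default user `ubuntu` (it depends on your AMI), and the session name `research`. Keep in mind that the public DNS name usually changes after a stop/start unless the instance has an Elastic IP, which is why it is looked up every time.

```shell
# Hypothetical post-switch step: connect to the machine once it is up.
# Usage: aws_connect <profile> <instance id> [ssh user]
aws_connect() {
    local profile="$1" instance_id="$2" user="${3:-ubuntu}"  # default user is an assumption
    local host

    # Block until EC2 reports the instance as running.
    aws ec2 wait instance-running --profile "$profile" --instance-ids "$instance_id" || return 1

    # Look up the current public DNS name of the instance.
    host=$(aws ec2 describe-instances --profile "$profile" \
        --instance-ids "$instance_id" \
        --query "Reservations[0].Instances[0].PublicDnsName" \
        --output text)
    if [ -z "$host" ] || [ "$host" = "None" ]; then
        echo "[-] No public DNS name for ${instance_id}" >&2
        return 1
    fi

    # Attach to the tmux session named "research", creating it if needed.
    ssh -t "${user}@${host}" "tmux new-session -A -s research"
}
```

The SSH daemon may need a few more seconds after the instance reports as running, so a short `sleep` or a retry before the `ssh` call might still be necessary.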
Once everything is set up correctly, you can issue the following in the terminal and work much like in #colab:
aws_research_m4
(do work)
aws_research_p3
(train)
aws_research_m4
(debug)
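Since the whole point is to avoid paying for an idle GPU, it can help to check what the machine is currently running as. A minimal sketch, with `aws_current_type` being a hypothetical helper name that reuses the same Name-tag lookup as the function above:

```shell
# Hypothetical helper: print the instance type and state of a machine.
# Usage: aws_current_type <profile> <machine Name tag>
aws_current_type() {
    local profile="$1" machine="$2"
    aws ec2 describe-instances --profile "$profile" \
        --filters "Name=tag:Name,Values=${machine}" \
        --query "Reservations[0].Instances[0].[InstanceType,State.Name]" \
        --output text
}
```

Running `aws_current_type xetten.com research` at the end of the day is a quick way to confirm the machine is back on the cheap instance type.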