Some CUDA AWS/Ubuntu Notes
Been cleaning up my work email and found these notes on installing CUDA on an EC2 instance from back in April. Figured some of it could potentially come in handy one day.
These commands should work on Ubuntu, but no promises for a Mac.
Verify machine has CUDA-capable GPU
lspci | grep -i nvidia
00:03.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K520] (rev a1)
Check!
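If this check ever needs to live in a setup script, something like the sketch below might do. The function name `has_nvidia_gpu` is made up for illustration; it just wraps the same `grep` as above and reads `lspci`-style text on standard input.

```shell
# has_nvidia_gpu is a hypothetical helper name, not part of any CUDA tooling.
# It reads lspci-style text on stdin and succeeds if any line mentions NVIDIA.
has_nvidia_gpu() {
    grep -qi 'nvidia'
}

# Only query the hardware when lspci (from pciutils) is actually available.
if command -v lspci >/dev/null 2>&1; then
    if lspci | has_nvidia_gpu; then
        echo "NVIDIA device found"
    else
        echo "no NVIDIA device found"
    fi
fi
```

Note that a match only says an NVIDIA device is on the PCI bus; whether that card is CUDA-capable still has to be looked up separately.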
Verify that we have a supported version of Linux (should see x86_64 for 64-bit)
uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.2 LTS"
NAME="Ubuntu"
VERSION="16.04.2 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.2 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
Check!
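The same check can be scripted rather than eyeballed. This is only a sketch: `is_supported_platform` is a hypothetical helper, and it treats "supported" as meaning 64-bit x86 Ubuntu (the setup in these notes), which is narrower than what CUDA itself supports.

```shell
# is_supported_platform is a hypothetical helper. It takes a machine
# architecture (as printed by `uname -m`) and a distro ID (the ID= field of
# /etc/os-release) and succeeds only for x86_64 Ubuntu.
is_supported_platform() {
    [ "$1" = "x86_64" ] && [ "$2" = "ubuntu" ]
}

# Read ID from /etc/os-release inside a subshell so its variables do not leak.
if [ -r /etc/os-release ]; then
    distro_id=$(. /etc/os-release && echo "${ID:-unknown}")
    if is_supported_platform "$(uname -m)" "$distro_id"; then
        echo "platform looks OK"
    else
        echo "untested platform: $(uname -m) / $distro_id"
    fi
fi
```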
Verify system has gcc installed
gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Check!
System must have kernel headers and development packages that match the running kernel
uname -r # show kernel version on system
4.4.0-75-generic
Install the kernel headers and development packages for the running kernel
# Maybe this will help (installs the headers matching the output of uname -r)
sudo apt-get install linux-headers-$(uname -r)
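To see whether the matching headers are already there before installing anything, a rough sketch like this could help. `headers_pkg` is a hypothetical helper name; the `dpkg-query` status check is a common idiom on Debian/Ubuntu, but I have not run this on the original instance.

```shell
# headers_pkg is a hypothetical helper: it builds the Ubuntu package name for
# the kernel headers matching a given kernel release string.
headers_pkg() {
    echo "linux-headers-$1"
}

pkg=$(headers_pkg "$(uname -r)")

# Only run the check on systems that have dpkg-query (Debian/Ubuntu).
if command -v dpkg-query >/dev/null 2>&1; then
    # A fully installed package reports the status "install ok installed".
    if dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q 'install ok installed'; then
        echo "$pkg is installed"
    else
        echo "$pkg is missing; try: sudo apt-get install $pkg"
    fi
fi
```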
Environment Variables
I could not import TensorFlow into Python without error:
ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory
This basically says that Python/TensorFlow could not find the CUDA runtime library (libcudart.so.8.0), so it could not load any of the relevant GPU software. Fortunately, googling an error message gets one on the right path pretty quickly. In particular, this page indicated that we might be missing a necessary environment variable.
I put the following line into the ~/.bashrc start-up file so that this environment variable is set in every new shell (note that it replaces any existing value of LD_LIBRARY_PATH):
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
This fixed the issue.
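If other software on the machine also relies on LD_LIBRARY_PATH, a variant that keeps the existing value may be safer. This is a sketch, not what I actually ran back then; `prepend_dir` is a hypothetical helper name.

```shell
# prepend_dir is a hypothetical helper: it prints "dir:current", or just "dir"
# when current is empty, and leaves current unchanged if dir is already listed.
prepend_dir() {
    dir=$1
    current=$2
    case ":$current:" in
        *":$dir:"*) echo "$current" ;;
        *) echo "$dir${current:+:$current}" ;;
    esac
}

# /usr/local/cuda/lib64 is the path used above; adjust if CUDA lives elsewhere.
LD_LIBRARY_PATH=$(prepend_dir /usr/local/cuda/lib64 "${LD_LIBRARY_PATH:-}")
export LD_LIBRARY_PATH
```

Because the helper skips directories that are already listed, sourcing .bashrc more than once does not keep growing the variable.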
More about our installation of CUDA
GOAL: check that the CUDA samples are properly installed and that the GPU is found
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make # compiles sample code w/in directory (sudo gives superuser permissions)
# -- Run deviceQuery (should see info and Result=PASS)
./deviceQuery # OUTCOME: CUDA samples installed and GPU found
More checks on CUDA installation: http://xcat-docs.readthedocs.io/en/stable/advanced/gpu/nvidia/verify_cuda_install.html
cd /usr/local/cuda/samples/1_Utilities/bandwidthTest
sudo make
./bandwidthTest # should also report Result = PASS
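Both samples can be checked in one go by grepping their output for the PASS line. A minimal sketch, assuming the samples were already built with make in the directories above; `sample_passed` is a hypothetical helper name.

```shell
# sample_passed is a hypothetical helper: it reads a sample's output on stdin
# and succeeds if it contains "Result = PASS" (spaces around "=" are optional).
sample_passed() {
    grep -Eq 'Result *= *PASS'
}

# Same samples location as above; binaries that have not been built are skipped.
samples_dir=/usr/local/cuda/samples/1_Utilities
for name in deviceQuery bandwidthTest; do
    bin="$samples_dir/$name/$name"
    if [ -x "$bin" ]; then
        if "$bin" | sample_passed; then
            echo "$name: PASS"
        else
            echo "$name: FAIL"
        fi
    fi
done
```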
Some Resources
- CUDA GPU Info
- http://expressionflow.com/2016/10/09/installing-tensorflow-on-an-aws-ec2-p2-gpu-instance/
- https://github.com/fluxcapacitor/pipeline/wiki/AWS-GPU-Tensorflow-Docker
- http://xcat-docs.readthedocs.io/en/stable/advanced/gpu/nvidia/verify_cuda_install.html
- https://livingthing.danmackinlay.name/how_is_amazon_cloud_number_crunching_awful.html