August 9, 2016

Fedora 24 + Bumblebee + CUDA + Theano

It’s very frustrating when you want to try something out quickly and lose your entire day… But at least next time it will be easier. I hope.

This is a guide on how to run Keras (a Theano-powered neural network library for Python) on CUDA on an NVIDIA Optimus laptop running Fedora 24 and Bumblebee.

Bumblebee working #

Getting Bumblebee working is the easy part. Use the nonfree drivers; there are very good instructions here:

https://fedoraproject.org/wiki/Bumblebee

Afterwards verify that Bumblebee is working with a quick:

optirun glxgears

Install CUDA #

Get CUDA from here:

https://developer.nvidia.com/cuda-downloads

The latest available build targets Fedora 21, and it works on Fedora 24. After downloading it, unpack it with:

sh cuda_7.5.18_linux.run -extract=/path/to/somewhere

You will get three files: the driver (which you don’t need), the CUDA toolkit, and some samples. Install the toolkit and samples with:

cd /path/to/somewhere/ # this is where you extracted the .run file
sudo ./cuda-linux64-rel-7.5.18-19867135.run
sudo ./cuda-samples-linux-7.5.18-19867135.run

When prompted, tell the installers to put the toolkit in /opt/cuda and the samples in /opt/cuda/samples. Then update your ~/.bashrc with these two lines:

export PATH=$PATH:/opt/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cuda/lib64

and source it: source ~/.bashrc.
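If you want to double-check that the new shell really picked up both exports, a small Python sketch like the one below can do it. It assumes the /opt/cuda prefix used in this guide; the function name is made up for illustration.

```python
import os

# Install locations assumed from the two export lines in this guide.
EXPECTED = {
    "PATH": "/opt/cuda/bin",
    "LD_LIBRARY_PATH": "/opt/cuda/lib64",
}


def missing_cuda_paths(env):
    """Return the names of variables in `env` that lack their CUDA directory."""
    missing = []
    for var, directory in EXPECTED.items():
        # Both variables are colon-separated lists on Linux.
        entries = env.get(var, "").split(":")
        if directory not in entries:
            missing.append(var)
    return missing


# An empty list means both variables contain the expected directory.
print(missing_cuda_paths(os.environ) or "CUDA paths look fine")
```

Run it in a freshly opened terminal; if it prints a variable name, that export did not make it into your environment.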

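Verify that CUDA works. If the build stops with an error about an unsupported GCC version, see the next section first and then retry: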

cd /opt/cuda/samples/1_Utilities/deviceQuery
sudo make
optirun ./deviceQuery
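The deviceQuery sample ends its output with a line reading "Result = PASS" when it finds a usable device. If you want to script that check, here is a minimal sketch. The helper name is hypothetical, and the commented-out call assumes you run it from the deviceQuery directory.

```python
import subprocess


def device_query_passed(output):
    """Return True if deviceQuery's output contains its final PASS line."""
    return any(line.strip() == "Result = PASS" for line in output.splitlines())


def run_device_query():
    """Run the sample through optirun and report whether it passed."""
    result = subprocess.run(
        ["optirun", "./deviceQuery"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return device_query_passed(result.stdout)


# Example (needs optirun and the compiled sample in the current directory):
# print(run_device_query())
```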

GCC is too new #

CUDA 7.5 only works with GCC 4.9 or older, and Fedora 24 ships GCC 6.1. Happily, we can install 4.9.2 from the CentOS repository. I found the answer on how to do this here.

Download the linked file from there and do the following:

tar xf CentOS-SCLo-scl-el7.tar.gz
sudo cp ./etc/* /etc -rf
sudo dnf install devtoolset-3-gcc-c++

After that, running scl enable devtoolset-3 bash starts a new shell with an updated path, in which gcc is GCC 4.9.2. You can verify this with a quick gcc --version. GCC 4.9.2 is only used for the duration of that terminal session, so you have to run the scl command again in every new terminal.
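Since it is easy to forget which shell you are in, a quick check of the active compiler can save a failed build. The sketch below parses the output of gcc -dumpversion and compares it with the 4.9 limit mentioned above; the function names are made up for illustration.

```python
import subprocess

# Newest GCC that CUDA 7.5 accepts, as stated in this guide.
MAX_SUPPORTED = (4, 9)


def gcc_is_supported(dumpversion_output, max_supported=MAX_SUPPORTED):
    """Compare the major.minor part of `gcc -dumpversion` with the limit."""
    parts = dumpversion_output.strip().split(".")
    major_minor = tuple(int(p) for p in parts[:2])
    return major_minor <= max_supported


def current_gcc_is_supported():
    """Ask whichever gcc is first on PATH for its version and check it."""
    out = subprocess.run(
        ["gcc", "-dumpversion"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    ).stdout
    return gcc_is_supported(out)


# Example: inside the devtoolset-3 shell this should print True.
# print(current_gcc_is_supported())
```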

Installing Keras #

Install Keras as instructed here:

http://keras.io

In case you didn’t know, Fedora packages libraries like numpy and sklearn, so you can install them with dnf instead of letting pip build them from source. Just try dnf search numpy and you will find versions for both Python 2 and 3.

After installation, verify that everything is fine with:

export THEANO_FLAGS='cuda.root=/opt/cuda,device=gpu,floatX=float32'
optirun python3 -c "import theano; print(theano.sandbox.cuda.device_properties(0))"

If everything is good, you should get something like this:

Using gpu device 0: GeForce GTX 770M (CNMeM is disabled, CuDNN not available)
{'warpSize': 32, 'ECCEnabled': 0, 'maxGridSize0': 2147483647, 'canMapHostMemory': 1, 'concurrentKernels': 1, 'major': 3, 'name': 'GeForce GTX 770M', 'coresCount': -5, 'maxThreadsPerBlock': 1024, 'deviceOverlap': 1, 'minor': 0, 'memPitch': 2147483647, 'sharedMemPerBlock': 49152, 'runtimeVersion': 7050, 'maxGridSize1': 65535, 'maxGridSize2': 65535, 'textureAlignment': 512, 'maxThreadsDim1': 1024, 'clockRate': 797000, 'maxThreadsDim0': 1024, 'tccDriver': 0, 'regsPerBlock': 65536, 'computeMode': 0, 'integrated': 0, 'kernelExecTimeoutEnabled': 0, 'maxThreadsDim2': 64, 'driverVersion': 8000, 'multiProcessorCount': 5, 'totalConstMem': 65536}
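If you would rather not export THEANO_FLAGS in every terminal, the same flags can be set from Python, because Theano reads the variable when it is first imported. This is only a sketch: the helper name is made up, and the values are the ones from the export line above.

```python
import os


def format_theano_flags(flags):
    """Join a dict of options into the comma-separated THEANO_FLAGS format."""
    return ",".join("{}={}".format(key, value) for key, value in flags.items())


# Same settings as the export line in this guide.
flags = {"cuda.root": "/opt/cuda", "device": "gpu", "floatX": "float32"}

# Keep any value already set in the shell; otherwise use ours.
os.environ.setdefault("THEANO_FLAGS", format_theano_flags(flags))

# The import has to come after the environment variable is set:
# import theano
```

You still have to launch the script through optirun, otherwise the NVIDIA card is not visible to the process.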

Worth it? #

Very! While training a network on the MNIST data set, one epoch took about a minute on my Intel Core i7-4700MQ. On my GPU, the same code ran in about 1.1 seconds.

CUDA is amazing. :) It’s a bit difficult to set up, but it is something you only have to do once, and it will save you a lot of time in the future. :)

Good luck!
