PyTorch 1.0 preview (Dec 6, 2018) packages with full CUDA 10 support for your Ubuntu 18.04 x86_64 systems.

(The wheel has now been updated to the latest PyTorch 1.0 preview as of December 6, 2018.)

You’ve just received a shiny new NVIDIA Turing (RTX 2070, 2080 or 2080 Ti), or maybe even a beautiful Tesla V100, and now you would like to try out mixed precision (well mostly fp16) training on those lovely tensor cores, using PyTorch on an Ubuntu 18.04 LTS x86_64 system.

[Image: tensor cores]

The idea is that these tensor cores chew through fp16 much faster than they do through fp32. In practice, neural networks tolerate having large parts of themselves living in fp16, although one does have to be careful with this. Furthermore, fp16 promises to save a substantial amount of graphics memory, enabling one to train bigger models.
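To make the memory claim concrete: an fp16 element takes 2 bytes instead of 4, so the same tensor occupies half the space. Here is a minimal sketch of my own (assuming a CUDA-capable GPU) that shows this with PyTorch:

import torch

# fp32 vs fp16 storage for the same tensor
x32 = torch.randn(1024, 1024, device='cuda')   # float32 by default
x16 = x32.half()                               # cast to float16

print(x32.element_size(), x16.element_size())            # 4 vs 2 bytes per element
print(x32.numel() * x32.element_size() // 2**20, 'MiB')  # 4 MiB
print(x16.numel() * x16.element_size() // 2**20, 'MiB')  # 2 MiB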

For full fp16 support on the Turing architecture, CUDA 10 is currently the best option. Also, a number of CUDA 10 specific improvements were made to PyTorch after the 0.4.1 release.

However, PyTorch 1.0 (first release after 0.4.1) is not quite ready yet, and neither is it easy to find CUDA 10 builds of the current PyTorch 1.0 preview / PyTorch nightly.

Oh noes…

Well, fret no more!

Here you’ll be able to find a fully CUDA 10 based build (pip wheel format) of PyTorch master as of December 6, 2018, up to and including commit b5db6ac. I’ve linked it against a fully CUDA 10 based build of MAGMA 2.4.0, which I built as a conda package.

Installing and using these packages.

Ensure that you have an Ubuntu 18.04 LTS system with CUDA 10 installed and configured. See this great CUDA 10 howto by Puget Systems.

After this, you will also need to download the cuDNN 7.4 packages for your system from the NVIDIA Developer site. An NVIDIA developer account (free signup) is required for this. I downloaded and installed libcudnn7_7.4.1.5-1+cuda10.0_amd64.deb and libcudnn7-dev_7.4.1.5-1+cuda10.0_amd64.deb, but you’ll probably only need the former.

Set up a suitable conda environment with Python 3.7. Create and activate it with something like the following:

conda create -n pt python=3.7 numpy mkl mkl-include setuptools cmake cffi typing
conda activate pt
conda install -c mingfeima mkldnn

You can now download the PyTorch nightly wheel of 2018-12-06 (347MB) and install with:

pip install torch-1.0.0a0+b5db6ac+20181206-cp37-cp37m-linux_x86_64.whl

The libraries in the wheel don’t have the conda-style relative RUNPATH set correctly, so you have to set LD_LIBRARY_PATH every time you start Jupyter or any other Python code. This should work:

LD_LIBRARY_PATH=$CONDA_PREFIX/lib jupyter lab

You’re now good to go!
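As a quick sanity check (my own suggestion, not part of the install steps above), you can confirm from Python that the wheel sees CUDA 10, cuDNN and your GPU, and that fp16 ops actually run on the device:

import torch

print(torch.__version__)               # should report 1.0.0a0+b5db6ac
print(torch.version.cuda)              # should report 10.0
print(torch.backends.cudnn.version())  # the installed cuDNN, e.g. 7401 for 7.4.1
print(torch.cuda.is_available())       # True if the driver and GPU are visible
print(torch.cuda.get_device_name(0))   # e.g. your RTX 2080 Ti or Tesla V100

# a tiny fp16 matrix multiply on the GPU to confirm half precision works end to end
a = torch.randn(64, 64, device='cuda').half()
print((a @ a).type())                  # torch.cuda.HalfTensor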

First tests of mixed precision training with fast.ai on Tesla V100.

I fired up a Google Compute Engine node with a Tesla V100 in Amsterdam to check that everything works.

I used the latest version of the fastai library, and specifically the callbacks.fp16 notebook which forms part of the brilliant new fastai documentation generation system. See for example the generated page on the fp16 callbacks.

Below I show the MNIST example code where I tried to compare fp32 with fast.ai fp16 (well, mixed precision to be precise) training.

The simple CNN trains to about 97% accuracy in 8 seconds, which is pretty quick already, but I could not see any training speed difference between fp16 and fp32. This could very well be because the network is so tiny; a more direct matmul timing comparison is sketched after the fp32 results below.

However, I could confirm that the model parameters (at the very least) were all stored in fp16 floats when using the fast.ai to_fp16() Learner method.

Train CNN with fp16

from fastai import *
from fastai.vision import *
path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)
model = simple_cnn((3,16,16,2))
learn = Learner(data, model, metrics=[accuracy]).to_fp16()
learn.fit_one_cycle(5)
Total time: 00:08
epoch  train_loss  valid_loss  accuracy
1      0.202592    0.139505    0.948970  (00:01)
2      0.112530    0.103523    0.967125  (00:01)
3      0.079813    0.063746    0.973994  (00:01)
4      0.066733    0.056465    0.976938  (00:01)
5      0.069775    0.055017    0.977429  (00:01)

Check that the type of the model parameters is half:

for p in model.parameters():
    print(p.type())
torch.cuda.HalfTensor
torch.cuda.HalfTensor
torch.cuda.HalfTensor

Train CNN with fp32

model32 = simple_cnn((3,16,16,2))
learn32 = Learner(data, model32, metrics=[accuracy])
learn32.fit_one_cycle(5)
Total time: 00:08
epoch  train_loss  valid_loss  accuracy
1      0.213889    0.151780    0.942100  (00:01)
2      0.106975    0.092190    0.966634  (00:01)
3      0.084529    0.083353    0.973013  (00:01)
4      0.069017    0.066023    0.976938  (00:01)
5      0.060235    0.056738    0.980373  (00:01)

Check that the type of the model parameters is full float:

for p in model32.parameters():
    print(p.type())
torch.cuda.FloatTensor
torch.cuda.FloatTensor
torch.cuda.FloatTensor
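
Since the MNIST network is far too small to exercise the tensor cores, a more direct (if crude) way to see the fp16 advantage is to time a large matrix multiply in both precisions. The following is my own sketch, not part of the fast.ai notebook; on a Tesla V100 or a Turing card the fp16 timing should come out clearly ahead:

import torch

def time_matmul(dtype, n=4096, iters=50):
    # average time in milliseconds for an n x n matmul in the given dtype
    a = torch.randn(n, n, device='cuda', dtype=dtype)
    b = torch.randn(n, n, device='cuda', dtype=dtype)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

print('fp32:', time_matmul(torch.float32), 'ms')
print('fp16:', time_matmul(torch.float16), 'ms')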

9 thoughts on “PyTorch 1.0 preview (Dec 6, 2018) packages with full CUDA 10 support for your Ubuntu 18.04 x86_64 systems.”

  1. Python 3.7.0 (default, Jun 27 2018, 13:15:42)
    [GCC 7.2.0] :: Anaconda, Inc. on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import torch
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/red/anaconda3/lib/python3.7/site-packages/torch/__init__.py", line 84, in <module>
        from torch._C import *
    ImportError: libmkl_intel_lp64.so: cannot open shared object file: No such file or directory

    I got this error. Is LD_LIBRARY_PATH=$CONDA_PREFIX/lib jupyter lab the only way to fix it? Thank you!!

    1. Please do follow the instructions. You have to set LD_LIBRARY_PATH as explained above, or you will see that error. Also make sure to install mkl and mkldnn as explained in the post.

      1. DID. YOU. READ. THE. INSTRUCTIONS. IN. THE. POST. RIGHT. ABOVE. THIS?

        (sorry about that. 🙂 but the post above explains extremely clearly that you should have mkldnn installed, and that you should set LD_LIBRARY_PATH correctly.)

        1. Sorry, I misunderstood this:
          LD_LIBRARY_PATH=$CONDA_PREFIX/lib jupyter lab

          Does that mean something like this?
          LD_LIBRARY_PATH=/user/…/…/…/jupyter

          1. If you’re using conda with an environment active, use EXACTLY the line I posted.

            If you’re not using conda, find the path containing your libmkl*so files, and set that as LD_LIBRARY_PATH.

            1. I’m using miniconda, and I’m still stuck.
              Thanks for your time.

              (pt) [proofn@blackhole pt]$ conda install -c mingfeima mkldnn
              Solving environment: done

              # All requested packages already installed.

              (pt) [proofn@blackhole pt]$ cd ~/torch/
              (pt) [proofn@blackhole torch]$ python3.7 -m pip install --user torch-1.0.0a0+6e1e203+20181124-cp37-cp37m-linux_x86_64.whl
              Processing ./torch-1.0.0a0+6e1e203+20181124-cp37-cp37m-linux_x86_64.whl
              Installing collected packages: torch
              Found existing installation: torch 1.0.0a0+6e1e203
              Uninstalling torch-1.0.0a0+6e1e203:
              Successfully uninstalled torch-1.0.0a0+6e1e203
              Successfully installed torch-1.0.0a0+6e1e203
              (pt) [proofn@blackhole torch]$ LD_LIBRARY_PATH=$CONDA_PREFIX/lib jupyter lab
              Error executing Jupyter command 'lab': [Errno 2] No such file or directory
