Improving fastai’s mixed precision support with NVIDIA’s Automatic Mixed Precision.

TL;DR: For best results with mixed-precision training, use NVIDIA’s Automatic Mixed Precision (AMP) together with fastai, and remember to set any epsilons, for example in the optimizer, to values that won’t underflow in fp16.

Background

Newer NVIDIA GPUs such as the consumer RTX range, the Tesla V100 and others have hardware support for half-precision / fp16 tensors.

This is interesting, because many deep neural networks still function perfectly well if you store most of their parameters in the far more compact 16-bit floating-point format. The newer hardware (the so-called Tensor Cores) is able to further accelerate these half-precision operations.
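
As a quick illustration I’m adding here (not part of the original post), half-precision tensors take two bytes per element instead of four, which is where the memory savings come from:

import torch

x32 = torch.randn(1024, 1024)                  # default: 32-bit floats
x16 = x32.half()                               # same values stored as 16-bit floats

print(x32.element_size(), x16.element_size())  # 4 vs 2 bytes per element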

In other words, with one of the newer cards, you’ll be able to fit a significantly larger neural network into the usually quite limited GPU memory (with CNNs, I can work with networks that are 80% larger), and you’ll be able to train that network substantially faster.

fastai has built-in support for mixed-precision training, but NVIDIA’s AMP offers better support because it uses dynamic rather than static loss scaling.

In the rest of this blog post, I briefly explain the two steps you need to take to get all of this working.

Step 1: Set epsilon so it doesn’t disappear under fp16

I’m mentioning this first so you don’t miss it.

Even after adding AMP to your configuration, you might still see NaNs during network training.

If you’re lucky, you will run into this post on the PyTorch forums.

In short, the torch.optim.Adam optimizer, and probably a number of other optimizers in PyTorch, take an epsilon argument which is added to possibly small denominators to avoid dividing by zero.

The default value of epsilon is 1e-8. Whoops!

Under fp16, 1e-8 rounds to 0, so it does nothing to protect your possibly-zero denominators.
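
You can verify the underflow yourself in a couple of lines (my own illustration):

import torch

print(torch.tensor(1e-8).half())   # tensor(0., dtype=torch.float16) -- underflows to zero
print(torch.tensor(1e-4).half())   # still non-zero in fp16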

The fix is simple: supply a larger epsilon.

Because I’m using fastai’s Learner directly, which takes a callable for the optimizer via its opt_func argument, I created a partial:

# create fp16-safe AdamW
# see: https://discuss.pytorch.org/t/adam-half-precision-nans/1765/4
# the default eps of 1e-8 rounds to 0 under fp16; values down to about 1e-7 can still be represented
# this eps is used to prevent divide-by-zero errors
import torch
import fastai as fai
import fastai.basic_train  # make the submodule available as fai.basic_train
from functools import partial

AdamW16 = partial(torch.optim.Adam, betas=(0.9, 0.99), eps=1e-4)

# then stick model + databunch into a new Learner
# (data, model, ml_sm_loss and metrics are defined elsewhere in my training script)
learner = fai.basic_train.Learner(data, model, loss_func=ml_sm_loss, metrics=metrics, opt_func=AdamW16)

Step 2: Set up NVIDIA’s Automatic Mixed Precision

fastai’s built-in support for mixed precision training certainly works in many cases. However, it uses a configurable static loss scaling parameter (default 512.0), which in some cases won’t get you as far as dynamic loss scaling.
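
For reference, the built-in route is roughly a one-liner; if I recall the fastai v1 API correctly, to_fp16 exposes that static scale as a loss_scale parameter:

# fastai's built-in mixed precision with static loss scaling
# (API as I remember it for fastai v1; double-check against your version)
learner = learner.to_fp16(loss_scale=512.0)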

With dynamic loss scaling, the scaling factor is continuously adapted to squeeze the most out of the available precision.
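
Conceptually, it works something like the following simplified sketch (my own illustration, not AMP’s actual implementation): scale the loss before the backward pass, check the gradients, and grow or shrink the scale accordingly.

import torch

# initial state for the sketch
state = {'scale': 512.0, 'good_steps': 0}

def dynamic_scale_step(loss, model, optimizer, state):
    """One training step with a crude dynamic loss scale (illustration only)."""
    optimizer.zero_grad()
    (loss * state['scale']).backward()

    grads = [p.grad for p in model.parameters() if p.grad is not None]
    if all(torch.isfinite(g).all() for g in grads):
        for g in grads:
            g.div_(state['scale'])            # unscale gradients before the optimizer step
        optimizer.step()
        state['good_steps'] += 1
        if state['good_steps'] % 2000 == 0:   # after a long run of clean steps, try a bigger scale
            state['scale'] *= 2.0
    else:
        state['scale'] /= 2.0                 # overflow: skip this step and back off the scale
        state['good_steps'] = 0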

(You could read sgugger’s excellent summary of mixed precision training on the fastai forums.)

I was trying to fit a squeeze-and-excitation ResNeXt-50 (32×4d) with image size 400×400 and batch size 24 into the 8GB of RAM of the humble but hard-working RTX 2070 in my desktop, so I needed all of the dynamic-scaling help I could get.

After applying the epsilon fix mentioned above, you then install NVIDIA Apex and finally make three small changes: one to your own training script and two to fastai’s code.

Install NVIDIA Apex

Download and install NVIDIA Apex into the Python environment you’re using for your fastai experiment.

conda activate your_fastai_env
cd ~
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext

If Apex does not build, you can also try without --cuda_ext --cpp_ext, although it’s best if you can get the extensions built.

Modify your training script

At the top of your training script, before any other imports (especially anything to do with PyTorch), add the following:

from apex import amp
amp_handle = amp.init(enabled=True)

This will initialise apex, enabling it to hook into a number of PyTorch calls.

Modify fastai’s training loop

You will have to modify fastai’s basic_train.py, which you should be able to find in your_env_dir/lib/python3.7/site-packages/fastai/. Check and double-check that you have the right file.

At the top of this file, before any other imports, add the following:

from apex.amp import amp
# retrieve initialised AMP handle
amp_handle = amp._DECORATOR_HANDLE

Then, edit the loss_batch function according to the following instructions and code snippet. You only add two new lines, which replace the loss.backward() call that you comment out.

if opt is not None:
    loss = cb_handler.on_backward_begin(loss)

    # The following lines REPLACE the commented-out "loss.backward()"
    # opt is an OptimWrapper -- unwrap to get real optimizer
    with amp_handle.scale_loss(loss, opt.opt) as scaled_loss:
        scaled_loss.backward()

    # loss.backward()

All of this simply follows NVIDIA AMP’s usage instructions, which I most recently tested on fastai v1.0.42, the latest at the time of writing.

Results

If everything goes according to plan, you should be able to obtain the following well-known graph with a much larger network than you otherwise would have been able to.

The learning-rate finder plot below was produced with the se_resnext50_32x4d, image size 400×400 and batch size 24 on my RTX 2070, as mentioned above. The procedure documented in this post works equally well on high-end units such as the V100.

Learning-rate finder plot (lr_finder_v20181127-se_resnext50_32x4d-320-48-400-24-fp16-v5-ml_sm-aug_wd1e-5_frozen_2019-02-04_09-39-29.png).

Solving the Ubuntu 14.04 – NVIDIA 346 – nvidia-prime black screen issue

For a project that I’m currently helping with, we needed recent OpenGL features that are only available in NVIDIA driver versions 340 and later.

Unfortunately, I have one of those NVIDIA Optimus laptops. Up to now, Bumblebee has worked a treat (I would recommend it in most cases), but for this project I needed the whole of X to run on the NVIDIA GPU, so I had to use nvidia-prime to switch between Intel and NVIDIA mode.

After upgrading to the nvidia-346* packages from the xorg-edgers PPA, switching to nvidia mode by typing prime-select nvidia and then logging out and in to X, I was greeted by a black screen.

Analysis

Many hours of experimentation, script tracing and web searching later I made the following observations:

  • gpu-manager, part of ubuntu-drivers-common (in my case version 1:0.2.91.7), runs every time you start and stop your display manager (in other words, when you log out and back in) and then rewrites /etc/X11/xorg.conf based on what it finds in the system.
  • In theory, with PRIME support in the NVIDIA drivers, xrandr is used to connect the output of the NVIDIA adapter to the Intel adapter, which then displays it. See the NVIDIA driver documentation for more details. The 90-nvidia.conf script in /usr/share/lightdm/lightdm.conf.d/ (part of the nvidia-prime package) calls /sbin/prime-offload, which automatically takes care of the xrandr setup for you.
  • gpu-manager was rewriting my xorg.conf file incorrectly, at least according to NVIDIA’s xrandr documentation. The primary issue was that it was using the intel driver, instead of the modesetting driver, for the Intel adapter.

The solution

All of this led to the following solution (working: now tested on two setups):

  • Switch to console (Ctrl-Alt-F1) and stop lightdm: sudo service lightdm stop
  • Disable gpu-manager by commenting out everything in /etc/init/gpu-manager.conf
  • Switch to nvidia mode by doing sudo prime-select nvidia
  • Change your /etc/X11/xorg.conf to look like this, making sure that the nvidia BusId is correct (check with lspci):
    Section "ServerLayout"
     Identifier "layout"
     Screen 0 "nvidia"
     Inactive "intel"
    EndSection
    
    Section "Device"
     Identifier "intel"
     Driver "modesetting"
    EndSection
    
    Section "Screen"
     Identifier "intel"
     Device "intel"
    EndSection
    
    Section "Device"
     Identifier "nvidia"
     Driver "nvidia"
     BusID "PCI:1:0:0"
    EndSection
    
    Section "Screen"
     Identifier "nvidia"
     Device "nvidia"
     Option "UseDisplayDevice" "None"
    EndSection
    
  • In the comments, Christopher May-Townsend made this brilliant suggestion. By doing sudo chattr +i /etc/X11/xorg.conf you can prevent any process from changing the file. We highly recommend that you do this, as users have reported that even after disabling the gpu-manager upstart job, it still manages to change the xorg.conf during reboot.
  • Start X up again with sudo lightdm start

If you are still greeted by a black screen, switch back to the console and double-check that xorg.conf has not been rewritten again to its pre-modesetting state. (If you’ve used the chattr trick above, you should be fine.)

If you want to switch back to Intel you will have to stop lightdm, re-enable gpu-manager, make xorg.conf editable again with sudo chattr -i /etc/X11/xorg.conf, activate intel mode with sudo prime-select intel and then restart X with sudo service lightdm start.
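
In short, and assuming you used the chattr trick and that re-enabling gpu-manager means uncommenting /etc/init/gpu-manager.conf again, the reverse procedure looks like this:

sudo service lightdm stop
sudo chattr -i /etc/X11/xorg.conf
sudo prime-select intel
sudo service lightdm start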

It’s very possible that later versions of gpu-manager might have fixed this behaviour.

Let me know in the comments if this worked for you!

Review of Ubuntu Linux 12.04 on the Samsung NP300V3A Core i5 NVIDIA Optimus laptop

An important warning: During installation, do NOT activate home folder encryption. Due to bugs 957843 and 509180, you will most probably suffer data loss, and you won’t even know about it until it’s too late. This happened on two of my laptops during normal use, both of which I have since completely reinstalled with LUKS whole disk encryption. It’s a shame that this bug has been known for years, but that Ubuntu still ships with this as its default home folder encryption configuration.

The Review

With the release of Ubuntu 12.04 Precise Pangolin on April 26, 2012, I decided that it was finally time to test this on my almost-a-year-old Samsung NP300V3A laptop. I had been procrastinating up to now, due to all the horror stories about the lack of Linux support for the NVIDIA Optimus graphics, a hardware-software combination that auto-switches in this case between the discrete NVIDIA GeForce GT520m and the CPU-integrated Intel HD3000.

I was quite pleasantly surprised. Read on if you’re curious as to why.

The obligatory Ubuntu 12.04 Unity desktop screenshot. My gnome-terminal is using the lovely Solarized colours. Extra indicators include Dropbox, and indicator-multiload for showing the CPU, network, load and disk activity graphs.

Installation

With the Linux Startup Disk Utility (actually called the usb-creator-gtk) on my Ubuntu desktop I installed the 12.04 x86_64 image on an old 1GB USB flash drive. A point of criticism is that the final “installing bootloader” part takes some minutes, without much feedback other than a progress bar bouncing horizontally. Booting the live disk went perfectly, and I could test basic functionality. Joining my TP-LINK TL-WR1043ND access point went without a hitch. Even suspend and resume worked out of the box. Resuming is fast, almost MacBook speed! During the installation, I used the partition tool to resize an existing NTFS partition to create space for the Linux installation. It still amazes me how smooth this process has become. From start to final boot, the whole installation took 18 minutes.

NVIDIA Optimus Support

After bootup, the first two issues I ran into were the miserable (estimated) battery life, and the fact that Super-W did not activate Window-Scale, as I was used to on other Ubuntu installations. A “ps uaxw | grep -i unity” revealed that I was running unity-2d, and sniffing through /var/log/Xorg.0.log yielded the tell-tale “(EE) Failed to initialize GLX extension (Compatible NVIDIA X driver not found)” (also that X was getting confused with the seeming presence of both Intel and NVIDIA graphics). It was clear that Ubuntu 12.04 doesn’t support Optimus out of the box.

On AskUbuntu I found this fabulous answer by one of the developers of the new Bumblebee. In short:

sudo add-apt-repository ppa:ubuntu-x-swat/x-updates
sudo add-apt-repository ppa:bumblebee/stable
sudo apt-get update
sudo apt-get install bumblebee bumblebee-nvidia
sudo usermod -a -G bumblebee $USER

After this, log out and log back in, and you’re in Optimus heaven! My battery estimate was soon 3.5h+ on 80% charge (it was just under 2h at 80% before installing Bumblebee), unity 3D was running, and I could start applications on the NVIDIA graphics using the optirun prefix. With glxspheres, I get 1.9 frames/sec and 1.9 Mpixels/sec without and 115 frames/sec and 113 Mpixels/sec with NVIDIA graphics. Importantly, Bumblebee automatically switches off the NVIDIA graphics when nothing is using it, resulting in the much longer battery life. All hail the four main developers of Bumblebee: Thulinma, Lekensteyn, Samsagax and ArchangeGabriel.
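
For example, to run something on the discrete GPU you simply prefix the command, as I did for the glxspheres test above:

optirun glxspheres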

Unity

Unity, Ubuntu’s unique GUI, has improved muchly since 11.10. I gave Unity a serious go on 11.04, and again on 11.10, but in each instance I gave up after a week or two due to glaring bugs. The 12.04 Unity has made great progress in fixing a number of small but irritating bugs; I think it might be a keeper. The heads-up display (HUD) is indeed awesome: press “alt” (the default keybinding) and then type away to search through the menus of the currently foregrounded application. I’ve come to appreciate the screen-space savings due to the global menu bar, although it doesn’t work for all apps yet, vim-gnome being a notable example. At this moment, my only wish would be a window overview like you get in the gnome3-shell when you press the Super key.

There has been much bitching and moaning about the direction Ubuntu has taken with Unity, some of it based on valid arguments. The fact that much effort is being diverted away from the gnome-shell is especially concerning. However, although I’ve dirtied many a word using previous versions of Unity, I think it’s good that it’s exploring directions that create a new UI experience, one that represents a counter-pole to the Windows and OS X approaches.

Fixing Chrome icon grouping in Unity Launcher

At the time of this update (2012-05-04) I did run into one old annoyance again. If you start up Chrome (or Chromium) and then one of its application shortcuts, for example GMail, both are grouped under the same icon on the Unity Launcher:

Chrome and Chrome Application shortcuts are grouped together under the first launcher icon, whichever that is.

If you start up the application shortcut first, for example GMail, subsequent Chrome windows will be grouped under the GMail icon. Durn.

Fortunately, the devs have been working on this bug, and the fix should soon appear as a stable release update (SRU). Until that time, you can download and install the bamfdaemon, libbamf and libbamf3-0 deb packages from here. Anything with version 0.2.116 and newer has the fix. Note that this only fixes it for the case where you’ve started up Chrome first (scenario 1 above), and not an application shortcut. See my comment on the bug report.

Multi-monitor support

I had low(ish) expectations when I connected my 40″ Sony Bravia TV to the HDMI port of the laptop, so I was more or less speechless for a while when, without me having to touch any part of the interface, Ubuntu simply extended its desktop onto the TV panel. BOOM. Just like that.

What I also like very much, is that Ubuntu by default puts the Launcher and its main menu bar on both displays (this is configurable though) and, even more gratifying, that the Dash appears automatically on the display currently containing the mouse cursor when I press the Super key. In the photo below, you can see the laptop below, on battery, outputting to the Sony TV via HDMI, and glxspheres humming along at just over 90 FPS using the discrete NVIDIA graphics. What you don’t see is me, smiling maniacally behind the camera phone.

Ubuntu 12.04 multi-monitor support FTW!

The Displays configuration window seems to think the 40″ panel is 72″, but the resolution has been correctly deduced.

Miscellaneous hardware support

Power saving looks pretty good. With the brightness set to 40% (the brightness setting is unfortunately not persistent), my power usage at idle is just under 9W:

PowerTop says my idle power consumption at 40% brightness is under 9W.

Even with normal browsing over WLAN, I was not able to push it much over 10W. This is after having toggled 10 or so powertop tunables from “bad” to “good”. After installing laptop-mode-tools, the tunables are all automatically and persistently “good”, except for a VM timeout. However, this seems to be a misunderstanding between laptop-mode-tools and powertop, and it is in fact quite OK.

The hardware config panel key (Fn-F1) does nothing, the touchpad disable key (Fn-F5) just works, the volume keys (Fn-F6 to Fn-F8) just work, but the hardware fan (Fn-F11) and wireless (Fn-F12) keys do nothing.

Initially the brightness keys (Fn-F2 and Fn-F3) didn’t really work, only allowing me to switch between two brightness levels (100% and 90%). Adding “acpi_osi=Linux acpi_backlight=vendor” to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, running sudo update-grub and then rebooting gives you 100% working hardware brightness control. Based on the information on this page, this configures brightness setting to happen through vendor-specific driver modules instead of through the default ACPI driver. Also see my askubuntu answer regarding this issue. Things have unfortunately changed to and fro with subsequent Ubuntu kernel releases; this page is up to date with Linux kernel 3.2.0-31.
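
To make that concrete, the relevant line in my /etc/default/grub ends up looking something like this (the quiet splash part is just the stock default and might differ on your installation):

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash acpi_osi=Linux acpi_backlight=vendor"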

In the working cases, you get the gorgeous notifier display (and in the case of volume even a mac-like audio feedback as you change levels):

Pretty notifications with Unity

As mentioned before, suspend to and resume from RAM works like a charm, out of the box, and the resume is really fast. Hibernate does NOT work. I tested this with “sudo pm-hibernate”, but when I switched the laptop back on, it acted like it was being cold-started.

I tested the webcam and sound setup (speakers and built-in microphone) with the gmail talk plugin and with the cheese application. These both work fine. However, with Skype 2.2.0.35 for Linux, you get the dreaded too-dark webcam image. The often-posted solution of using luvcview to adjust brightness does NOT work. Here’s a better solution: Install v4l2ucp, the video4linux2 universal control panel. Keep this running when you start Skype. If the video is dark, switch the “Exposure, Auto Priority” off and then back on again. This solves the problem on my setup (built-in WebCam SCB-1100N). Whenever you start up Skype’s video capturing again, it manages to screw up the setting, so unfortunately you have to re-toggle it with v4l2ucp.

The touchpad can be easily configured for two-finger scrolling, but not for three-finger gestures like it can be on Windows.

The touchpad configuration dialogue.

USB tethering with my Android-powered (CyanogenMod 7.1) HTC Desire Z works like a charm. I connect the USB cable, activate USB tethering on the telephone, and my laptop is online. This definitely qualifies as a Just Works(tm), and it seems to connect a whole lot faster than Windows 7 does.

Conclusions

When I bought this laptop, I had resigned myself to not being able to use it for Linux, for the largest part due to NVIDIA Optimus. However, thanks to the efforts of the Bumblebee people, and also to Ubuntu 12.04 as a whole, with the multi-monitor support being a highlight, my verdict is that this laptop is a great buy even if you’re planning to go exclusively Linux.

More resources

  • My gnome-terminal uses the Solarized colour scheme from here, and my vim (both console and gnome) are using the setup from the main Solarized repository.

Updates

  • August 27, 2012: Updated fix for brightness controls.
  • June 14, 2012: Added warning about the default home encryption being completely broken.
  • May 6, 2012: Added USB tethering.
  • May 4, 2012: Added the Unity Launcher icon grouping bug fix.
  • May 3, 2012: Added the multi-monitor section after testing with my HDMI Sony TV. Added solution for dark webcam capture in Skype. Also, thanks to Ladislav Bodnar, host of DistroWatch.com, this review is now linked from the Ubuntu page.

Ubuntu 10.10 x86_64 on your Dell E6410 with NVS 3100m GPU

Well howdy hoo! This is the fastest and most painless guide to installing Ubuntu 10.10 (Maverick Meerkat) x86_64 on your Dell E6410 laptop with NVS 3100m GPU.

More specifically, installing Ubuntu 10.10 on this specific hardware configuration poses two problems:

  1. Blank (black, no backlight) display when booting with the install media, or, if you manage to get Linux on the machine, with the installation itself.
  2. Blank (black, no backlight) display when resuming from suspend to ram after having installed Ubuntu.

Solving problem 1

  • Boot with the normal Ubuntu 10.10 x86_64 Desktop live disc. I usually do this from USB memory.
  • When you get to the first boot menu (“Try Ubuntu without installing”, “Install Ubuntu”, etc.), press F6 for other options, then ESC to kill the menu that appears. Move the selection to “Try Ubuntu without installing”.
  • You can now edit the boot command-line. Replace “quiet splash” with “nouveau.modeset=0”
  • Press enter to boot into the live desktop, then install the whole business as per usual.
  • At the first boot after installation, press ‘e’ at the grub boot screen to edit the command line and again replace “splash quiet” with “nouveau.modeset=0”.
  • You should get all the way to the Ubuntu desktop.
  • Activate the NVIDIA drivers via System | Administration | Additional Drivers
  • Now edit /etc/default/grub, and replace “splash quiet” in the GRUB_CMDLINE_LINUX_DEFAULT with, you guessed it, “nouveau.modeset=0”.
  • Run “sudo update-grub” at the command-line.
  • Problem solved.

Solving problem 2

  • Edit the GRUB_CMDLINE_LINUX_DEFAULT variable in /etc/default/grub again. When you’re done, it should read (we’ve added the acpi_sleep bit at the end):
    GRUB_CMDLINE_LINUX_DEFAULT="nouveau.modeset=0 acpi_sleep=nonvs"
  • Run “sudo update-grub” at the command-line.
  • Problem solved.
  • (if you really want to know more about this, including several other more painful work-arounds, read this bug report)

Generally, I’m really impressed with the slickness of 10.10 on this machine. What impressed me particularly was that powertop reported about 14W of power consumption at idle on this out-of-the-box setup (disregarding the two tweaks above). It used to take much more effort to get that low.