PyTorch 1.0 preview (Nov 10, 2018) packages with full CUDA 10 support for your Ubuntu 18.04 x86_64 systems.

(The wheel has now been updated to the latest PyTorch 1.0 preview as of November 10, 2018.)

You’ve just received a shiny new NVIDIA Turing (RTX 2070, 2080 or 2080 Ti), or maybe even a beautiful Tesla V100, and now you would like to try out mixed precision (well mostly fp16) training on those lovely tensor cores, using PyTorch on an Ubuntu 18.04 LTS x86_64 system.


The idea is that these tensor cores chew through fp16 much faster than they do through fp32. In practice, neural networks tolerate having large parts of themselves living in fp16, although one does have to be careful with this. Furthermore, fp16 promises to save a substantial amount of graphics memory, enabling one to train bigger models.
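To make the "one does have to be careful" part concrete, here is a minimal sketch that uses only Python's standard struct module (format code "e" is IEEE 754 half precision). It shows the halved storage, the reduced precision, and how a very small gradient underflows to zero unless it is scaled up first. The gradient value and the scale factor of 1024 are illustrative assumptions, not values taken from fast.ai or PyTorch.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Storage: 2 bytes per value in fp16 versus 4 bytes in fp32.
bytes_fp16 = struct.calcsize("<e")
bytes_fp32 = struct.calcsize("<f")

# Precision: 0.1 is stored only approximately in fp16.
rounded = to_fp16(0.1)

# Underflow: a tiny (hypothetical) gradient vanishes entirely in fp16 ...
tiny_grad = 1e-8
lost = to_fp16(tiny_grad)

# ... unless it is multiplied by a scale factor first (the idea behind loss scaling).
scale = 1024.0  # illustrative value
kept = to_fp16(tiny_grad * scale)

print(bytes_fp16, bytes_fp32, rounded, lost, kept)
```

The scaled value survives the round trip through fp16 and can be divided by the same factor again once it is back in fp32.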

For full fp16 support on the Turing architecture, CUDA 10 is currently the best option. Also, a number of CUDA 10 specific improvements were made to PyTorch after the 0.4.1 release.

However, PyTorch 1.0 (first release after 0.4.1) is not quite ready yet, and neither is it easy to find CUDA 10 builds of the current PyTorch 1.0 preview / PyTorch nightly.

Oh noes…

Well, fret no more!

Here you’ll find a fully CUDA 10 based build (pip wheel format) of PyTorch master as of November 10, 2018 (updated!), up to and including commit d02781a. I’ve linked it against a fully CUDA 10 based MAGMA as well, which I built as a conda package.

Installing and using these packages.

Ensure that you have an Ubuntu 18.04 LTS system with CUDA 10 and CUDNN installed and configured. See this great CUDA 10 howto by Puget Systems.

After this, you will also need to download the CUDNN 7.3.1 packages for CUDA 10.0 from the NVIDIA Developer site. An NVIDIA developer account (free signup) is required for this. I downloaded and installed libcudnn7_7.3.1.20-1+cuda10.0_amd64.deb and libcudnn7-dev_7.3.1.20-1+cuda10.0_amd64.deb but you’ll probably only need the former.

Set up a suitable conda environment with Python 3.7. Create and activate it with something like the following:

conda create -n pt python=3.7 numpy mkl mkl-include setuptools cmake cffi typing
conda activate pt
conda install -c mingfeima mkldnn

You can now download the PyTorch nightly wheel (361MB) and install with:

pip install torch-1.0.0a0+d02781a-cp37-cp37m-linux_x86_64.whl
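The wheel's filename encodes which interpreter it was built for, which is why the conda environment above uses Python 3.7. As a rough sketch (the helper names are hypothetical, and it assumes the common five-part wheel filename without the optional build tag), the tags can be pulled apart like this:

```python
import sys


def parse_wheel_filename(filename: str) -> dict:
    """Split a five-part wheel filename into its dash-separated tags."""
    stem = filename[: -len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def matches_running_python(tags: dict) -> bool:
    """True if the wheel's CPython tag matches the running interpreter."""
    expected = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return tags["python"] == expected


tags = parse_wheel_filename("torch-1.0.0a0+d02781a-cp37-cp37m-linux_x86_64.whl")
print(tags, matches_running_python(tags))
```

If matches_running_python returns False, pip will refuse the wheel, so it is worth checking that the right conda environment is active first.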

The libraries in the wheel don’t have the conda-style relative RUNPATH correctly set, so you have to set LD_LIBRARY_PATH every time you start Jupyter or any other Python code. This should work:

LD_LIBRARY_PATH=$CONDA_PREFIX/lib jupyter lab
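If you would rather not type the variable each time, a small launcher can do it for you. This is only a sketch with hypothetical helper names. Unlike the one-liner above, it prepends the conda lib directory to any existing LD_LIBRARY_PATH instead of replacing it, and it assumes CONDA_PREFIX is set, i.e. that the environment has been activated.

```python
import os
import subprocess


def env_with_conda_libs(base_env: dict) -> dict:
    """Return a copy of base_env with $CONDA_PREFIX/lib first in LD_LIBRARY_PATH."""
    env = dict(base_env)
    conda_lib = env["CONDA_PREFIX"] + "/lib"
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = conda_lib + (":" + existing if existing else "")
    return env


def launch(command: list) -> int:
    """Run command with the adjusted environment and return its exit code."""
    return subprocess.call(command, env=env_with_conda_libs(os.environ))


# Example, to be run inside the activated conda environment:
# launch(["jupyter", "lab"])
```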

You’re now good to go!

First tests of mixed precision training with fast.ai on Tesla V100.

I fired up a Google Compute Engine node with a Tesla V100 in Amsterdam to check that everything works.

I used the latest version of the fastai library, and specifically the callbacks.fp16 notebook which forms part of the brilliant new fastai documentation generation system. See for example the generated page on the fp16 callbacks.

Below I show the MNIST example code where I tried to compare fp32 with fast.ai fp16 (well, mixed precision to be precise) training.

The simple CNN trains up to 97% accuracy in 8 seconds, which is pretty quick already, but I could not see any training speed difference between fp16 and fp32. This could very well be because the network is so tiny.

However, I could confirm that the model parameters (at the very least) were all stored in fp16 floats when using the fast.ai to_fp16() Learner method.

Train CNN with fp16

from fastai import *
from fastai.vision import *
path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)
model = simple_cnn((3,16,16,2))
learn = Learner(data, model, metrics=[accuracy]).to_fp16()
learn.fit_one_cycle(5)
Total time: 00:08
epoch  train_loss  valid_loss  accuracy
1      0.202592    0.139505    0.948970  (00:01)
2      0.112530    0.103523    0.967125  (00:01)
3      0.079813    0.063746    0.973994  (00:01)
4      0.066733    0.056465    0.976938  (00:01)
5      0.069775    0.055017    0.977429  (00:01)

Check that type of parameters is half:

for p in model.parameters():
    print(p.type())
torch.cuda.HalfTensor
torch.cuda.HalfTensor
torch.cuda.HalfTensor

Train CNN with fp32

model32 = simple_cnn((3,16,16,2))
learn32 = Learner(data, model32, metrics=[accuracy])
learn32.fit_one_cycle(5)
Total time: 00:08
epoch  train_loss  valid_loss  accuracy
1      0.213889    0.151780    0.942100  (00:01)
2      0.106975    0.092190    0.966634  (00:01)
3      0.084529    0.083353    0.973013  (00:01)
4      0.069017    0.066023    0.976938  (00:01)
5      0.060235    0.056738    0.980373  (00:01)

Check that type of model parameters is full float:

for p in model32.parameters():
    print(p.type())
torch.cuda.FloatTensor
torch.cuda.FloatTensor
torch.cuda.FloatTensor

Use the hardware-based full disk encryption of your TCG Opal SSD with msed

(This post has been updated since initial publication, see last section for details.)

Introduction

My blog post on usable hardware-based SSD encryption has seen a great deal of activity. Although that post dealt primarily with the ATA security based type of hardware-based full drive encryption, readers from all over joined the discussion in the comments to talk about an increasing number of new self-encrypting drives supporting the TCG Opal standard.


Up until recently, configuring these TCG Opal drives was only possible under Windows, or under Linux with a commercial solution that was not available to mere end-users. Fortunately, a programmer named r0m30 stepped up to the challenge and has developed an open source utility called msed and an accompanying pre-boot authorization (PBA) image with which the super fast encryption function on these drives can be fully configured and used also in pure Linux systems.

This post summarises how I built, configured and installed msed and its PBA on my Ubuntu 14.04.1 machine with its Samsung 850 PRO 512G TCG Opal-compliant SSD.

How does TCG Opal drive encryption work?

Many modern SSDs perform transparent AES encryption on all written data in hardware. One advantage of this approach is that the whole drive can be securely erased by simply generating a new set of encryption keys. Another advantage is that users can have all of their data fully encrypted at rest without any performance hit whatsoever. By contrast, third-party software-based drive encryption negatively affects SSD performance and longevity, largely because the encrypted data is basically incompressible by the time it hits the drive.

TCG Opal is a new standard for communicating with supporting drives concerning their encryption functionality. Furthermore, it includes a really elegant way to have the user supply their authorization credentials.

In its default state, the main disc area is completely locked and inaccessible. However, when the system is booted, the encrypted disc exposes a fake disc from its firmware, called the shadow MBR (master boot record), 128MB in size. Usually this shadow MBR is flashed with the pre-boot authorization (PBA) image, which is in essence a small operating system (including MBR, boot sector, filesystem) that asks the user for their drive password, which it then communicates to the disc via OPAL commands. If the password is valid, the disc unlocks itself, and then the real operating system is loaded up.
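The boot flow described above can be pictured as a tiny state machine. The following is a toy model only: the class and method names are made up, it performs no real Opal commands and no real encryption, and it merely illustrates which "disc" the host sees before and after a successful unlock.

```python
import hashlib
import hmac


class ToyOpalDrive:
    """Toy model of the boot flow only; no real Opal commands or crypto."""

    def __init__(self, password: str, pba_image: bytes, user_data: bytes):
        self._password_hash = hashlib.sha256(password.encode()).digest()
        self._pba_image = pba_image  # contents of the shadow MBR
        self._user_data = user_data  # the real, normally locked, disc area
        self.locked = True  # state after every power cycle

    def read(self) -> bytes:
        # While locked, the host only ever sees the shadow MBR.
        return self._pba_image if self.locked else self._user_data

    def unlock(self, password: str) -> bool:
        # The PBA passes the user's password on; the drive decides.
        candidate = hashlib.sha256(password.encode()).digest()
        ok = hmac.compare_digest(candidate, self._password_hash)
        if ok:
            self.locked = False
        return ok

    def power_cycle(self) -> None:
        # Switching the machine off locks the drive again.
        self.locked = True


drive = ToyOpalDrive("mylongpassword", b"PBA image", b"real operating system")
print(drive.read())                    # the PBA is all that is visible
print(drive.unlock("mylongpassword"))  # the correct password unlocks
print(drive.read())                    # now the real disc is visible
```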

This white paper by HP contains an explanation of the provisioning and boot process on page 5. To summarise: Once correctly configured, a system with such an OPAL-compliant disc will request the drive password at boot. The drive will only unlock and decrypt if the correct password is supplied.

Building msed and its PBA image from source

r0m30 programmed a suitable PBA image based on the open source syslinux bootloader, and a utility called msed for the provisioning (setting the password, writing the PBA image) of Opal drives.

Because this software performs a security critical function, I reviewed as much as possible of the source code in syslinux/com32/msedpba (the Opal-specific part of the PBA) and of the whole msed utility, including the script that builds the PBA image. (I also spent some hours disassembling the binary PBA image.)

After this mini review, it was of course preferable to build and use my own binaries.

To build both the PBA image and msed from source, I did the following:

# I retrieved these sources on Tuesday 2015-02-10
git clone https://github.com/r0m30/msed.git
git clone https://github.com/r0m30/syslinux
cd syslinux
# make clean is going to fail trying to get the EFI submodule. ignore.
make clean
make bios
cd ../msed/image
sudo ./buildbiospba
# remember the location of the resultant .img file!
gunzip biospba-0.20beta.img.gz
# now let's build msed itself
cd ..
# I'm on x86_64, adapt to your own architecture!
make CONF=Release_x86_64
# copy the image to the same location as the msed binary
cp image/biospba-0.20beta.img .

Stripping the msed binary at top-level, I found an md5sum-identical binary to the 0.20.0 one that I downloaded from r0m30’s site:

cpbotha@meepz97:~/build/msed/msed/dist/Release_x86_64$ md5sum msed 
3a22c344ecbfa15b43ae7764341060ab  msed
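The same comparison can be scripted if you want to repeat it for several builds. A minimal sketch using Python's hashlib follows; the function names are hypothetical. MD5 is used only because it matches the md5sum output above; it is not collision resistant, so a stronger hash such as SHA-256 is the better choice where one is published.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(path_a: str, path_b: str) -> bool:
    """True if both files have the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)


# Example with hypothetical paths:
# same_binary("dist/Release_x86_64/msed", "downloads/msed")
```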

Installing the msed PBA

This is very important: I’ve configured my BIOS to boot in legacy mode, i.e. NOT UEFI. The msed documentation also states that this is necessary. It also makes sense, because the PBA image is a legacy boot image!

msed needs libata.allow_tpm to be configured for the running kernel. I edited /etc/default/grub so it looked like this:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash libata.allow_tpm=1"

… after which I did update-grub and then rebooted. After reboot, msed --scan gave me sensible output.
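For anyone who wants to script the edit to /etc/default/grub rather than make it by hand, here is a sketch of the text transformation. The function name is hypothetical, and it assumes the file contains a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line. Writing the result back and running update-grub are deliberately left out.

```python
import re

_CMDLINE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)


def add_kernel_param(grub_text: str, param: str) -> str:
    """Append param to GRUB_CMDLINE_LINUX_DEFAULT unless it is already present."""

    def _append(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    return _CMDLINE.sub(_append, grub_text)


before = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
after = add_kernel_param(before, "libata.allow_tpm=1")
print(after)
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to apply repeatedly.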

It was now time to configure the drive for encryption. I found this quite stressful; I’ve had near-bricking experiences with expensive Intel 520 SSDs during some of my previous ATA security experiments with flaky BIOS implementations (Insyde H2O, what a mess). In any case, I followed this procedure:

# set the drive password: mine is long, but no spaces, no special chars
./msed --initialsetup mylongpassword /dev/sda
# write the PBA into the shadow MBR
./msed --loadPBAimage mylongpassword biospba-0.20.img /dev/sda
# activate the shadow MBR
./msed --setMBREnable on mylongpassword /dev/sda
# activate drive locking
./msed --enableLockingRange 0 mylongpassword /dev/sda

After this, I switched the machine off, and on again. Lo and behold! I was prompted for my OPAL password at bootup, and could let myself in.

To test, I booted up the machine with a Linux Live USB. In place of the encrypted disk I could only see the shadow MBR.

Conclusion

TCG Opal is a great way of using your SSD’s hardware-based full disc encryption. I am very grateful to r0m30 for creating msed and its PBA image: These are crucially important open source tools for working with Opal discs.

Updates to this post

June 9, 2016

Fixed missing “of” in title. Thanks adutoit!

After adding a 500G Samsung 850 EVO to the 850 PRO already in my desktop machine, the BIOSPBA was not able to unlock both drives. However, after a quick upgrade to the LinuxPBA using the same procedure as documented in this post, both drives unlock after correct password entry. As an added bonus, with the BIOS on this machine set to Legacy+UEFI, I legacy boot into the PBA, enter my password, have both drives unlock, and then automatically UEFI boot to any of the operating systems on the GPT partitions of the unlocked drives.

Huawei E3331 3G USB dongle works on Ubuntu 14.04 Linux

In the store today, I wanted to check that the Huawei E3331 3G USB dongle I was about to buy would work with my Ubuntu Linux laptops. Because I couldn’t find any posts confirming this, I’m writing this one.

Summary: I can confirm that the Huawei E3331 3G USB dongle works, completely out of the box and without any problems, on Ubuntu 14.04.

After inserting the card into a USB slot, I was greeted by this notification:

As per the instructions, I could immediately open the HiLink Web UI at http://192.168.1.1 with my browser, where, after configuring my APN like this:

The home screen showed that I was successfully connected to the 3G network:

No drivers were required. Linux (in my case Ubuntu 14.04 on x86_64) is able to connect to the device using its built-in LAN-over-USB support. This is what the relevant part of the system log looks like when the device is inserted:

[378719.431633] usb 3-3: new high-speed USB device number 73 using xhci_hcd
[378719.450078] usb 3-3: New USB device found, idVendor=12d1, idProduct=14db
[378719.450085] usb 3-3: New USB device strings: Mfr=2, Product=1, SerialNumber=0
[378719.450089] usb 3-3: Product: HUAWEI Mobile
[378719.450092] usb 3-3: Manufacturer: HUAWEI
[378719.461252] cdc_ether 3-3:1.0 eth1: register 'cdc_ether' at usb-0000:00:14.0-3, CDC Ethernet Device, 58:2c:80:13:92:63
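If you want to check for the same thing on your own machine, the interesting fields can be extracted from the kernel log with a couple of regular expressions. This sketch assumes the log lines have the same shape as the ones above; other kernel versions may format them differently.

```python
import re

LOG = """\
[378719.450078] usb 3-3: New USB device found, idVendor=12d1, idProduct=14db
[378719.461252] cdc_ether 3-3:1.0 eth1: register 'cdc_ether' at usb-0000:00:14.0-3, CDC Ethernet Device, 58:2c:80:13:92:63
"""


def parse_usb_log(log: str) -> dict:
    """Extract USB IDs plus the cdc_ether interface name and MAC address."""
    ids = re.search(r"idVendor=([0-9a-f]{4}), idProduct=([0-9a-f]{4})", log)
    nic = re.search(
        r"cdc_ether \S+ (\w+): register 'cdc_ether' at \S+ [^,]+, "
        r"((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
        log,
    )
    return {
        "vendor": ids.group(1),
        "product": ids.group(2),
        "interface": nic.group(1),
        "mac": nic.group(2),
    }


info = parse_usb_log(LOG)
print(info)
```

The presence of the cdc_ether line is the sign that the kernel is treating the dongle as an ordinary USB Ethernet device.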

Here you have the obligatory speedtest:

3563342453.png

(This dongle is a connectivity backup. It’s an added bonus that the upstream is 3x that of my ADSL at home.)

Samson C01U USB condenser microphone on Ubuntu Linux 12.04

I recently acquired the Samson C01U USB condenser microphone for better quality voice-overs on the sleep-inducing screencasts I sometimes make. It took some fiddling to get it set up correctly on Ubuntu 12.04 with the default ALSA drivers and PulseAudio sound system, so I’ve documented the steps here on the chance that it might help some other Ubuntu / Linux user.

The microphone looks like this:

Samson C01U condenser USB microphone

It comes with a USB cable, pouch and usable tripod stand. One can accessorize with the Samson PS01 pop-filter (have it), and even with a spider shockmount (don’t have it yet, I like people to hear me typing when I make screencasts). Importantly, the quality of the recorded audio is miles better than any headset, if you can get the levels set up correctly.

It turns out that the microphone has a stereo amplifier chip. Both channels are exposed to the computer it’s connected to, as left and right. However, the two amplifiers have been cascaded for more gain. The right channel is the intermediate audio, i.e. after the first amplifier, and should not be used. The left channel is the final output that should be used. Furthermore, both the gains can be separately adjusted, and this is the reason my recordings were initially far too soft.
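If you already have a stereo recording from the microphone, the usable left channel can be pulled out afterwards. The sketch below uses only Python's standard library and assumes a 16-bit stereo WAV file; the function name and file names are hypothetical.

```python
import array
import sys
import wave


def extract_left_channel(stereo_path: str, mono_path: str) -> None:
    """Write the left channel of a 16-bit stereo WAV file to a mono WAV file."""
    with wave.open(stereo_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo WAV input")
        rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))

    # WAV data is little-endian; swap on big-endian hosts.
    if sys.byteorder == "big":
        samples.byteswap()

    left = samples[0::2]  # samples are interleaved as L, R, L, R, ...

    if sys.byteorder == "big":
        left.byteswap()

    with wave.open(mono_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(left.tobytes())


# Example with hypothetical file names:
# extract_left_channel("voiceover_stereo.wav", "voiceover_left.wav")
```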

To adjust the gains of the built-in amplifiers, you have to use the alsamixer application, which you can start up from a terminal window. Right after startup, it will probably look something like this:

alsamixer right after startup. Where’s my microphone?

It will probably show the channels available on your default sound card. Press F6, then select your Samson, then press F4 to select the capture channels. You should now see this:

I’m a little Samson microphone, and I’m all alone!

Here you can adjust the gains of the right (pre-amplifier) and left (main amplifier) separately. This is completely separate from the gain that you can set in the Ubuntu / Gnome sound settings:

Ubuntu / Gnome sound settings: Microphone set to unamplified.

I’ve found that by setting the gain of both of the built-in microphone amplifiers to about 19 dB with alsamixer, I can keep the (probably software) gain in Ubuntu / Gnome sound settings at “Unamplified”. Note also that I’ve selected the “Analog Mono Input” mode. I’ve tried with different gain settings for left and right, as some permutations should in theory have less noise than others for the same total gain, but have not yet found anything that resulted in a difference I could hear.
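To get a feel for what two cascaded 19 dB stages amount to, here is a small worked calculation. It assumes that the dB values shown by alsamixer are amplitude (voltage) gains, so that the gains of cascaded stages add in dB and the linear factor is 10^(dB/20). That is an assumption about the hardware, not something I have verified for the C01U.

```python
def cascaded_gain_db(*stage_gains_db: float) -> float:
    """Gains of cascaded amplifier stages add when expressed in dB."""
    return sum(stage_gains_db)


def db_to_amplitude_ratio(gain_db: float) -> float:
    """Convert an amplitude gain in dB to a linear factor."""
    return 10 ** (gain_db / 20)


total_db = cascaded_gain_db(19, 19)      # 38 dB in total
ratio = db_to_amplitude_ratio(total_db)  # roughly 79x in amplitude
print(total_db, round(ratio, 1))
```

Under the same assumption, any split such as 12 dB plus 26 dB gives the same total, which is why only the noise behaviour could differ between permutations.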

So that’s it, kids. Let me know in the comments if you have any questions, if this howto has helped you, or if you have other ideas about the perfect left/right gain settings!

Update on 2013-05-17

Recently, Google Hangouts users started reporting that the volume of my voice was too low. This was strange, because the recordings I made with Sound Recorder were perfect. After some frustrating minutes, I discovered that the PulseAudio per-application volume for Google Chrome (which I use for Hangouts) had been adjusted. This means that there’s a third configuration that you should check when adjusting the levels of your C01U (or any other microphone), and that’s on the “Recording” tab of the PulseAudio Volume Control (pavucontrol). See this screencast (and check my description) for more details:

Acer V3-571G FullHD IPS: Superb price/performance Linux development laptop

I recently needed a new mobile development workstation. My main requirements were that it should have at least a Full HD (1920×1080) IPS (in-plane switching) screen and a good keyboard, and that it should be able to run Linux, preferably Ubuntu, as its primary operating system.

After experimenting with a screenshot of my 1920×1080 desktop workstation running IntelliJ IDEA 12 (my IDE of choice) on an Asus UX31A with 13″ Full HD IPS screen, I realised that I would have to go with a larger screen. The Asus UX52VS with 15.6″ IPS also looked like a good bet, but there were no reviews available yet, it was not clear whether the 4GB RAM and hybrid HDD (large spindle drive, 24GB SSD cache) would be easily upgradable to full SSD, and the €1200 price tag was reason for more consideration.

I finally stumbled upon this review of the Acer V3 571G with Full HD IPS, whose authors were mostly quite surprised that a laptop with such a screen could be sold at entry-level prices. I subsequently purchased model number V3-571G-73638G75Maii, with Full HD IPS (this is the LP156WF4-SPB1 LED IPS matte panel by LG Philips), Intel i7 3632QM (a real mobile quad-core; many mobile i7s are dual core), NVIDIA GeForce 710m with 2GB VRAM (Optimus graphics switching), 8GB RAM, 750G HDD, all for €799. I also purchased an Intel 520 240G SSD, a really fast SSD with built-in hardware encryption that would replace the main HDD, for €200.

Photo courtesy of notebookcheck. Do see their great review (linked in the post).

Upgrading HDD and RAM

My first impression of the laptop was that in reality it does not look quite as cheap as the photos might make one believe. I was pleasantly surprised when I set out to replace the HDD with the Intel SSD. After removing two screws on the underside, a panel can be removed behind which the hard drive and RAM can be easily upgraded:

Upgrading the hard drive and ram has been made straight-forward, as it should be.

Configuring Linux: Ubuntu 12.04.2

After the SSD upgrade, installing Ubuntu 12.04.2 went mostly without a hitch. 12.04.2 comes with the LTSEnablementStack, backports of the Quantal kernel (3.5) and the new X stack to support more hardware. This caused some dependency problems when I installed bumblebee (Linux support for NVIDIA Optimus graphics switching), but this problem was almost immediately fixed by the ubuntu-x-swat team when I reported it on #freenode, so you should be fine. Just in case you need a reminder, bumblebee is installed and configured as follows:

sudo add-apt-repository ppa:ubuntu-x-swat/x-updates
sudo add-apt-repository ppa:bumblebee/stable
sudo apt-get update
sudo apt-get install bumblebee bumblebee-nvidia primus

If you want to run something on the NVIDIA, just do “primusrun command” or “optirun command”, where the former is preferred due to performance.

Other than that, make sure you have GRUB_CMDLINE_LINUX_DEFAULT="acpi_backlight=vendor acpi_osi=" in your /etc/default/grub (run update-grub and reboot after you change this) to get the screen brightness hotkeys working. Unfortunately, the brightness notifier itself does not work, but this is not a problem.

Weak point: BIOS ATA security support

After a few mails to-and-fro with Acer tech support (they do respond, mostly) and two nights of experiments, I can now confirm that the HDD password implementation on the laptop is worth less than nothing. In the spirit of full disclosure, this is the Insyde H2O BIOS implementation of HDD passwords. This BIOS is used on many modern laptops besides Acer.

For many of the current self-encrypting drives, BIOS support of the ATA Security feature set is important. It should be possible to set both master and user passwords, and, more importantly, the BIOS should ask for this password at bootup, at which point it should pass the user-entered password, unchanged, to the hard drive as ATA commands. Setting the HDD password on the Acer does none of the above. Instead, it sets a fixed password that has nothing to do with the user password. At bootup, it asks for an HDD password. However, if you enter this incorrectly 3 times, you get a hash code. This hash code can be used with a simple Python script to generate a master unlock password with which the HDD can be trivially unlocked. I confirmed experimentally that this works.

I also experimented with setting the ATA security user password to a known value using hdparm from a Linux boot USB. The Insyde H2O BIOS unfortunately does not fall back to sane behaviour.

To summarise: The Acer BIOS can’t be used to manage ATA security. Because it is important that my SSD is fully encrypted, I now boot the laptop with a USB stick, unlock with the real ATA user password using hdparm, and then warm-boot back into the SSD. I perceive this as a relatively small price to pay for reasonable and super fast data security (my Intel does 500MB+ read and write, all with AES-128 encryption). Remember that software encryption has a severe performance and durability impact on all SSDs, especially those using compressing controllers such as the Sandforce, but also SSDs that employ no compression at all. AES-NI is really not the issue here, it has to do with the performance and durability optimizations modern SSD controllers do.
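For reference, the unlock step from the boot USB could be wrapped roughly as follows. This is a sketch, not the exact command line used for this post: it assumes hdparm's --user-master and --security-unlock options as documented in its man page, the function name is hypothetical, and it only builds the argument list. Actually running it needs root, and the password is visible in the process list while it runs.

```python
import subprocess


def build_unlock_command(device: str, password: str) -> list:
    """Build (but do not run) an hdparm call that unlocks with the user password."""
    return [
        "hdparm",
        "--user-master", "u",            # use the user password, not the master
        "--security-unlock", password,
        device,
    ]


command = build_unlock_command("/dev/sda", "mylongpassword")
print(command)

# To actually unlock, as root and only on the intended device:
# subprocess.run(command, check=True)
```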

Verdict

The matte Full HD IPS screen on this laptop is a pleasure to use. I find the chiclet keyboard above average for programming. It’s not as rigid as the keyboard on my Samsung NP300V3a, but it’s entirely acceptable. The combination of an Ivy Bridge i7 3632QM quad core, an Intel 520 SSD and 8GB of 1600MHz DDR3 RAM makes for a laptop that feels super responsive. Taken together with the solid Ubuntu support and the €799 + €200 price tag, and in spite of lack of ATA security support in the BIOS, I can only highly recommend this machine to any developer looking for a powerful Linux laptop on a budget.