Traverse Technology

Coral TPU and TensorFlow Lite application note

This application note describes how to get the Coral TPU M.2 and mPCIe AI accelerator cards working on the Ten64 under Debian, both natively and inside a VM (using PCIe passthrough/VFIO).

The same instructions (excluding the gasket driver install) should also apply to the Coral USB accelerator; however, the PCIe-based accelerators can operate faster and without thermal restrictions.

Usage under VMs with VFIO/passthrough

The Coral PCIe accelerators will work under VFIO passthrough, but VM hosts running kernel versions earlier than 5.4 may not work, because the host needs to perform PCIe quirk fixups.

If the Coral card fails to pass through, you will need the kernel patch "PCI: Move Apex Edge TPU class quirk to fix BAR assignment".
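
A quick way to see whether a VM host falls into the affected range is to compare its kernel release against 5.4. The following is a minimal sketch, not part of the Coral or Traverse tooling: the helper name `kernel_at_least` is hypothetical, and it only compares the major and minor numbers reported by `uname -r`, so a distribution kernel with backported fixes may behave differently.

```python
import platform
import re


def kernel_at_least(release, major, minor):
    """Return True if a release string such as '4.19.0-10-arm64'
    is at least major.minor. Only the first two numeric fields are compared."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


if __name__ == "__main__":
    release = platform.release()  # same string that `uname -r` prints
    try:
        recent = kernel_at_least(release, 5, 4)
    except ValueError as err:
        print(err)
    else:
        if recent:
            print(release, "is 5.4 or later; passthrough is expected to work")
        else:
            print(release, "is older than 5.4; the quirk patch may be needed")
```

Run this on the VM host rather than inside the guest, since it is the host kernel that performs the fixups.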

Driver and Software Installation

The software installation steps are nearly the same as the Coral instructions. However, you may encounter issues installing the PCIe driver (gasket) from the Coral repository, because its linux-headers dependencies cannot be met on arm64.

If you are running a recent kernel (5.4 or later), you may already have the gasket and apex drivers; these are currently in drivers/staging/gasket in the Linux kernel source tree. We don't recommend using the staging version of the driver in kernels prior to 5.7, in part due to the PCIe quirk handling issue mentioned above.
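
To see whether your running kernel already ships these drivers, you can look for them in the module index. The sketch below is an illustration only: the helper name `find_modules` is hypothetical, and it assumes the usual `/lib/modules/<release>/modules.dep` file written by depmod. Drivers compiled directly into the kernel image would not be found this way.

```python
import os
import platform
import re


def find_modules(modules_dep_text, names):
    """Map each requested module name to its path in modules.dep, or None."""
    found = {name: None for name in names}
    for line in modules_dep_text.splitlines():
        # Each line looks like "path/to/module.ko: dependency dependency ..."
        path = line.split(":", 1)[0].strip()
        # Strip the directory and the .ko suffix (optionally compressed).
        base = re.sub(r"\.ko(\.\w+)?$", "", os.path.basename(path))
        if base in found and found[base] is None:
            found[base] = path
    return found


if __name__ == "__main__":
    dep_file = os.path.join("/lib/modules", platform.release(), "modules.dep")
    try:
        with open(dep_file) as handle:
            text = handle.read()
    except OSError as err:
        print("could not read", dep_file, "-", err)
    else:
        for name, path in find_modules(text, ["gasket", "apex"]).items():
            print(name, "->", path or "not found")
```

A path under a DKMS directory indicates the out-of-tree build described in the steps below, while a path under drivers/staging indicates the in-kernel staging driver.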

  1. Install the kernel headers for your kernel and DKMS:

    sudo apt-get install dkms linux-headers-4.19.0-10-arm64 build-essential
    

    (Note: choose the linux-headers package that matches your running kernel, as reported by uname -r.)

  2. Download the gasket-dkms package:

    apt-get download gasket-dkms
    
  3. Extract the gasket source, add it to DKMS and install:

     sudo dpkg --force-depends -i gasket-dkms_1.0-13_all.deb
     mkdir gasket && cd gasket
     ar x ../gasket-dkms_1.0-13_all.deb
     tar -xf data.tar.*
     sudo cp -r usr/src/gasket-1.0 /usr/src
     sudo dkms add gasket/1.0
     sudo dkms build gasket/1.0
     sudo dkms install gasket/1.0
    
  4. Check that the gasket and apex drivers load and that the /dev/apex_0 device exists:

     sudo modprobe gasket
     sudo dmesg | grep gasket
     [    4.676912] gasket: loading out-of-tree module taints kernel.
     [    4.737324] gasket: module verification failed: signature and/or required key missing - tainting kernel
     sudo dmesg | grep apex
     [    5.229682] apex 0000:00:05.0: enabling device (0000 -> 0002)
     ls -l /dev/apex_0
    
  5. (Optional) Give your user account permission to access the apex device (a reboot is required for this to take effect):

     sudo sh -c "echo 'SUBSYSTEM==\"apex\", MODE=\"0660\", GROUP=\"apex\"' >> /etc/udev/rules.d/65-apex.rules"
     sudo groupadd apex
     sudo adduser $USER apex
    
  6. Install the edgetpu libraries:

     sudo apt-get install libedgetpu1-std
    
  7. Install TensorFlow Lite:

    See the official TensorFlow Lite install page for the package URLs.

    You will need python3 and python3-pip, as well as numpy and PIL (Pillow), if you don't have them installed already:

     sudo apt-get install python3 python3-pip python3-numpy python3-pil
    
  8. Run the Coral example/demo:

    This follows from Coral's getting started guide.

     mkdir coral && cd coral
     git clone https://github.com/google-coral/tflite.git
     cd tflite/python/examples/classification
     bash install_requirements.sh
     python3 classify_image.py \
         --model models/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite \
         --labels models/inat_bird_labels.txt \
         --input images/parrot.jpg
    
     ----INFERENCE TIME----
     Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
     12.6ms
     2.5ms
     2.4ms
     2.4ms
     2.4ms
     -------RESULTS--------
     Ara macao (Scarlet Macaw): 0.77734
     
    

    To compare, you can also run the classification model without the TPU by specifying a model file that is not compatible with the TPU:

     python3 classify_image.py \
         --model models/mobilenet_v2_1.0_224_inat_bird_quant.tflite \
         --labels models/inat_bird_labels.txt \
         --input images/parrot.jpg
    
     ----INFERENCE TIME----
     Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
     140.4ms
     138.9ms
     139.1ms
     139.3ms
     139.3ms
     -------RESULTS--------
     Ara macao (Scarlet Macaw): 0.77734
    

So the TPU has given us roughly a 58x speedup (about 139 ms on the CPU vs 2.4 ms on the TPU), not bad!
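
The speedup figure can be reproduced from the timings printed above. This is a small sketch using the numbers from the two runs. It takes the median of each run so that the slower first inference does not skew the result; that choice, and the variable names, are assumptions of this example rather than anything the Coral demo does.

```python
from statistics import median

# Inference times in milliseconds, copied from the two runs above.
tpu_ms = [12.6, 2.5, 2.4, 2.4, 2.4]
cpu_ms = [140.4, 138.9, 139.1, 139.3, 139.3]

# The median is not pulled upwards by the slow first inference of each run.
tpu_typical = median(tpu_ms)
cpu_typical = median(cpu_ms)
speedup = cpu_typical / tpu_typical

print("CPU %.1f ms / TPU %.1f ms = %.0fx speedup" % (cpu_typical, tpu_typical, speedup))
# prints "CPU 139.3 ms / TPU 2.4 ms = 58x speedup"
```

Including the first inference in a plain average would understate the TPU's steady-state advantage, since that first run also covers loading the model into Edge TPU memory.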