Coral TPU and TensorFlow Lite application note

This application note describes how to get the Coral TPU M.2 and mPCIe AI accelerator cards working on the Ten64 under Debian, both natively and inside a VM (using PCIe passthrough/VFIO).

The same instructions (excluding the gasket driver install) should also apply to the Coral USB accelerator. However, the PCIe-based accelerators can operate faster and without thermal restrictions.

Usage under VMs with VFIO/passthrough

The Coral PCIe accelerators will work under VFIO passthrough, but VM hosts running kernels older than 5.4 may not work, because the host needs to perform PCIe quirk fixups.

If the Coral card fails to pass through, you will need the "PCI: Move Apex Edge TPU class quirk to fix BAR assignment" patch in the host kernel.

Driver and Software Installation

The instructions for software installation are nearly the same as the Coral instructions. However, you may encounter issues installing the PCIe driver (gasket) from the Coral repository, because its linux-headers dependencies cannot be met on arm64.

If you are running a recent kernel (5.4 or later), you may already have the gasket and apex drivers; these are currently in drivers/staging/gasket in the Linux kernel tree. We don't recommend using the staging version of the driver in kernels prior to 5.7, in part due to the PCIe quirk handling issue mentioned above.
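
The version thresholds above can be turned into a quick check. The following is a minimal sketch, assuming the kernel release string starts with `major.minor` (as Debian's does); the function names and the advice strings are illustrative and not part of any Coral or Ten64 tooling.

```python
import platform
import re


def kernel_version(release):
    """Return (major, minor) parsed from a release string such as '4.19.0-10-arm64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def staging_driver_advice(release):
    """Map a kernel release to the guidance given in this note (hypothetical helper)."""
    version = kernel_version(release)
    if version < (5, 4):
        return "build gasket with DKMS"
    if version < (5, 7):
        return "staging driver may be present, but a DKMS build is preferred"
    return "staging driver may be present"


# Only query the running kernel on Linux, where the release string has this shape.
if platform.system() == "Linux":
    release = platform.release()
    print(release, "->", staging_driver_advice(release))
```

Whether the staging driver is actually enabled depends on the kernel configuration, so treat the result as a hint only.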

Install the kernel headers for your kernel and DKMS:
```
sudo apt-get install dkms linux-headers-4.19.0-10-arm64 build-essential
```
(Note: choose the linux-headers package that matches your running kernel; uname -r prints the kernel release.)
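
As a small illustration of picking the headers package, the sketch below builds the package name from the running kernel release. It assumes the usual Debian naming of `linux-headers-<release>`; `headers_package` is a hypothetical helper, and the script only prints the command rather than running it.

```python
import platform


def headers_package(release):
    """Build the Debian headers package name for a kernel release string."""
    return f"linux-headers-{release}"


# platform.release() reports the same string as `uname -r` on Linux.
print("sudo apt-get install dkms", headers_package(platform.release()), "build-essential")
```
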
Download the gasket-dkms package:
```
apt-get download gasket-dkms
```

Extract the gasket source, add it to DKMS and install it (adjust the .deb filename to match the version that apt-get downloaded):

 sudo dpkg --force-depends -i gasket-dkms_1.0-13_all.deb
 dpkg -x gasket-dkms_1.0-13_all.deb gasket
 cd gasket
 sudo cp -r usr/src/gasket-1.0 /usr/src
 sudo dkms add gasket/1.0
 sudo dkms build gasket/1.0
 sudo dkms install gasket/1.0

Check that the gasket and apex drivers load and that the /dev/apex_0 device exists.

 sudo modprobe gasket
 sudo dmesg | grep gasket
 [    4.676912] gasket: loading out-of-tree module taints kernel.
 [    4.737324] gasket: module verification failed: signature and/or required key missing - tainting kernel
 sudo dmesg | grep apex
 [    5.229682] apex 0000:00:05.0: enabling device (0000 -> 0002)
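
The same check can be scripted. This is a minimal sketch that looks for the gasket and apex modules in /proc/modules and for the /dev/apex_0 node. The helper names are illustrative. It assumes the drivers are built as loadable modules, since drivers built into the kernel do not appear in /proc/modules.

```python
import os

APEX_DEVICE = "/dev/apex_0"  # device node named in this note


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def driver_status(proc_modules_text, device_exists):
    """Summarise whether both modules are loaded and the device node exists."""
    modules = loaded_modules(proc_modules_text)
    return {
        "gasket": "gasket" in modules,
        "apex": "apex" in modules,
        "device": device_exists,
    }


# /proc/modules only exists on Linux; skip the live check elsewhere.
if os.path.exists("/proc/modules"):
    with open("/proc/modules") as handle:
        print(driver_status(handle.read(), os.path.exists(APEX_DEVICE)))
```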

(Optional) give your user account permissions to access the apex device (reboot required to take effect):

 sudo sh -c "echo 'SUBSYSTEM==\"apex\", MODE=\"0660\", GROUP=\"apex\"' >> /etc/udev/rules.d/65-apex.rules"
 sudo groupadd apex
 sudo adduser $USER apex
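
After rebooting, a short script can confirm that the permissions took effect. This sketch only tests whether the current user can read and write the device node; `can_use_device` is an illustrative name.

```python
import os

APEX_DEVICE = "/dev/apex_0"  # device node named in this note


def can_use_device(path=APEX_DEVICE):
    """True if the node exists and the current user may read and write it."""
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


print(f"{APEX_DEVICE} usable by current user: {can_use_device()}")
```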

Install the edgetpu libraries:
```
 sudo apt-get install libedgetpu1-std
```
Install TensorFlow Lite

See the official TensorFlow Lite install page for the download URLs.

You will need python3 and python3-pip, as well as numpy and PIL (Pillow), if you don't have them installed already:
```
 sudo apt-get install python3 python3-pip python3-numpy python3-pil
```

Run the Coral example/demo

This follows Coral's getting started guide.

 mkdir coral && cd coral
 git clone https://github.com/google-coral/tflite.git
 cd tflite/python/examples/classification
 bash install_requirements.sh
 python3 classify_image.py \
     --model models/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite \
     --labels models/inat_bird_labels.txt \
     --input images/parrot.jpg

 ----INFERENCE TIME----
 Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
 12.6ms
 2.5ms
 2.4ms
 2.4ms
 2.4ms
 -------RESULTS--------
 Ara macao (Scarlet Macaw): 0.77734
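
For your own programs, the sketch below shows one way to create a TensorFlow Lite interpreter that runs on the Edge TPU. It assumes the tflite_runtime package is installed and that libedgetpu1-std provides the shared library under the name libedgetpu.so.1. `make_interpreter` and its `use_tpu` parameter are illustrative, and this is not necessarily how classify_image.py is written.

```python
EDGETPU_SHARED_LIB = "libedgetpu.so.1"  # assumed library name on Linux


def make_interpreter(model_path, use_tpu=True):
    """Create a TFLite interpreter, optionally attaching the Edge TPU delegate."""
    # Imported here so this file can still be loaded where tflite_runtime is absent.
    import tflite_runtime.interpreter as tflite

    delegates = [tflite.load_delegate(EDGETPU_SHARED_LIB)] if use_tpu else None
    interpreter = tflite.Interpreter(
        model_path=model_path,
        experimental_delegates=delegates,
    )
    interpreter.allocate_tensors()
    return interpreter
```

Calling `make_interpreter("models/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite")` from the example directory would then give an interpreter whose `invoke()` calls run on the accelerator, provided the device is accessible.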

You can also run the classification model without the TPU to compare, by specifying a model file that has not been compiled for the Edge TPU:

 python3 classify_image.py \
     --model models/mobilenet_v2_1.0_224_inat_bird_quant.tflite \
     --labels models/inat_bird_labels.txt \
     --input images/parrot.jpg

 ----INFERENCE TIME----
 Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
 140.4ms
 138.9ms
 139.1ms
 139.3ms
 139.3ms
 -------RESULTS--------
 Ara macao (Scarlet Macaw): 0.77734

So the TPU has given us a roughly 58x speedup (about 139 ms on the CPU vs 2.4 ms on the TPU) - not bad!
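
The speedup figure can be reproduced from the timings printed above. This sketch drops the first inference of each run as warm-up and compares the medians of the rest. Using the median is a choice made here for illustration, not something the Coral example does.

```python
from statistics import median

# Inference times in milliseconds, copied from the two runs above.
tpu_ms = [12.6, 2.5, 2.4, 2.4, 2.4]
cpu_ms = [140.4, 138.9, 139.1, 139.3, 139.3]


def steady_state_ms(timings):
    """Median latency after discarding the first (warm-up) inference."""
    return median(timings[1:])


speedup = steady_state_ms(cpu_ms) / steady_state_ms(tpu_ms)
print(
    f"CPU {steady_state_ms(cpu_ms):.1f} ms, "
    f"TPU {steady_state_ms(tpu_ms):.1f} ms, "
    f"speedup {speedup:.0f}x"
)
```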