Google AutoML Vision - Mods vs. Rockers Revisited!

Introduction

FastAI provides Jupyter notebooks to wrangle data, train models, optimize models and then serve models.

I recommended FastAI to my Data Scientist friends, but they found the FastAI Jupyter layout and workflow both cumbersome and confusing.

As an alternative to FastAI (and any roll-your-own vision service, for that matter), GCP provides the Google AutoML Vision service, which automates the tedious aspects of AI Vision efforts.

AutoML Vision simplifies labeling and then automates training, optimization and serving of the model.

GCP provides a no-code method to create, deploy and serve AI Vision models at scale!

In this HOWTO we will accomplish the following:

  • Create a Google Cloud Storage Bucket
  • Acquire and label data
  • Train a Vision model that identifies Mods vs. Rockers
  • Fix data labels via GUI
  • Re-train and tune a vision model
  • Serve a Vision model
  • Send our served model a test image

Create a bucket

The AutoML Vision service needs labeled data to train on.

We will create a Google Cloud Storage bucket for this purpose.

You will upload two folders to this bucket, one for each class, mods and rockers.

You will also upload a line-delimited CSV file to this bucket that records the URI of each image in the bucket, followed by a label.

For example:

gs://mods-rockers/mods/00000000.jpg,mods
gs://mods-rockers/mods/00000001.jpg,mods
gs://mods-rockers/mods/00000002.jpg,mods

<snip>

gs://mods-rockers/rockers/00000097.jpg,rockers
gs://mods-rockers/rockers/00000098.jpg,rockers
gs://mods-rockers/rockers/00000099.jpg,rockers
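Since a malformed row may cause the import to fail, it can help to check the CSV before you upload it. The following is a minimal sketch, assuming each row holds exactly one URI and one label as in the example above. The function name check_rows and the sample rows are hypothetical and are not part of any Google tool.

```python
import csv
import io


def check_rows(csv_text, allowed_labels=("mods", "rockers")):
    """Return a list of (line_number, problem) for rows that look malformed."""
    problems = []
    for line_number, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) != 2:
            problems.append((line_number, "expected exactly URI,label"))
            continue
        uri, label = row
        if not uri.startswith("gs://"):
            problems.append((line_number, "URI does not start with gs://"))
        if label not in allowed_labels:
            problems.append((line_number, "unknown label: " + label))
    return problems


# Hypothetical sample: the last two rows are deliberately broken.
sample = (
    "gs://mods-rockers/mods/00000000.jpg,mods\n"
    "gs://mods-rockers/rockers/00000097.jpg,rockers\n"
    "mods/00000001.jpg,mods\n"
    "gs://mods-rockers/mods/00000002.jpg,teddy_boys\n"
)

for problem in check_rows(sample):
    print(problem)
```

An empty result means every row matched the two-column layout. It does not prove that the images actually exist in the bucket.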

IMPORTANT: Ensure that you use a regional bucket in us-central1, location type: Region and required storage class: Standard.

If you do not use the proper bucket configuration, you will receive the following error when you attempt to import your dataset.

Import Fail
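The bucket requirements above boil down to three settings. As a rough sketch, the check below compares a dictionary of settings against them. The dictionary keys and the function name bucket_problems are my own assumptions, and you would need to fill in the values yourself, for example by reading them off the bucket details page in the console.

```python
# Required settings, taken from the IMPORTANT note above.
REQUIRED = {
    "location": "us-central1",
    "location_type": "region",
    "storage_class": "standard",
}


def bucket_problems(settings):
    """Return a human-readable list of settings that do not match REQUIRED."""
    problems = []
    for key, expected in REQUIRED.items():
        actual = str(settings.get(key, "")).lower()  # compare case-insensitively
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, got {actual!r}")
    return problems


# Hypothetical settings for two buckets.
good = {"location": "US-CENTRAL1", "location_type": "region", "storage_class": "STANDARD"}
bad = {"location": "US", "location_type": "multi-region", "storage_class": "STANDARD"}

print(bucket_problems(good))
print(bucket_problems(bad))
```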

The following commands grant your user account and the AutoML service account the IAM roles they need, and then create the bucket.

From the cloudshell, export your USERNAME as an environment variable.

Be sure to enter your USERNAME in the following command:

sobanski_htc@cloudshell:~ (mods-rocker-project)$ export USERNAME=<your email address>

Now enable AutoML to access a bucket.

sobanski_htc@cloudshell:~ (mods-rocker-project)$ export PROJECT_ID=$DEVSHELL_PROJECT_ID
sobanski_htc@cloudshell:~ (mods-rocker-project)$ gcloud projects add-iam-policy-binding $PROJECT_ID --member="user:$USERNAME" --role="roles/automl.admin"
Updated IAM policy for project [mods-rocker-project].
bindings:
- members:
  - user:my@email.com
  role: roles/automl.admin
- members:
  - serviceAccount:service-4011961642212@gcp-sa-automl.iam.gserviceaccount.com
  role: roles/automl.serviceAgent
- members:
  - serviceAccount:service-4011961642212@compute-system.iam.gserviceaccount.com
  role: roles/compute.serviceAgent
- members:
  - serviceAccount:4011961642212-compute@developer.gserviceaccount.com
  - serviceAccount:4011961642212@cloudservices.gserviceaccount.com
  role: roles/editor
- members:
  - user:smy@email.com
  role: roles/owner
etag: AxYhhFi=
version: 1

sobanski_htc@cloudshell:~ (mods-rocker-project)$ gcloud projects add-iam-policy-binding $PROJECT_ID --member="serviceAccount:custom-vision@appspot.gserviceaccount.com" --role="roles/ml.admin"

Now create the bucket:

sobanski_htc@cloudshell:~ (mods-rocker-project)$ gsutil mb -p $PROJECT_ID -c standard -l us-central1 gs://<your-bucket-name>/

Get a dataset

If you do not have a labeled dataset, use the FastAI dataset notebook to quickly download a labeled dataset, separated by folder.

If you do have a labeled dataset you can skip this section.

Launch AI Platform

Spin up an AI platform notebook for this task.

Log into the Google Cloud Platform (GCP) console at console.cloud.google.com [Non-referral link].

Type notebooks into the search bar, click Notebooks AI Platform and then click Enable API.

Enable_Notebooks

Click New Instance and then select Python.

Create_Python_Notebook

Launch a terminal.

Launch_Terminal

Install FastAI Course v3

From the terminal install the FastAI course v3.

$ git clone https://github.com/fastai/course-v3.git

Install the required FastAI libs.

Since this is an ephemeral notebook, you will not need to worry about virtual environments.

$ pip install fastai
Collecting fastai
  Downloading fastai-1.0.60-py3-none-any.whl (237 kB)
     |████████████████████████████████| 237 kB 4.6 MB/s 
Requirement already satisfied: numpy>=1.15 in /opt/conda/lib/python3.7/site-packages (from fastai) (1.18.1)

...

Building wheels for collected packages: nvidia-ml-py3
  Building wheel for nvidia-ml-py3 (setup.py) ... done
  Created wheel for nvidia-ml-py3: filename=nvidia_ml_py3-7.352.0-py3-none-any.whl size=19189 sha256=42f79de382946ce4af88196dfdcf55cda496237f7db498bd2cc1cce3f788fba6
  Stored in directory: /home/jupyter/.cache/pip/wheels/df/99/da/c34f202dc8fd1dffd35e0ecf1a7d7f8374ca05fbcbaf974b83
Successfully built nvidia-ml-py3
Installing collected packages: wasabi, srsly, murmurhash, cymem, plac, preshed, catalogue, blis, thinc, spacy, torch, torchvision, nvidia-ml-py3, fastprogress, fastai

Download the images

From the GUI, navigate to nbs --> dl1 --> lesson2-download.ipynb.

Select Edit --> Clear All Outputs.

Follow the instructions up to the Download images section to create and upload your mods.csv and rockers.csv tables.

These tables include URLs that point to images from each class.

Once you are at the Download Images section, replace the presented code with the code below.

This prevents you from needing to scroll up to the prior section.

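# The notebook already runs this import; it is repeated here so the cell stands alone.
from fastai.vision import *

classes = ['mods','rockers']
path    = Path('data/brighton_seafront')

# Create one destination folder per class.
for folder in classes:
    dest = path/folder
    dest.mkdir(parents=True, exist_ok=True)
path.ls()

# Download up to 200 images per class from the URLs in mods.csv and rockers.csv.
for c in classes:
    file = '{}.csv'.format(c)
    dest = path/c
    download_images(file, dest, max_pics=200)

# Delete any files that cannot be opened as images.
for c in classes:
    print(c)
    verify_images(path/c, delete=True, max_size=500)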

I present the updated code in the following graphic.

FastAI_Notebook

Be sure to run the next verify_images cell.

Label the data

At this point you should have two folders, one named mods and one named rockers.

If you used the AI Platform to create your dataset folders, then change directories to brighton_seafront.

~$ cd course-v3/nbs/dl1/data/brighton_seafront/

Export the name of your Google Cloud Storage (GCS) bucket.

NOTE: Replace mods-rockers in the following command with the name of your own bucket. I own the mods-rockers bucket, so you cannot use that name.

~/course-v3/nbs/dl1/data/brighton_seafront$ export BUCKET_NAME=mods-rockers

Brighton seafront contains two sub-directories, mods and rockers.

Create a CSV file that lists the URI of each image, followed by its label.

All of the images in the mods directory will get the label mods and all of the images in the rockers directory will get the label rockers.

~/course-v3/nbs/dl1/data/brighton_seafront$ for name in `ls mods`; do echo gs://$BUCKET_NAME/mods/$name,mods >> labeled_data.csv; done
~/course-v3/nbs/dl1/data/brighton_seafront$ for name in `ls rockers`; do echo gs://$BUCKET_NAME/rockers/$name,rockers >> labeled_data.csv; done

The first few lines of the labeled_data.csv file read:

~/course-v3/nbs/dl1/data/brighton_seafront$ head -n3 labeled_data.csv
gs://mods-rockers/mods/00000000.jpg,mods
gs://mods-rockers/mods/00000001.jpg,mods
gs://mods-rockers/mods/00000002.jpg,mods

And the last few lines read:

~/course-v3/nbs/dl1/data/brighton_seafront$ tail -n3 labeled_data.csv
gs://mods-rockers/rockers/00000097.jpg,rockers
gs://mods-rockers/rockers/00000098.jpg,rockers
gs://mods-rockers/rockers/00000099.jpg,rockers
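If you prefer Python to the shell loops that build labeled_data.csv, the same file can be sketched with pathlib. This is only an illustration: the function names labeled_rows and write_labeled_csv are hypothetical. Unlike the >> redirects, which append and therefore duplicate rows if you run them twice, this version overwrites the file on each run.

```python
from pathlib import Path


def labeled_rows(root, bucket_name, classes=("mods", "rockers")):
    """Build one 'gs://bucket/label/file,label' row per image file."""
    rows = []
    for label in classes:
        # Sort so the rows come out in a stable, predictable order.
        for image in sorted((Path(root) / label).iterdir()):
            if image.is_file():
                rows.append(f"gs://{bucket_name}/{label}/{image.name},{label}")
    return rows


def write_labeled_csv(root, bucket_name, out_name="labeled_data.csv"):
    """Write the rows to root/out_name, replacing any existing file."""
    out_path = Path(root) / out_name
    out_path.write_text("\n".join(labeled_rows(root, bucket_name)) + "\n")
    return out_path


# Example call from inside brighton_seafront (use your own bucket name):
# write_labeled_csv(".", "your-bucket-name")
```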

Copy labeled_data.csv, the mods folder, the rockers folder and all of their contents to your GCS bucket.

~/course-v3/nbs/dl1/data/brighton_seafront$ gsutil -m cp -r labeled_data.csv mods/ rockers/ gs://$BUCKET_NAME

At this point be sure to destroy the AI Platform notebook so that you do not incur any charges.

Train the model

Enable the API

Log into the Google Cloud Platform (GCP) console at console.cloud.google.com [Once again, this is a non-affiliate link].

In the search bar, type Vision and then click ENABLE AUTOML API.

Enable_API

Upload your Dataset

Click Get Started --> New Dataset --> Multi-label classification

AutoML Import Dataset

Under Select files to import, select Select a CSV file on Cloud Storage and then enter the URI for the labeled_data.csv file on your bucket.

Select_Bucket_With_Labeled_CSV

The import will take several minutes.

Import_Image

View Images

After the import completes, you will see your labeled images.

Successful_Import

A brief perusal of the images shows that some pictures (highlighted in red) include incorrect labels.

Bad_Labels

For now, let's ignore the bad labels and see what happens.

Train your Model

Select Start Training

Start training bad labels

The training will use 16 GPU hours.

Since GCP farms the training out in parallel, the 16 GPU hours take less than an hour.

Evaluate your Model

After the training completes, click Evaluate.

You will see that the model provides sub-90% precision and recall, as shown by the confusion matrix (highlighted in green).

Bad_Labels_Results
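To make the link between a confusion matrix and precision and recall concrete, here is a small worked example. The counts are hypothetical and are not the numbers from my model. The sketch also treats the problem as plain single-label classification, which is a simplification of how AutoML scores a multi-label dataset.

```python
def precision_recall(confusion, label):
    """Compute (precision, recall) for one label.

    confusion[true_label][predicted_label] holds a count of images.
    """
    true_positive = confusion[label][label]
    # Images of another class that the model called `label`.
    false_positive = sum(
        row[label] for true, row in confusion.items() if true != label
    )
    # Images of `label` that the model called something else.
    false_negative = sum(
        count for predicted, count in confusion[label].items() if predicted != label
    )
    predicted_total = true_positive + false_positive
    actual_total = true_positive + false_negative
    precision = true_positive / predicted_total if predicted_total else 0.0
    recall = true_positive / actual_total if actual_total else 0.0
    return precision, recall


# Hypothetical counts for a 20-image test set.
confusion = {
    "mods": {"mods": 8, "rockers": 2},
    "rockers": {"mods": 1, "rockers": 9},
}

for label in confusion:
    precision, recall = precision_recall(confusion, label)
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

With these made-up counts, mods scores a precision of 8/9 and a recall of 8/10, so a single mislabeled image is enough to move either number noticeably on a small dataset.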

Drill down for more details and you will see that the false positives for mods include two pictures of mods.

This points to a labeling problem.

Bad_Labels_Cause_Errors

NOTE: Upon second glance, the picture on the right depicts Teddy Boys. Should I label Teddy Boys Mods, Rockers or delete the picture? Answer in the comments below!

Fix Labels

Click images and change the labels of the troublesome images (or just delete them if you're lazy right now).

I have a rocker motorcycle labeled mod and a picture that includes both mods and rockers labeled as just mods.

Delete_Confusing_Ones

I like this picture, a bunch of rockers attempting to murder two helmet-less mods, who find it funny.

Both_Mod_and_Rocker

Re-train model

After we clean up the data and re-train, we see a perfect confusion matrix.

Much_Better

Drilling down we see our model gave a mod under arrest the rocker label.

One_Wrong

Deploy the model

Unlike FastAI, the Google AI Platform provides one-click deployment of your model.

Click Test & Use and then Deploy Model.

GCP takes several minutes to deploy the model.

Deploy_The_Model

After you deploy the model, click the Upload Images button and upload up to ten images.

I upload a picture of myself at the park.

The model reports, with 93% certainty, that I fall under the Mod classification, vs. Rocker.

Serve_Sobanski

My paisley shirt and Italian sunglasses give credence to this, although I do have a Rocker hair cut (styled with Royal Crown, no less).
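I used the GUI to test the model, but a deployed model can also be queried from code. The following is a minimal sketch, assuming you have installed the google-cloud-automl client library and authenticated. I did not run this for the article. The project ID, model ID and file path are placeholders, and the helper name top_prediction and the 0.07 score in the example are hypothetical.

```python
def top_prediction(results):
    """Pick the (label, score) pair with the highest score."""
    return max(results, key=lambda pair: pair[1])


def predict_image(project_id, model_id, file_path, score_threshold="0.5"):
    """Send one local image to a deployed model; return (label, score) pairs."""
    # Imported here so top_prediction works without the client library installed.
    from google.cloud import automl

    client = automl.PredictionServiceClient()
    model_name = automl.AutoMlClient.model_path(project_id, "us-central1", model_id)

    with open(file_path, "rb") as image_file:
        content = image_file.read()

    payload = automl.ExamplePayload(image=automl.Image(image_bytes=content))
    request = automl.PredictRequest(
        name=model_name,
        payload=payload,
        params={"score_threshold": score_threshold},  # drop low-confidence labels
    )
    response = client.predict(request=request)
    return [
        (result.display_name, result.classification.score)
        for result in response.payload
    ]


# Offline example with made-up scores shaped like the result above.
print(top_prediction([("mods", 0.93), ("rockers", 0.07)]))
```

Keeping the network call in its own function means the result handling can be tried without credentials or a deployed model.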

Conclusion

GCP provides an AutoML vision service that automates the manual FastAI tasks of training, optimizing and serving a Vision model.

AutoML vision also provides an easy-to-use and intuitive labeling service.

If you can get a hold of labeled data, then I would recommend the AutoML vision service.

Use the AutoML vision service for serious tasks. Google throws the kitchen sink at training and tuning.

Each model consumed sixteen (16) hours of GPU time!

My FastAI model ran for two or three minutes, on one GPU.

My two runs (32 hours total) cost about $100.

Pricey
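As a quick back-of-the-envelope check, the figures above imply an hourly rate of roughly three dollars. The arithmetic below uses only the numbers from this article. Since "about $100" is rounded, treat the result as an estimate and not as Google's published price.

```python
total_cost_usd = 100.0  # "about $100" for both runs
hours_per_run = 16      # hours consumed by each training run
runs = 2

total_hours = hours_per_run * runs
cost_per_hour = total_cost_usd / total_hours

print(f"{total_hours} hours at roughly ${cost_per_hour:.3f} per hour")
```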

Google, however, gave me $176.00 to experiment with the model training and serving.

From Google:

"Free Trial! You can try AutoML Vision Object Detection for free by using 40 free node hours each for training and online prediction, and 1 free node hour for batch prediction, per billing account. Your free node hours are issued right before you create your first model. For batch prediction, the free node hour is issued at the time of the first batch prediction is initiated. You have up to one year to use them. Prices are listed in US Dollars (USD). If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply."

I did not need to eat into the $300 in free credits Google provided when I signed up for GCP!

Free Goody
