Object detection using Pascal VOC dataset with SageMaker

The Amazon SageMaker Object Detection algorithm detects and classifies objects in images using a single deep neural network. It is a supervised learning algorithm that takes images as input and identifies all instances of objects within the image scene. Each object is categorized into one of the classes in a specified collection, with a confidence score that it belongs to that class. Its location and scale in the image are indicated by a rectangular bounding box. The algorithm uses the Single Shot MultiBox Detector (SSD) framework and supports two base networks: VGG and ResNet. The network can be trained from scratch, or trained with models that have been pre-trained on the ImageNet dataset.

The recommended input format for the Amazon SageMaker Object Detection algorithm is Apache MXNet RecordIO. However, you can also use raw images in .jpg or .png format.

We think that training with the RecordIO format is the easiest way to get started with your object detection PoC on SageMaker. For full details on the input-output interface, see the Amazon SageMaker Object Detection documentation.

Prepare your dataset in ImageRecord format


Raw images are the natural data format for computer vision tasks. However, when loading data from image files for training, disk I/O can become a bottleneck. For instance, when training a ResNet50 model with ImageNet on an AWS p3.16xlarge instance, parallel training on 8 GPUs is so fast that even reading images from a RAM disk can't keep up. To get the most out of high-end hardware, we suggest training with MXNet's ImageRecord format.
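To illustrate why a packed file helps, here is a minimal sketch of the general idea: many small payloads are written back to back into one buffer and read again in a single sequential pass. This is a simplified length-prefixed container written for illustration, not the actual MXNet RecordIO byte layout, and the function names are hypothetical.

```python
import io
import struct


def pack_records(payloads):
    """Concatenate byte payloads into one buffer, each prefixed by its length."""
    buf = io.BytesIO()
    for payload in payloads:
        buf.write(struct.pack('<I', len(payload)))  # 4-byte little-endian length
        buf.write(payload)
    return buf.getvalue()


def iter_records(blob):
    """Yield the payloads back in order with one sequential pass over the buffer."""
    offset = 0
    while offset < len(blob):
        (length,) = struct.unpack_from('<I', blob, offset)
        offset += 4
        yield blob[offset:offset + length]
        offset += length


# Stand-ins for encoded image bytes (made-up data)
images = [b'fake-jpeg-1', b'fake-jpeg-22']
packed = pack_records(images)
restored = list(iter_records(packed))
```

Reading one large file sequentially avoids opening thousands of small image files, which is the effect the ImageRecord format is after.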

We will use the dataset from Pascal VOC, a popular computer vision challenge. We will use the data sets from 2007 and 2012, named VOC07 and VOC12 respectively. Together they comprise more than 20,000 images containing about 50,000 annotated objects, grouped into 20 categories.
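Each Pascal VOC image comes with an XML annotation file listing its objects and their bounding boxes. The sketch below shows roughly what a conversion tool has to extract from such a file. It assumes the usual VOC element names (size, object, name, bndbox); the sample annotation is made up for illustration, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_voc_annotation(xml_text):
    """Return (class_name, xmin, ymin, xmax, ymax) tuples scaled to the 0..1 range."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    width = float(size.find('width').text)
    height = float(size.find('height').text)
    objects = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        objects.append((
            obj.find('name').text,
            float(box.find('xmin').text) / width,
            float(box.find('ymin').text) / height,
            float(box.find('xmax').text) / width,
            float(box.find('ymax').text) / height,
        ))
    return objects


# Made-up annotation in the VOC layout, not taken from the real dataset
SAMPLE = """<annotation>
  <size><width>500</width><height>250</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>50</xmin><ymin>25</ymin><xmax>250</xmax><ymax>200</ymax></bndbox>
  </object>
</annotation>"""

boxes = parse_voc_annotation(SAMPLE)
```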

Download prerequisite packages

wget https://raw.githubusercontent.com/awslabs/amazon-sagemaker-examples/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/tools/im2rec.py
wget https://raw.githubusercontent.com/awslabs/amazon-sagemaker-examples/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/tools/prepare_dataset.py
wget https://raw.githubusercontent.com/awslabs/amazon-sagemaker-examples/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/tools/concat_db.py
wget https://raw.githubusercontent.com/awslabs/amazon-sagemaker-examples/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/tools/imdb.py
wget https://raw.githubusercontent.com/awslabs/amazon-sagemaker-examples/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/tools/pascal_voc.names
wget https://raw.githubusercontent.com/awslabs/amazon-sagemaker-examples/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/tools/pascal_voc.py
pip install mxnet
pip install opencv-python

Download Pascal VOC data sets

wget -P /tmp http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
wget -P /tmp http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget -P /tmp http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar

Extract the data

tar -xf /tmp/VOCtrainval_11-May-2012.tar && rm /tmp/VOCtrainval_11-May-2012.tar
tar -xf /tmp/VOCtrainval_06-Nov-2007.tar && rm /tmp/VOCtrainval_06-Nov-2007.tar
tar -xf /tmp/VOCtest_06-Nov-2007.tar && rm /tmp/VOCtest_06-Nov-2007.tar

Now, we will combine the training and validation sets from both 2007 and 2012 as the training data set, and use the test set from 2007 as our validation set.

python prepare_dataset.py --dataset pascal --year 2007,2012 --root VOCdevkit --set trainval --target VOCdevkit/train.lst
rm -rf VOCdevkit/VOC2012
python prepare_dataset.py --dataset pascal --year 2007 --root VOCdevkit --set test --target VOCdevkit/val.lst --no-shuffle
rm -rf VOCdevkit/VOC2007

This gives you two sets of files, one prefixed with “train” and the other with “val” (for example train.idx, train.lst and train.rec), written next to the .lst targets in the VOCdevkit directory. You can now use them for training.
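The .lst files are plain text, so they are a convenient place to sanity-check the labels before training. The sketch below parses one line under the assumption that it follows MXNet's detection list convention: tab-separated fields holding an index, a header width A, a per-object label width B, the flattened object labels, and finally the image path. The sample line is made up, and the function name is hypothetical.

```python
def parse_lst_line(line):
    """Split one detection .lst line into (index, list of object labels, image path)."""
    fields = line.rstrip('\n').split('\t')
    index = int(fields[0])
    header_width = int(float(fields[1]))  # A: number of header values, including A and B
    object_width = int(float(fields[2]))  # B: number of values per object
    path = fields[-1]
    # Object values start right after the header and stop before the path
    values = [float(v) for v in fields[1 + header_width:-1]]
    objects = [values[i:i + object_width]
               for i in range(0, len(values), object_width)]
    return index, objects, path


# Made-up line with two objects, each stored as class id plus four box coordinates
sample = ('7\t2\t5\t11.0000\t0.3400\t0.2600\t0.6300\t0.7500'
          '\t14.0000\t0.1000\t0.2000\t0.3000\t0.9000'
          '\tVOC2007/JPEGImages/000007.jpg\n')
index, objects, path = parse_lst_line(sample)
```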

Next, upload your .rec files to Amazon S3.

For example, do:

import sagemaker
sess = sagemaker.Session()

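#prepare_dataset.py wrote the .rec files next to the .lst targets in VOCdevkit
#upload_data returns the S3 URI of the uploaded file;
#without a key_prefix it stores the file under the default 'data' prefix
trainpath = sess.upload_data(
    path='VOCdevkit/train.rec', bucket='mybucketname')

testpath = sess.upload_data(
    path='VOCdevkit/val.rec', bucket='mybucketname')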

Training the model

Once we have a usable dataset, we are ready to train the model.

import sagemaker
from sagemaker import get_execution_role
from sagemaker.amazon.amazon_estimator import get_image_uri

role = get_execution_role()
sess = sagemaker.Session()
training_image = get_image_uri(sess.boto_region_name, 'object-detection', repo_version="latest")

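#the estimator will launch the training job
#the role, instance and session arguments were missing from the truncated call;
#the instance count and type below are assumptions, adjust them to your budget
od_model = sagemaker.estimator.Estimator(training_image,
                                         role,
                                         train_instance_count = 1,
                                         train_instance_type = 'ml.p3.2xlarge',
                                         train_volume_size = 50,
                                         train_max_run = 360000,
                                         input_mode= 'File',
                                         sagemaker_session = sess)
#setup the hyperparameters
#num_classes matches the 20 Pascal VOC categories and num_training_samples
#the size of the combined trainval list; the other values are assumed
#starting points, not tuned settings
od_model.set_hyperparameters(base_network = 'resnet-50',
                             use_pretrained_model = 1,
                             num_classes = 20,
                             epochs = 1,
                             mini_batch_size = 16,
                             num_training_samples = 16551)
#setup data channels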
train_data = sagemaker.session.s3_input(trainpath, distribution='FullyReplicated',
                        content_type='application/x-recordio', s3_data_type='S3Prefix')
validation_data = sagemaker.session.s3_input(testpath, distribution='FullyReplicated',
                             content_type='application/x-recordio', s3_data_type='S3Prefix')
data_channels = {'train': train_data, 'validation': validation_data}

#train the model
od_model.fit(inputs=data_channels, logs=True)

Create Endpoint

Once the training is done, you can deploy the trained model as an endpoint.

object_detector = od_model.deploy(initial_instance_count = 1,
                                 instance_type = 'ml.m4.xlarge')
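Applications outside the notebook usually call the endpoint through the SageMaker runtime API rather than through the predictor object. A minimal sketch, assuming a boto3 `sagemaker-runtime` client is passed in and that the endpoint accepts JPEG bytes and returns JSON; the helper name `detect_objects` and the endpoint name in the usage comment are placeholders.

```python
import json


def detect_objects(runtime_client, endpoint_name, image_bytes):
    """Send JPEG bytes to the endpoint and return the decoded JSON response."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='image/jpeg',
        Body=image_bytes,
    )
    # The response body is a stream; read it fully before decoding
    return json.loads(response['Body'].read())


# Usage (needs AWS credentials and a live endpoint; names are placeholders):
#   import boto3
#   runtime = boto3.client('sagemaker-runtime')
#   with open('test.jpg', 'rb') as image:
#       detections = detect_objects(runtime, 'my-endpoint-name', image.read())
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, without touching AWS.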

Perform inference

Now that the model is deployed, we can use it to run inference.

Let's download a sample image.

wget -O test.jpg https://images.pexels.com/photos/980382/pexels-photo-980382.jpeg

import json

file_name = 'test.jpg'

with open(file_name, 'rb') as image:
    f = image.read()
    b = bytearray(f)

#deriving inference
#inference will be a JSON object
object_detector.content_type = 'image/jpeg'
results = object_detector.predict(b)
detections = json.loads(results)
print (detections)
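The raw response usually contains many low-confidence boxes, so it helps to filter it before use. The sketch below assumes the response has the form {"prediction": [[class_index, score, xmin, ymin, xmax, ymax], ...]} with coordinates normalized to the 0..1 range, and that class indices follow the order of pascal_voc.names. The function name, the threshold and the sample response are illustrative.

```python
# Assumed class order, matching the alphabetical Pascal VOC category list
VOC_CLASSES = [
    'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
    'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
    'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor',
]


def filter_detections(detections, class_names, width, height, threshold=0.2):
    """Keep detections scoring at least `threshold` and convert boxes to pixels."""
    results = []
    for class_id, score, xmin, ymin, xmax, ymax in detections['prediction']:
        if score < threshold:
            continue
        results.append({
            'label': class_names[int(class_id)],
            'score': score,
            'box': (round(xmin * width), round(ymin * height),
                    round(xmax * width), round(ymax * height)),
        })
    return results


# Made-up response for a 640x480 image
sample_response = {'prediction': [[14.0, 0.92, 0.1, 0.2, 0.5, 0.9],
                                  [11.0, 0.05, 0.0, 0.0, 0.3, 0.3]]}
kept = filter_detections(sample_response, VOC_CLASSES, width=640, height=480)
```

With the real endpoint, pass the `detections` dictionary from above together with the pixel width and height of test.jpg.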
