SageMaker Image Classification preprocessing
The Amazon SageMaker image classification algorithm is a supervised learning algorithm that supports multi-label classification. It takes an image as input and outputs one or more labels assigned to that image. It uses a convolutional neural network (ResNet) that can be trained from scratch or trained using transfer learning when a large number of training images are not available.
The recommended input format for the Amazon SageMaker image classification algorithm is Apache MXNet RecordIO. However, you can also use raw images in .jpg or .png format. Refer to this discussion for a broad overview of efficient data preparation and loading for machine learning systems.
We think that training with the RecordIO format is the easiest way to get started with your image classification PoC on SageMaker. For full details on the input-output interface, see this.
Prepare your dataset in ImageRecord format
Raw images are the natural data format for computer vision tasks. However, when loading data from image files for training, disk I/O can become a bottleneck. For instance, when training a ResNet50 model on ImageNet on an AWS p3.16xlarge instance, parallel training on 8 GPUs is so fast that even reading images from a ramdisk can't keep up. To improve performance on such high-end hardware, we suggest training with MXNet's ImageRecord format.
It takes only a few lines of code to create an ImageRecord file for your own images.
Assume we have a folder ./example in which images are placed in subfolders, one per class:
./example/class_A/1.jpg
./example/class_A/2.jpg
./example/class_A/3.jpg
./example/class_B/4.jpg
./example/class_B/5.jpg
./example/class_B/6.jpg
./example/class_C/100.jpg
./example/class_C/1024.jpg
./example/class_D/65535.jpg
./example/class_D/0.jpg
...
Download prerequisite packages
wget https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/im2rec.py
pip install mxnet
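Generate the .lst files, i.e. lists of these images containing label and filename information. With the train and test ratios below, im2rec.py writes example_rec_train.lst and example_rec_test.lst.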
python im2rec.py ./example_rec ./example/ --recursive --list --num-thread 8 --test-ratio=0.3 --train-ratio=0.7
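To make the listing step concrete, here is a minimal sketch of the idea behind it: one integer label per class subfolder, a shuffled 70/30 split, and tab-separated lines of index, label, and relative path. It is a simplified illustration and not im2rec.py's actual implementation; the function names (list_images, split_items, format_lst) and the seed are hypothetical, and the exact label formatting in a real .lst file may differ.

```python
import os
import random

# Assumption: only these extensions count as images in this sketch.
IMAGE_EXTS = (".jpg", ".jpeg", ".png")


def list_images(root):
    """Return (relative_path, label) pairs, one integer label per class subfolder."""
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    label_of = {name: idx for idx, name in enumerate(classes)}
    items = []
    for name in classes:
        for fname in sorted(os.listdir(os.path.join(root, name))):
            if fname.lower().endswith(IMAGE_EXTS):
                items.append((os.path.join(name, fname), label_of[name]))
    return items


def split_items(items, train_ratio=0.7, seed=100):
    """Shuffle deterministically, then split into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]


def format_lst(items):
    """Render items as tab-separated 'index, label, relative path' lines."""
    return "".join(
        f"{i}\t{label}\t{path}\n" for i, (path, label) in enumerate(items)
    )
```

Calling format_lst(split_items(list_images("./example"))[0]) would give the text of a training list for the folder layout shown earlier. For real work, use the lists that im2rec.py writes.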
Then create the .rec files for training and testing:
python im2rec.py ./example_rec ./example/ --recursive --pass-through --pack-label --num-thread 8
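For each .lst file, this produces a matching .idx and .rec file; with the split above, these are example_rec_train.idx, example_rec_train.rec, example_rec_test.idx and example_rec_test.rec. Now you can use them to train!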
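Before uploading, it can help to confirm that every list file ended up with its record and index files. The sketch below assumes that im2rec.py writes one .rec/.idx pair next to each .lst file sharing the prefix; the helper names (expected_record_files, missing_record_files) are hypothetical and only use the Python standard library.

```python
import glob
import os


def expected_record_files(prefix):
    """Map each <prefix>*.lst file to the .rec and .idx paths expected beside it."""
    expected = {}
    for lst_path in sorted(glob.glob(glob.escape(prefix) + "*.lst")):
        stem = os.path.splitext(lst_path)[0]
        expected[lst_path] = (stem + ".rec", stem + ".idx")
    return expected


def missing_record_files(prefix):
    """Return expected .rec/.idx paths that are absent or empty."""
    missing = []
    for rec_path, idx_path in expected_record_files(prefix).values():
        for path in (rec_path, idx_path):
            if not os.path.isfile(path) or os.path.getsize(path) == 0:
                missing.append(path)
    return missing
```

For the commands above, missing_record_files("./example_rec") should return an empty list once the .rec step has finished.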
Then use this link to upload your .rec files to S3!
For example, do:
import sagemaker

sess = sagemaker.Session()

# Upload the training and test RecordIO files.
# Each call returns the S3 URI of the uploaded object.
trainpath = sess.upload_data(
    path='example_rec_train.rec', bucket='mybucketname',
    key_prefix='sagemaker/input')
testpath = sess.upload_data(
    path='example_rec_test.rec', bucket='mybucketname',
    key_prefix='sagemaker/input')
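The values returned by upload_data are S3 URIs that you pass on to training. As a rough sanity check, the sketch below builds the URI you would expect for a single uploaded file, assuming the object key is the key prefix followed by the file name. The helper expected_s3_uri is hypothetical and not part of the SageMaker SDK; compare its result with the value that upload_data actually returns.

```python
import os


def expected_s3_uri(path, bucket, key_prefix):
    """Build the S3 URI expected for one uploaded file.

    Assumption: the object key is '<key_prefix>/<file name>'.
    """
    filename = os.path.basename(path)
    return f"s3://{bucket}/{key_prefix}/{filename}"


if __name__ == "__main__":
    print(expected_s3_uri("example_rec_train.rec", "mybucketname", "sagemaker/input"))
```

With the bucket and prefix from the example above, this prints s3://mybucketname/sagemaker/input/example_rec_train.rec.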