Comprehend custom training
Before starting, make sure you have completed the steps in the Comprehend custom preprocessing guide.
Custom classification is a two-step process:
- Identify your labels, then create and train a custom classifier to recognize those labels.
- Once Amazon Comprehend has trained the classifier, send unlabeled documents to be classified using that classifier.
Training a Custom Classifier
Using the AWS SDK for Python:
Instantiate the Boto3 client:

    import boto3

    client = boto3.client('comprehend', region_name='region')
To create a Classifier:
    create_response = client.create_document_classifier(
        InputDataConfig={'S3Uri': 's3://S3Bucket/docclass/file name'},
        DataAccessRoleArn='arn:aws:iam::account number:role/resource name',
        DocumentClassifierName='SampleCodeClassifier1',
        LanguageCode='en'
    )
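Training runs asynchronously, so the call above returns before the classifier is ready. A minimal polling sketch using the `describe_document_classifier` API; the `wait_for_classifier` helper name and the 60-second interval are our own choices, not part of the SDK:

```python
import time

def wait_for_classifier(client, classifier_arn, poll_seconds=60):
    """Poll until the classifier finishes training; return its final status."""
    while True:
        props = client.describe_document_classifier(
            DocumentClassifierArn=classifier_arn
        )['DocumentClassifierProperties']
        status = props['Status']
        # TRAINED means ready to use; IN_ERROR means training failed.
        if status in ('TRAINED', 'IN_ERROR'):
            return status
        time.sleep(poll_seconds)
```

You would pass it the ARN returned in `create_response['DocumentClassifierArn']`.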
To run a custom classifier job:
    start_response = client.start_document_classification_job(
        InputDataConfig={
            'S3Uri': 's3://S3Bucket/docclass/file name',
            'InputFormat': 'ONE_DOC_PER_LINE'
        },
        OutputDataConfig={'S3Uri': 's3://S3Bucket/output'},
        DataAccessRoleArn='arn:aws:iam::account number:role/resource name',
        DocumentClassifierArn='arn:aws:comprehend:region:account number:document-classifier/SampleCodeClassifier1'
    )
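The classification job is also asynchronous; `start_response` includes a `JobId` you can use to track it. A sketch that polls `describe_document_classification_job` until the job reaches a terminal state (the helper name and interval are illustrative assumptions):

```python
import time

def wait_for_job(client, job_id, poll_seconds=60):
    """Poll until the classification job reaches a terminal state."""
    while True:
        job = client.describe_document_classification_job(
            JobId=job_id
        )['DocumentClassificationJobProperties']
        status = job['JobStatus']
        # On COMPLETED, results are written to the OutputDataConfig S3Uri.
        if status in ('COMPLETED', 'FAILED', 'STOPPED'):
            return status
        time.sleep(poll_seconds)
```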
Some other notes
- To train a custom classifier, first identify the classes you want to use for classification. For more accurate training, we recommend at least 50 documents for each class.
- When training your classifier, the data must be in a single .csv file. The format of the data depends on which classifier mode you choose.
- After you train the custom classifier, you can analyze documents with either asynchronous (batch) or synchronous (real-time) operations.
- Multi-class mode supports up to 1 million examples containing up to 1,000 unique classes.
- Multi-label mode supports up to 1 million examples containing up to 100 unique classes.
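To illustrate the single-CSV requirement, here is a small sketch that writes a multi-class training file with one `LABEL,document text` row per example and no header; the labels, texts, and file name are invented for demonstration:

```python
import csv

# Hypothetical (label, text) training examples for multi-class mode.
rows = [
    ('SPAM', 'Win a free cruise by replying now'),
    ('NOT_SPAM', 'Meeting moved to 3pm tomorrow'),
]

with open('train.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for label, text in rows:
        writer.writerow([label, text])
```

The resulting `train.csv` is what you would upload to the `S3Uri` passed in `InputDataConfig`.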