Comprehend custom inference
Before running inference, make sure you have completed the Comprehend custom training use case first.
After training, your custom classifier is ready to categorize unlabeled documents asynchronously.
Data prep for inference
All documents must be UTF-8-encoded text files. Although training accepts only the one-document-per-line format, for inference you can submit your documents either as one document per line or as one document per file.
The format of the input file should be as follows:
One document per line
Text of document 1 \n
Text of document 2 \n
Text of document 3 \n
Text of document 4 \n
After preparing the documents file, place that file in the S3 bucket that you're using for input data.
One document per file
Use the URI s3://bucketName/prefix. If the prefix points to a single file, Amazon Comprehend uses that file as input; if more than one file begins with the prefix, Amazon Comprehend uses all of them as input.
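If you build the input file programmatically, the following sketch writes documents in the one-document-per-line format and uploads the file with boto3 (the documents, file name, and bucket name are illustrative placeholders, not part of this guide):

import boto3

# Illustrative unlabeled documents; replace with your own text.
documents = [
    "Text of document 1",
    "Text of document 2",
    "Text of document 3",
]

# Write one document per line, UTF-8 encoded, as Comprehend expects.
# Newlines inside a document would split it in two, so replace them.
with open("documents.txt", "w", encoding="utf-8") as f:
    for doc in documents:
        f.write(doc.replace("\n", " ") + "\n")

# Place the file in the S3 bucket you will pass as the input location.
s3 = boto3.client("s3")
s3.upload_file("documents.txt", "YOUR_S3_INPUTBUCKET", "input/documents.txt")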
Prediction
To launch a new classification job, run the following command, replacing the placeholders with your bucket locations, classifier ARN, and IAM role ARN:
aws comprehend start-document-classification-job \
    --document-classifier-arn <<your-comprehend-classifier-arn>> \
    --input-data-config S3Uri=<<YOUR_S3_INPUTBUCKET>>,InputFormat=ONE_DOC_PER_LINE \
    --output-data-config S3Uri=<<YOUR_S3_OUTPUTBUCKET>> \
    --data-access-role-arn <<YOUR_IAM_ROLE_ARN>>
You should see something like this:
{
    "DocumentClassificationJobProperties": {
        "JobId": "4*********************8aab",
        "JobStatus": "IN_PROGRESS",
        "SubmitTime": 1561679679.036,
        "DocumentClassifierArn": "YourClassifierArn",
        "InputDataConfig": {
            "S3Uri": "YourS3Uri",
            "InputFormat": "ONE_DOC_PER_LINE"
        },
        "OutputDataConfig": {
            "S3Uri": "S3OutputLocation"
        },
        "DataAccessRoleArn": "YourAccessRole"
    }
}
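If you prefer Python over the CLI, a roughly equivalent boto3 call looks like this (a sketch using the same placeholders as the command above):

import boto3

comprehend = boto3.client("comprehend")

# Same parameters as the CLI command above; replace the placeholders.
response = comprehend.start_document_classification_job(
    DocumentClassifierArn="<<your-comprehend-classifier-arn>>",
    InputDataConfig={
        "S3Uri": "<<YOUR_S3_INPUTBUCKET>>",
        "InputFormat": "ONE_DOC_PER_LINE",
    },
    OutputDataConfig={"S3Uri": "<<YOUR_S3_OUTPUTBUCKET>>"},
    DataAccessRoleArn="<<YOUR_IAM_ROLE_ARN>>",
)
print(response["JobId"], response["JobStatus"])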
To check the newly launched job:
aws comprehend describe-document-classification-job --job-id <<PROVIDE_YOUR_JOB_ID>>
Once the job completes, you can download the results from the S3 location given in OutputDataConfig.S3Uri.
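To wait for completion programmatically, here is a minimal polling sketch with boto3, using the same job id placeholder as above:

import time

import boto3

comprehend = boto3.client("comprehend")

# Poll until the job finishes; status moves from SUBMITTED/IN_PROGRESS
# to COMPLETED (or FAILED).
while True:
    props = comprehend.describe_document_classification_job(
        JobId="<<PROVIDE_YOUR_JOB_ID>>"
    )["DocumentClassificationJobProperties"]
    if props["JobStatus"] in ("COMPLETED", "FAILED"):
        break
    time.sleep(30)

# The predictions are written as a compressed archive at this S3 URI.
print(props["JobStatus"], props["OutputDataConfig"]["S3Uri"])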
To run the same job from the AWS console instead, see the Amazon Comprehend documentation on custom classification.
To create a model-specific endpoint for synchronous inference for a previously trained custom model using Python (boto3); the endpoint name, request token, and tags below are placeholder values of your choice:

import boto3

client = boto3.client("comprehend")

# Creates a dedicated endpoint for real-time (synchronous) inference
# against the trained classifier.
response = client.create_endpoint(
    EndpointName="my-classifier-endpoint",          # name of your choice
    ModelArn="<<your-comprehend-classifier-arn>>",  # the trained classifier
    DesiredInferenceUnits=1,    # 1 inference unit = 100 characters/second
    ClientRequestToken="my-request-token",          # optional idempotency token
    Tags=[
        {"Key": "project", "Value": "comprehend-custom"},
    ],
)
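Once the endpoint status is IN_SERVICE, you can classify individual documents synchronously. A short sketch, reusing the client and response from the snippet above:

# Synchronous classification against the endpoint created above.
result = client.classify_document(
    Text="Text of the document to classify",
    EndpointArn=response["EndpointArn"],
)
print(result["Classes"])  # predicted labels with confidence scores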