- Use cases
1. Preprocessing
- SageMaker Object Detection preprocessing
- Rekognition Object Detection preprocessing
- SageMaker Kmeans preprocessing
- Autopilot preprocessing
- DeepAR preprocessing
- Personalize preprocessing
- Select, drop or extract columns
- Split dataset into train and test
- Upload to S3
- Forecast preprocessing
- Rekognition Classification preprocessing
- SageMaker Image Classification preprocessing
- XGBoost preprocessing
- BlazingText preprocessing
- Comprehend custom preprocessing
2. Training
- SageMaker Object Detection training
- Rekognition Object Detection training
- Forecast training
- Personalize training
- BlazingText training
- DeepAR training
- SageMaker Kmeans training
- Comprehend custom training
- Autopilot training
- XGBoost training
- AutoGluon training
- Rekognition Classification training
- SageMaker Image Classification training
3. Inference
- SageMaker Object Detection inference
- Forecast inference
- Rekognition Object Detection inference
- Comprehend custom inference
- Personalize inference
- Autopilot inference
- BlazingText inference
- Custom SageMaker model inference
- DeepAR inference
- Rekognition Classification inference
- SageMaker Image Classification inference
- SageMaker Kmeans inference
- XGBoost inference
- Contribute a use case or contact us for help.
- Frequently Asked Questions
Forecast preprocessing
Prepare Forecast data
While there are many ways to use Forecast, this is the path of least resistance. Let's assume you have a dataset with three columns: a timestamp column, a target column (the value you want to forecast), and a category column.
2020-01-01 01:00:00, 1.0, CATEGORY_0
2020-01-01 02:00:00, 1.2, CATEGORY_0
2020-01-01 01:00:00, 0.5, CATEGORY_1
2020-01-01 02:00:00, 0.6, CATEGORY_1
The category column is useful when you have multiple related time series, for example timestamps and target values from multiple product categories or clients. Replace “CATEGORY_0” with something appropriate to your use case, for example “CUSTOMER_0” or “product_0”.
What do you do when you have no categories, or only two columns: one with a timestamp and another with a value?
Add a third (dummy) column with the same category in every row (named ‘CATEGORY_0’), like this:
2020-01-01 01:00:00, 1.0, CATEGORY_0
2020-01-01 02:00:00, 1.2, CATEGORY_0
2020-01-01 03:00:00, 1.5, CATEGORY_0
2020-01-01 04:00:00, 1.2, CATEGORY_0
Upload this data to S3, to a location similar to “s3://bucketname/dataset.csv”.
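The dummy-column step and the upload can be sketched in Python. This is a minimal sketch under stated assumptions, not the only way to do it: the helper names `add_dummy_category`, `to_csv_text` and `upload_csv` are made up for illustration, the rows are written without a header to match the samples above, and the upload uses the boto3 S3 client's `put_object` call, which needs boto3 installed and AWS credentials configured.

```python
import csv
import io


def add_dummy_category(rows, category="CATEGORY_0"):
    """Append the same category value to every (timestamp, value) row."""
    return [(timestamp, value, category) for timestamp, value in rows]


def to_csv_text(rows):
    """Serialize rows as header-less CSV text, one row per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def upload_csv(csv_text, bucket, key):
    """Upload the CSV text to s3://bucket/key (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the helpers above work without boto3

    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=csv_text.encode("utf-8")
    )


# Two-column input: a timestamp and a value, with no category.
two_column_rows = [
    ("2020-01-01 01:00:00", 1.0),
    ("2020-01-01 02:00:00", 1.2),
]
csv_text = to_csv_text(add_dummy_category(two_column_rows))

# Hypothetical bucket and key; replace with your own before uncommenting.
# upload_csv(csv_text, "bucketname", "dataset.csv")
```

The upload call is left commented out so the snippet can be run locally to inspect `csv_text` before anything is sent to S3.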
Schema definition
Copy this schema definition as is for future use:
{
    "Attributes": [
        {
            "AttributeName": "timestamp",
            "AttributeType": "timestamp"
        },
        {
            "AttributeName": "target_value",
            "AttributeType": "float"
        },
        {
            "AttributeName": "item_id",
            "AttributeType": "string"
        }
    ]
}
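The schema lists its attributes in the same order as the columns in the sample rows (timestamp, target value, category). Before uploading, it can help to check locally that each row parses under that schema. The sketch below is one way to do that; `PARSERS` and `parse_row` are hypothetical names, and the `%Y-%m-%d %H:%M:%S` timestamp pattern is an assumption taken from the sample data, not something Forecast provides.

```python
import json
from datetime import datetime

SCHEMA = json.loads("""
{"Attributes": [
  {"AttributeName": "timestamp", "AttributeType": "timestamp"},
  {"AttributeName": "target_value", "AttributeType": "float"},
  {"AttributeName": "item_id", "AttributeType": "string"}
]}
""")

# One parser per attribute type used in the schema above.
PARSERS = {
    "timestamp": lambda text: datetime.strptime(text, "%Y-%m-%d %H:%M:%S"),
    "float": float,
    "string": str,
}


def parse_row(line, schema=SCHEMA):
    """Split one CSV line and parse each field by its position in the schema."""
    fields = [field.strip() for field in line.split(",")]
    attributes = schema["Attributes"]
    if len(fields) != len(attributes):
        raise ValueError(f"expected {len(attributes)} fields, got {len(fields)}")
    return {
        attr["AttributeName"]: PARSERS[attr["AttributeType"]](field)
        for attr, field in zip(attributes, fields)
    }


row = parse_row("2020-01-01 01:00:00, 1.0, CATEGORY_0")
```

A row with a missing column or a non-numeric target value raises `ValueError`, which surfaces formatting problems before an import job is started.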
Here is what the Forecast workflow looks like when making predictions:
Create the dataset using the console, the CLI, or Python
CLI
Create dataset
aws forecast create-dataset \
    --dataset-name mydataset \
    --domain CUSTOM \
    --dataset-type TARGET_TIME_SERIES \
    --data-frequency H \
    --schema '{
        "Attributes": [
            {
                "AttributeName": "timestamp",
                "AttributeType": "timestamp"
            },
            {
                "AttributeName": "target_value",
                "AttributeType": "float"
            },
            {
                "AttributeName": "item_id",
                "AttributeType": "string"
            }
        ]
    }'
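For the Python route, the same request can be sent with boto3's `forecast` client and its `create_dataset` call. The sketch below mirrors the CLI arguments above; the helper names `build_create_dataset_request` and `create_dataset` are illustrative, and the call itself assumes boto3 is installed and credentials with Forecast permissions are configured.

```python
SCHEMA = {
    "Attributes": [
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "target_value", "AttributeType": "float"},
        {"AttributeName": "item_id", "AttributeType": "string"},
    ]
}


def build_create_dataset_request(name="mydataset", frequency="H"):
    """Build the keyword arguments that mirror the CLI flags above."""
    return {
        "DatasetName": name,
        "Domain": "CUSTOM",
        "DatasetType": "TARGET_TIME_SERIES",
        "DataFrequency": frequency,  # "H" means hourly data
        "Schema": SCHEMA,
    }


def create_dataset(request):
    """Send the request to Forecast and return the new dataset's ARN."""
    import boto3  # imported here so the request builder works without boto3

    response = boto3.client("forecast").create_dataset(**request)
    return response["DatasetArn"]


request = build_create_dataset_request()
# Requires AWS credentials; uncomment to actually create the dataset.
# dataset_arn = create_dataset(request)
```

Keeping the request in a plain dictionary makes it easy to print and compare against the CLI command before calling the service.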
Create dataset group
aws forecast create-dataset-group \
    --dataset-group-name mydatasetgroup \
    --dataset-arns arn:aws:forecast:<region>:acct-id:dataset/mydataset \
    --domain CUSTOM
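A Python equivalent of the dataset group step, again as a hedged sketch: it uses boto3's `create_dataset_group` call, the helper names are made up for illustration, and the ARN shown is a placeholder that should be replaced with the ARN returned when the dataset was created.

```python
def build_dataset_group_request(dataset_arns, name="mydatasetgroup", domain="CUSTOM"):
    """Build the keyword arguments that mirror the CLI flags above."""
    return {
        "DatasetGroupName": name,
        "Domain": domain,
        "DatasetArns": list(dataset_arns),
    }


def create_dataset_group(request):
    """Send the request to Forecast and return the new dataset group's ARN."""
    import boto3  # imported here so the request builder works without boto3

    response = boto3.client("forecast").create_dataset_group(**request)
    return response["DatasetGroupArn"]


# Placeholder ARN: substitute your own region and account ID.
group_request = build_dataset_group_request(
    ["arn:aws:forecast:<region>:acct-id:dataset/mydataset"]
)
# Requires AWS credentials; uncomment to actually create the group.
# dataset_group_arn = create_dataset_group(group_request)
```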
Create data import job
aws forecast create-dataset-import-job \
    --dataset-arn arn:aws:forecast:<region>:acct-id:dataset/mydataset \
    --dataset-import-job-name myimportjob \
    --data-source '{
        "S3Config": {
            "Path": "s3://bucketname/dataset.csv",
            "RoleArn": "arn:aws:iam::acct-id:role/Role"
        }
    }'
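The import job runs asynchronously, so a Python version usually creates the job and then polls its status. The sketch below uses boto3's `create_dataset_import_job` and `describe_dataset_import_job` calls. The helper names, the 30-second poll interval, and the explicit `TimestampFormat` value (chosen to match the sample rows) are assumptions for illustration, and the ARNs and S3 path are the same placeholders used in the CLI command.

```python
import time


def build_import_job_request(dataset_arn, s3_path, role_arn, name="myimportjob"):
    """Build the keyword arguments that mirror the CLI flags above."""
    return {
        "DatasetImportJobName": name,
        "DatasetArn": dataset_arn,
        "DataSource": {"S3Config": {"Path": s3_path, "RoleArn": role_arn}},
        # Assumed to match timestamps such as "2020-01-01 01:00:00".
        "TimestampFormat": "yyyy-MM-dd HH:mm:ss",
    }


def wait_for_status(get_status, poll_seconds=30, sleep=time.sleep):
    """Poll get_status() until it returns ACTIVE; raise if it reports a failure."""
    while True:
        status = get_status()
        if status == "ACTIVE":
            return status
        if status.endswith("FAILED"):
            raise RuntimeError(f"import job ended with status {status}")
        sleep(poll_seconds)


def run_import_job(request):
    """Create the import job, wait for it to finish, and return its ARN."""
    import boto3  # imported here so the helpers above work without boto3

    forecast = boto3.client("forecast")
    job_arn = forecast.create_dataset_import_job(**request)["DatasetImportJobArn"]
    wait_for_status(
        lambda: forecast.describe_dataset_import_job(
            DatasetImportJobArn=job_arn
        )["Status"]
    )
    return job_arn


import_request = build_import_job_request(
    dataset_arn="arn:aws:forecast:<region>:acct-id:dataset/mydataset",
    s3_path="s3://bucketname/dataset.csv",
    role_arn="arn:aws:iam::acct-id:role/Role",
)
# Requires AWS credentials; uncomment to actually start the import.
# job_arn = run_import_job(import_request)
```

`wait_for_status` takes the status lookup as a callable so the polling logic can be exercised with a fake before it is pointed at the real service.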