BlazingText training
Complete the BlazingText preprocessing steps first. At the end of the BlazingText preprocessing page, you converted your CSV file into a format that BlazingText accepts and uploaded it to s3://bucketname/train/out.csv.
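For reference, BlazingText in supervised (text classification) mode expects one example per line: the label prefixed with `__label__`, followed by the space-separated tokens of the text. The sketch below shows that conversion for in-memory (label, text) pairs. The helper name `to_blazingtext_line`, the lowercase tokenization and the underscore replacement in labels are assumptions for illustration, not necessarily what the preprocessing page does.

```python
import re


def to_blazingtext_line(label: str, text: str) -> str:
    """Format one (label, text) pair as a BlazingText supervised-mode line.

    Hypothetical helper: lowercases the text, splits it into word and
    punctuation tokens, and joins the tokens with single spaces. Spaces in
    the label are replaced with underscores so the label stays one token.
    """
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return "__label__" + label.strip().replace(" ", "_") + " " + " ".join(tokens)


# Hypothetical example rows: (label, text)
rows = [
    ("positive", "Great product, works well!"),
    ("negative", "Stopped working after two days."),
]
lines = [to_blazingtext_line(label, text) for label, text in rows]
for line in lines:
    print(line)
```

Each printed line is one training example; writing these lines to a file, one per line, gives a file in the supervised input format.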
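On a SageMaker notebook instance, initialize the estimator. This example uses the parameter names of version 2 of the SageMaker Python SDK:
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
region = session.boto_region_name

# URI of the built-in BlazingText container image for this region
image_uri = sagemaker.image_uris.retrieve("blazingtext", region, "1")

estimator = Estimator(
    image_uri=image_uri,
    role=sagemaker.get_execution_role(),  # IAM role attached to the notebook
    instance_count=1,
    instance_type="ml.c4.2xlarge",
    volume_size=30,       # EBS volume size in GB
    max_run=360000,       # maximum training time in seconds (100 hours)
    input_mode="File",    # copy the data from S3 to the instance before training
    base_job_name="blazingtext-poc",
    output_path="s3://bucket-name/path/to/output",  # model artifacts go here
    sagemaker_session=session,
)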
To train a text classifier, run BlazingText in supervised mode and set the hyperparameters, for example:
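estimator.set_hyperparameters(
    mode="supervised",    # text classification
    epochs=10,            # maximum number of passes over the training data
    min_count=2,          # discard words that appear fewer than 2 times
    learning_rate=0.05,
    vector_dim=10,        # size of the word embeddings
    early_stopping=True,  # requires a validation channel
    patience=4,           # epochs without validation improvement before stopping
    min_epochs=5,         # minimum epochs before early stopping can trigger
    word_ngrams=2,        # use word bigrams as features
)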
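Setting `word_ngrams=2` makes BlazingText use word bigrams as features in addition to single words, which lets the classifier pick up short phrases such as "not good". As a rough illustration of the idea only (this is not BlazingText's internal implementation), the unigram and bigram features of one tokenized sentence look like this; `word_ngram_features` is a hypothetical helper:

```python
def word_ngram_features(tokens: list[str], n: int) -> list[str]:
    """Return all word n-grams of length 1..n, shortest first (illustrative only)."""
    features = []
    for size in range(1, n + 1):
        # Slide a window of `size` tokens across the sentence
        for start in range(len(tokens) - size + 1):
            features.append(" ".join(tokens[start:start + size]))
    return features


tokens = "this is not good".split()
print(word_ngram_features(tokens, 2))
```

With `n=2`, the bigram "not good" becomes a feature of its own, which single-word features cannot express.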
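For the full list of BlazingText hyperparameters and their default values, see the BlazingText hyperparameters page in the Amazon SageMaker documentation.
Next, train your BlazingText model using the SageMaker Python SDK.
This step assumes that your training and test data are in separate S3 folders, each containing one or more files. If you still need to split your data, see the page on splitting a dataset into train and test sets.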
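Three of the hyperparameters above work together. With `early_stopping=True`, BlazingText monitors validation accuracy (which is why training needs a validation channel); it does not stop before `min_epochs` epochs, and after that it stops once `patience` epochs pass without improvement. The function below is a simplified model of that rule applied to a list of per-epoch validation accuracies, useful for reasoning about the settings. It is a sketch, not the algorithm's actual code, and BlazingText's exact counting may differ; `stop_epoch` and the accuracy values are hypothetical.

```python
def stop_epoch(val_accuracies: list[float], epochs: int, min_epochs: int, patience: int) -> int:
    """Return the 1-based epoch after which training would stop.

    Simplified model: track the best validation accuracy seen so far. Once at
    least `min_epochs` epochs have run, stop when `patience` consecutive epochs
    have passed without a new best. Otherwise run all available epochs.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, acc in enumerate(val_accuracies[:epochs], start=1):
        if acc > best:
            best = acc
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epoch >= min_epochs and epochs_without_improvement >= patience:
            return epoch
    return min(epochs, len(val_accuracies))


# Hypothetical validation accuracies: improvement stalls after epoch 3
history = [0.80, 0.84, 0.86, 0.85, 0.86, 0.85, 0.84, 0.86, 0.87, 0.88]
print(stop_epoch(history, epochs=10, min_epochs=5, patience=4))
```

In this model, training with `epochs=10, min_epochs=5, patience=4` stops after epoch 7, even though accuracy would have improved again at epochs 9 and 10; a larger `patience` trades longer training for a lower chance of stopping too early.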
data_channels = {
"train": "s3://bucketname/train/",
"validation": "s3://bucketname/test/"
}
estimator.fit(inputs=data_channels, wait=True, logs=True)
For each input channel, give the path of the S3 folder (prefix), not the path of an individual .csv file. This way a folder such as train/ can contain multiple .csv files.
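If you only have the S3 URI of a single uploaded file, such as s3://bucketname/train/out.csv from the preprocessing step, the channel path is the folder that contains it. A minimal sketch, assuming plain `s3://bucket/key` URIs; `to_channel_prefix` is a hypothetical helper, not part of the SageMaker SDK:

```python
def to_channel_prefix(s3_uri: str) -> str:
    """Return the S3 prefix ("folder") that contains the given object URI.

    Hypothetical helper: URIs that already end in "/" are returned unchanged,
    and a bare bucket URI becomes the bucket root prefix.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {s3_uri}")
    if s3_uri.endswith("/"):
        return s3_uri
    bucket_and_key = s3_uri[len("s3://"):]
    if "/" not in bucket_and_key:
        # No key part, so the prefix is the root of the bucket
        return s3_uri + "/"
    prefix, _, _filename = s3_uri.rpartition("/")
    return prefix + "/"


data_channels = {
    "train": to_channel_prefix("s3://bucketname/train/out.csv"),
    "validation": to_channel_prefix("s3://bucketname/test/"),
}
print(data_channels)
```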