DeepAR training

Work through the preprocessing for DeepAR page before you start here.

At the end of that page, you uploaded your JSON Lines data to S3, to locations similar to s3://bucketname/train/train-data.jsonl and s3://bucketname/test/test-data.jsonl.

In a SageMaker notebook, initialize the estimator:

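import sagemaker

# The session supplies the AWS region and default credentials of the notebook.
session = sagemaker.Session()
region = session.boto_region_name

# Note: the argument names below are those of version 1 of the SageMaker Python SDK.
# Version 2 renamed them (image_uri, instance_count, instance_type) and replaced
# get_image_uri with sagemaker.image_uris.retrieve.
estimator = sagemaker.estimator.Estimator(
    sagemaker_session=session,
    # Container image of the built-in DeepAR algorithm for this region
    image_name=sagemaker.amazon.amazon_estimator.get_image_uri(region, "forecasting-deepar", "latest"),
    role=sagemaker.get_execution_role(),
    train_instance_count=1,
    train_instance_type='ml.c4.2xlarge',
    base_job_name='deepar-poc',
    # Model artifacts are written under this S3 path
    output_path='s3://bucket-name/path/to/output')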

Suppose your timestamps are 1 hour apart and you want to use the 10 most recent values to predict the next value. Set the hyperparameters as follows:

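hyperparameters = {
    "time_freq": "1H",                # spacing between data points
    "epochs": "400",
    "early_stopping_patience": "40",
    "mini_batch_size": "64",
    "learning_rate": "5E-4",
    "context_length": "10",           # number of past values the model sees
    "prediction_length": "1"          # number of future values to predict
}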

estimator.set_hyperparameters(**hyperparameters)
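To make the meaning of context_length and prediction_length concrete, here is a minimal sketch in plain Python. The helper split_window is hypothetical and only illustrates which values count as "past" and "future"; it is not how DeepAR selects its training windows internally.

```python
def split_window(series, context_length, prediction_length):
    """Split the tail of a series into (context, target).

    Hypothetical helper for illustration only; not part of the SageMaker SDK.
    """
    needed = context_length + prediction_length
    if len(series) < needed:
        raise ValueError(f"series needs at least {needed} values")
    window = series[-needed:]
    # The first context_length values are the past, the rest is the future.
    return window[:context_length], window[context_length:]


# 24 hourly readings, numbered 0..23 (made-up data)
hourly_values = list(range(24))
context, target = split_window(hourly_values, context_length=10, prediction_length=1)
```

With the settings above, context holds the 10 values before the last one, and target holds the single final value.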

Change "1H" to "6H" if your data points are 6 hours apart, or to "1D" if they are one day apart. For details on each setting, see the DeepAR hyperparameters documentation.
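If the spacing of your data varies between projects, the mapping can be sketched as a small function. The helper time_freq_for_hours is hypothetical (it is not part of the SageMaker SDK) and assumes the spacing is a whole number of hours; it switches to the day unit only for exact multiples of 24.

```python
def time_freq_for_hours(hours: int) -> str:
    """Return a time_freq string for a spacing given in whole hours.

    Hypothetical helper for illustration only; not part of the SageMaker SDK.
    """
    if hours < 1:
        raise ValueError("spacing must be at least 1 hour")
    if hours % 24 == 0:
        # Exact multiples of a day use the daily unit.
        return f"{hours // 24}D"
    return f"{hours}H"


freq_hourly = time_freq_for_hours(1)
freq_six_hourly = time_freq_for_hours(6)
freq_daily = time_freq_for_hours(24)
```

The result can be assigned to the "time_freq" entry of the hyperparameters dictionary.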

Next, train your DeepAR model using the SageMaker Python SDK:

data_channels = {
    "train": "s3://bucketname/train/",
    "test": "s3://bucketname/test/"
}

estimator.fit(inputs=data_channels, wait=True)

For each input channel, give the path of the folder that contains the data, not the path of the .jsonl file itself. This lets a single folder, such as train, hold multiple .jsonl files.
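If you only have the full file locations from the preprocessing step, the folder paths can be derived with a short sketch like the one below. The helper channel_prefix is hypothetical (not part of the SageMaker SDK) and assumes each URI points at a file inside at least one folder of the bucket.

```python
def channel_prefix(s3_uri: str) -> str:
    """Return the S3 folder that contains the given file, with a trailing slash.

    Hypothetical helper for illustration only; not part of the SageMaker SDK.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("expected a URI starting with s3://")
    if s3_uri.endswith("/"):
        # Already a folder path, so leave it unchanged.
        return s3_uri
    # Drop the file name after the last slash and keep the folder.
    return s3_uri.rsplit("/", 1)[0] + "/"


data_channels = {
    "train": channel_prefix("s3://bucketname/train/train-data.jsonl"),
    "test": channel_prefix("s3://bucketname/test/test-data.jsonl"),
}
```

The resulting dictionary has the same shape as the data_channels value passed to estimator.fit.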

