doc: update readme
parent
5b16e3aa76
commit
04799748ee
@ -2,40 +2,43 @@
|
|||||||
|
|
||||||
Run DIY Robocars model training as a Sagemaker (https://aws.amazon.com/fr/sagemaker/) task. Estimated cost for one training run (as of August 2018): 0.50 EUR
|
Run DIY Robocars model training as a Sagemaker (https://aws.amazon.com/fr/sagemaker/) task. Estimated cost for one training run (as of August 2018): 0.50 EUR
|
||||||
|
|
||||||
# Build images
|
## AWS usage
|
||||||
|
|
||||||
|
### Build images
|
||||||
|
|
||||||
- Build model image:
|
- Build model image:
|
||||||
|
|
||||||
```
|
```bash
|
||||||
docker build -t robocars:1.8.0-gpu-py3 -f Dockerfile.gpu .
|
docker build -t robocars:1.8.0-gpu-py3 -f Dockerfile.gpu .
|
||||||
```
|
```
|
||||||
|
|
||||||
# Prepare training (once)
|
### Prepare training (once)
|
||||||
|
|
||||||
- Create an S3 bucket for your tubes. You can use the same bucket for model output or create another bucket for output
|
- Create an S3 bucket for your tubes. You can use the same bucket for model output or create another bucket for output
|
||||||
- Create an AWS docker registry (ECR) and push your model image to it. The Docker Hub registry is not supported
|
- Create an AWS docker registry (ECR) and push your model image to it. The Docker Hub registry is not supported
|
||||||
|
|
||||||
```
|
```bash
|
||||||
docker tag robocars:1.8.0-gpu-py3 <replace_me>.dkr.ecr.eu-west-1.amazonaws.com/robocars:1.8.0-gpu-py3
|
docker tag robocars:1.8.0-gpu-py3 <replace_me>.dkr.ecr.eu-west-1.amazonaws.com/robocars:1.8.0-gpu-py3
|
||||||
# you need the AWS CLI installed and docker logged in to your registry
|
# you need the AWS CLI installed and docker logged in to your registry
|
||||||
docker push <replace_me>.dkr.ecr.eu-west-1.amazonaws.com/robocars:1.8.0-gpu-py3
|
docker push <replace_me>.dkr.ecr.eu-west-1.amazonaws.com/robocars:1.8.0-gpu-py3
|
||||||
```
|
```
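The login step mentioned in the comment above is not spelled out in the README. A minimal sketch, assuming a recent AWS CLI that provides `aws ecr get-login-password`; the helper names `ecr_registry` and `ecr_login` are hypothetical and only illustrate the registry host format used in the `docker tag` and `docker push` commands:

```shell
#!/bin/sh
# Hypothetical helpers, not part of the repository.

# Print the ECR registry host for an account id ($1) and a region ($2).
ecr_registry() {
    printf '%s.dkr.ecr.%s.amazonaws.com' "$1" "$2"
}

# Log docker in to that registry. Requires the aws and docker CLIs
# and valid AWS credentials.
ecr_login() {
    aws ecr get-login-password --region "$2" |
        docker login --username AWS --password-stdin "$(ecr_registry "$1" "$2")"
}
```

For example, `ecr_login <replace_me> eu-west-1` would authenticate docker before the `docker push` step.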
|
||||||
|
|
||||||
# Run training
|
|
||||||
|
### Run training
|
||||||
|
|
||||||
- Copy your tubes to your S3 bucket. All tubes in the bucket will be used for training, so make sure you keep only relevant files. We recommend zipping your tubes before uploading them. The training package will unzip them.
|
- Copy your tubes to your S3 bucket. All tubes in the bucket will be used for training, so make sure you keep only relevant files. We recommend zipping your tubes before uploading them. The training package will unzip them.
|
||||||
- Create a training job on AWS Sagemaker. Use the create_job.sh script after replacing the relevant parameters
|
- Create a training job on AWS Sagemaker. Use the create_job.sh script after replacing the relevant parameters
|
||||||
|
|
||||||
```
|
```bash
|
||||||
#!/bin/bash
|
#!/bin/bash
|
||||||
|
|
||||||
#usage: create_job.sh some_job_unique_name
|
#usage: create_job.sh some_job_unique_name
|
||||||
job_name=$1
|
job_name=$1
|
||||||
if [ -z "$job_name" ]
|
if [ -z "$job_name" ]
|
||||||
then
|
then
|
||||||
echo 'Provide job unique name'
|
echo 'Provide job unique name'
|
||||||
exit 1
|
exit 1
|
||||||
fi
|
fi
|
||||||
echo 'Creating training job '$1
|
echo 'Creating training job '$1
|
||||||
|
|
||||||
aws sagemaker create-training-job \
|
aws sagemaker create-training-job \
|
||||||
@ -51,7 +54,7 @@ aws sagemaker create-training-job \
|
|||||||
|
|
||||||
- Keep an eye on the job's progress on AWS Sagemaker. Once it has finished, your model is copied into the destination bucket.
|
- Keep an eye on the job's progress on AWS Sagemaker. Once it has finished, your model is copied into the destination bucket.
|
||||||
|
|
||||||
# About AWS Sagemaker
|
### About AWS Sagemaker
|
||||||
|
|
||||||
Sagemaker provides on-demand model computing and serving. Standard algorithms can be used and on-demand Jupyter notebooks are available. However, as with any hosted service, tensorflow versions are updated frequently, which is hard to manage because compatible versions might not be available on RaspberryPi. Sagemaker also allows "Bring Your Own Algorithm" by using a docker image for training. The resulting container must comply with Sagemaker constraints.
|
Sagemaker provides on-demand model computing and serving. Standard algorithms can be used and on-demand Jupyter notebooks are available. However, as with any hosted service, tensorflow versions are updated frequently, which is hard to manage because compatible versions might not be available on RaspberryPi. Sagemaker also allows "Bring Your Own Algorithm" by using a docker image for training. The resulting container must comply with Sagemaker constraints.
|
||||||
|
|
||||||
@ -59,9 +62,36 @@ Input and output data are mapped to S3 buckets: at container start, input data i
|
|||||||
|
|
||||||
Hyperparameters can be sent at job creation time and accessed by training code (example: ```env.hyperparameters.get('with_slide', False)```)
|
Hyperparameters can be sent at job creation time and accessed by training code (example: ```env.hyperparameters.get('with_slide', False)```)
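Sagemaker transports hyperparameters as a map of string keys to string values. A minimal sketch of building that map for the `--hyper-parameters` option of `aws sagemaker create-training-job`; the helper name `hyperparameters_json` is hypothetical, and only the `with_slide` key comes from the example above:

```shell
#!/bin/sh
# Hypothetical helper, not part of the repository.
# Print a JSON map holding the with_slide hyperparameter.
# $1: value to send, for example "true" or "false". Values are
# always quoted because Sagemaker expects string values.
hyperparameters_json() {
    printf '{"with_slide":"%s"}' "$1"
}
```

The result could then be passed as `--hyper-parameters "$(hyperparameters_json true)"` when creating the job.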
|
||||||
|
|
||||||
# Which Tensorflow version should I pick?
|
### Which Tensorflow version should I pick?
|
||||||
|
|
||||||
A model trained with version 1.4.1 is compatible with the 1.8.0 tensorflow runtime
|
A model trained with version 1.4.1 is compatible with the 1.8.0 tensorflow runtime
|
||||||
|
|
||||||
A model trained with version 1.8.0 is not compatible with earlier tensorflow runtimes
|
A model trained with version 1.8.0 is not compatible with earlier tensorflow runtimes
|
||||||
|
|
||||||
|
|
||||||
|
## Local run
|
||||||
|
|
||||||
|
Run training locally with podman
|
||||||
|
|
||||||
|
### Run training with podman
|
||||||
|
|
||||||
|
1. Build image
|
||||||
|
|
||||||
|
```bash
|
||||||
|
podman build . -t tensorflow_without_gpu
|
||||||
|
```
|
||||||
|
2. Make archive (See [rc-tools](https://git.cyrilix.bzh/robocars/robocar-tools))
|
||||||
|
|
||||||
|
```bash
|
||||||
|
go run ./cmd/rc-tools training archive -record-path ~/robocar/record-sim2 -output /tmp/train.zip -image-height 120 -image-width 160 --horizon 20 -with-flip-image
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Run training
|
||||||
|
|
||||||
|
```bash
|
||||||
|
podman run --rm -it -v /tmp/data:/opt/ml/input/data/train -v /tmp/output:/opt/ml/model/ localhost/tensorflow_without_gpu python /opt/ml/code/train.py --img_height=100 --img_width=160 --batch_size=32
|
||||||
|
```
|
||||||
|
|
||||||
|
```bash
|
||||||
|
podman run --rm -it -v /tmp/data:/opt/ml/input/data/train -v /tmp/output:/opt/ml/model/ localhost/tensorflow_without_gpu python /opt/ml/code/train.py --img_height=256 --img_width=320 --batch_size=32
|
||||||
|
```
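The archive step writes `/tmp/train.zip`, while the training commands mount `/tmp/data` and `/tmp/output`. The README does not show how the archive reaches the mounted directory. A minimal sketch, assuming the training code unzips archives found in the train directory (as stated for the S3 case); the function name `stage_training_data` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper, not part of the repository.
# Copy the training archive into the directory mounted as
# /opt/ml/input/data/train and create the model output directory.
# $1: archive path, $2: data directory, $3: output directory
stage_training_data() {
    mkdir -p "$2" "$3" || return 1
    cp "$1" "$2/"
}
```

With the paths used above, this would be called as `stage_training_data /tmp/train.zip /tmp/data /tmp/output` before `podman run`.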
|