Compare commits
11 Commits
Commits in this compare (author and date columns were empty in the export):

- ad3eb16333
- 04799748ee
- 5b16e3aa76
- fc1945cecd
- b8e011e7cd
- 3a376dd5a3
- 2076b4491a
- 37bb0fff2d
- a5354e5653
- 84a8b11942
- 9ec80414c9

.dockerignore (new file, 2 lines)

@@ -0,0 +1,2 @@
venv

.gitignore (vendored, new file, 2 lines)

@@ -0,0 +1,2 @@
venv
/src/robocars_sagemaker_container.egg-info/

Dockerfile (new file, 15 lines)

@@ -0,0 +1,15 @@
FROM docker.io/tensorflow/tensorflow:2.6.0

COPY requirements.txt .
RUN pip3 install --upgrade pip==20.0.2 && pip3 list && pip3 install -r requirements.txt \
    && pip3 list

WORKDIR /root

# copy the training script inside the container
COPY src/tf_container/train.py /opt/ml/code/train.py

# define train.py as the script entry point
ENV SAGEMAKER_PROGRAM train.py

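The `sagemaker-training` toolkit pinned in requirements.txt uses `SAGEMAKER_PROGRAM` to know which script to launch inside the container. As far as I read the toolkit's contract, it forwards the job's hyperparameters to that script as `--key value` command-line arguments (presumably why train.py reads its parameters with argparse) and also writes them to `/opt/ml/input/config/hyperparameters.json`. A minimal sketch, assuming that standard SageMaker path, for reading them directly:

```python
# Sketch only: read SageMaker hyperparameters from the container's standard
# config file instead of relying on the toolkit to pass them as CLI flags.
# The path is the usual SageMaker location; the defaults are illustrative.
import json

HYPERPARAMETERS_PATH = "/opt/ml/input/config/hyperparameters.json"


def load_hyperparameters(path: str = HYPERPARAMETERS_PATH) -> dict:
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


params = load_hyperparameters()
img_height = int(params.get("img_height", 120))  # values arrive as strings
img_width = int(params.get("img_width", 160))
```
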
Dockerfile.gpu (modified)

@@ -1,36 +1,15 @@
-FROM python:3.5 as builder
+FROM docker.io/tensorflow/tensorflow:2.6.0-gpu

-RUN mkdir -p /usr/src
-ADD . /usr/src
-WORKDIR /usr/src
-
-RUN python3 setup.py sdist
-
-FROM tensorflow/tensorflow:1.8.0-gpu-py3
-
-#tensorflow-serving-api-python3==1.7.0
-RUN pip3 list && pip3 install numpy boto3 six awscli flask==0.11 Jinja2==2.9 gevent gunicorn keras==2.1.3 pillow h5py \
+COPY requirements.txt .
+RUN pip3 install --upgrade pip==20.0.2 && pip3 list && pip3 install -r requirements.txt \
     && pip3 list

 WORKDIR /root

-RUN apt-get -y update && \
-    apt-get -y install curl && \
-    apt-get -y install vim && \
-    apt-get -y install iputils-ping && \
-    apt-get -y install nginx
+# copy the training script inside the container
+COPY src/tf_container/train.py /opt/ml/code/train.py

-# install telegraf
-RUN cd /tmp && \
-    curl -O https://dl.influxdata.com/telegraf/releases/telegraf_1.4.2-1_amd64.deb && \
-    dpkg -i telegraf_1.4.2-1_amd64.deb && \
-    cd -
+# define train.py as the script entry point
+ENV SAGEMAKER_PROGRAM train.py

-COPY --from=builder /usr/src/dist/robocars_sagemaker_container-1.0.0.tar.gz .
-
-RUN pip3 install robocars_sagemaker_container-1.0.0.tar.gz
-
-RUN rm robocars_sagemaker_container-1.0.0.tar.gz
-
-ENTRYPOINT ["entry.py"]

README (modified)

@@ -2,31 +2,34 @@

 Run DIY Robocars model training as a Sagemaker (https://aws.amazon.com/fr/sagemaker/) task. Estimated cost for one training run (as of August 2018): 0.50 EUR

-# Build images
+## AWS usage
+
+### Build images

 - Build model image:

-```
+```bash
 docker build -t robocars:1.8.0-gpu-py3 -f Dockerfile.gpu .
 ```

-# Prepare training (once)
+### Prepare training (once)

 - Create an S3 bucket for your tubes. You can use the same bucket for the model output or create another bucket for it.
 - Create an AWS Docker registry (ECR) and push your model image to it. The Docker Hub registry is not supported.

-```
+```bash
 docker tag robocars:1.8.0-gpu-py3 <replace_me>.dkr.ecr.eu-west-1.amazonaws.com/robocars:1.8.0-gpu-py3
 # you should have the AWS CLI installed and be logged in to the registry with docker
 docker push <replace_me>.dkr.ecr.eu-west-1.amazonaws.com/robocars:1.8.0-gpu-py3
 ```

-# Run training
+### Run training

 - Copy your tubes to your S3 bucket. All tubes in the bucket will be used for training, so make sure you keep only relevant files. We recommend zipping your tubes before uploading; the training package will unzip them.
 - Create a training job on AWS Sagemaker. Use the create_job.sh script after replacing the relevant parameters.

-```
+```bash
 #!/bin/bash

 #usage: create_job.sh some_job_unique_name

@@ -51,7 +54,7 @@ aws sagemaker create-training-job \

 - Keep an eye on job progression on AWS Sagemaker. Once finished, your model is copied into the destination bucket.

-# About AWS Sagemaker
+### About AWS Sagemaker

 Sagemaker provides on-demand model training and serving. Standard algorithms can be used and on-demand Jupyter notebooks are available. However, as with any hosted service, TensorFlow versions are updated frequently, which is hard to manage because a compatible version might not be available on the Raspberry Pi. Sagemaker also allows "Bring Your Own Algorithm" by using a Docker image for training. The resulting container must comply with Sagemaker constraints.

@@ -59,9 +62,36 @@ Input and output data are mapped to S3 buckets: at container start, input data i

 Hyperparameters can be sent at job creation time and accessed by training code (example: ```env.hyperparameters.get('with_slide', False)```)

-# Which Tensorflow version should I pick ?
+### Which Tensorflow version should I pick?

 A version 1.4.1 model is compatible with the 1.8.0 TensorFlow runtime.

 A version 1.8.0 model is not compatible with previous TensorFlow runtimes.


+## Local run
+
+Run training locally with podman.
+
+### Run training with podman
+
+1. Build the image
+
+```bash
+podman build . -t tensorflow_without_gpu
+```
+2. Make an archive (see [rc-tools](https://git.cyrilix.bzh/robocars/robocar-tools))
+
+```bash
+go run ./cmd/rc-tools training archive -record-path ~/robocar/record-sim2 -output /tmp/train.zip -image-height 120 -image-width 160 --horizon 20 -with-flip-image
+```
+
+3. Run training
+
+```bash
+podman run --rm -it -v /tmp/data:/opt/ml/input/data/train -v /tmp/output:/opt/ml/model/ localhost/tensorflow_without_gpu python /opt/ml/code/train.py --img_height=100 --img_width=160 --batch_size=32
+```
+
+```bash
+podman run --rm -it -v /tmp/data:/opt/ml/input/data/train -v /tmp/output:/opt/ml/model/ localhost/tensorflow_without_gpu python /opt/ml/code/train.py --img_height=256 --img_width=320 --batch_size=32
+```

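The podman commands above assume that `/tmp/data` already contains tubes in the layout train.py expects: `record_*.json` files whose `cam/image_array` field names an image file stored next to them and whose `user/angle` field holds the steering label (a zip archive produced by rc-tools also works, since the script unzips it first). A hypothetical helper for generating a tiny synthetic dataset to smoke-test the container; the file names and sample count are illustrative, not part of the repository:

```python
# Hypothetical smoke-test helper: fill /tmp/data with synthetic records in the
# layout train.py expects (record_*.json plus the image each record points to).
import json
import os
import random

from PIL import Image  # pillow is already pinned in requirements.txt

DATA_DIR = "/tmp/data"
os.makedirs(DATA_DIR, exist_ok=True)

for i in range(64):  # a couple of batches' worth of samples
    img_name = f"{i}_cam-image_array_.jpg"
    # 160x120 matches train.py's default --img_width/--img_height
    Image.new("RGB", (160, 120), color=(i % 256, 100, 150)).save(os.path.join(DATA_DIR, img_name))
    record = {
        "user/angle": random.uniform(-1.0, 1.0),  # steering label in [-1, 1]
        "cam/image_array": img_name,
    }
    with open(os.path.join(DATA_DIR, f"record_{i}.json"), "w") as f:
        json.dump(record, f)
```

With these 160x120 images, run train.py with its default flags (i.e. `--img_height=120 --img_width=160`); the `--img_height=100` example above targets tubes that were already cropped when archiving.
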
create_job.sh (modified)

@@ -1,22 +1,24 @@
 #!/bin/bash

 job_name=$1
-if [ -z $job_name ]
+if [[ -z ${job_name} ]]
 then
     echo 'Provide model name'
     exit 0
 fi
 echo 'Creating training job '$1

-training_image="<replace_me>.dkr.ecr.eu-west-1.amazonaws.com/robocars:1.8.0-gpu-py3"
-iam_role_arn="arn:aws:iam::<replace_me>:role/service-role/<replace_me>"
+training_image="117617958416.dkr.ecr.eu-west-1.amazonaws.com/robocars:tensorflow"
+iam_role_arn="arn:aws:iam::117617958416:role/robocar-training"
+DATA_BUCKET="s3://robocars-cyrilix-learning/input"
+DATA_OUTPUT="s3://robocars-cyrilix-learning/output"

 aws sagemaker create-training-job \
-    --training-job-name $job_name \
-    --hyper-parameters '{ "sagemaker_region": "\"eu-west-1\"", "with_slide": "true" }' \
-    --algorithm-specification TrainingImage=$training_image,TrainingInputMode=File \
-    --role-arn $iam_role_arn \
-    --input-data-config '[{ "ChannelName": "train", "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://<replace_me>", "S3DataDistributionType": "FullyReplicated" }} }]' \
-    --output-data-config S3OutputPath=s3://<replace_me> \
+    --training-job-name ${job_name} \
+    --hyper-parameters '{ "sagemaker_region": "\"eu-west-1\"", "with_slide": "true", "img_height": "120", "img_width": "160" }' \
+    --algorithm-specification TrainingImage="${training_image}",TrainingInputMode=File \
+    --role-arn ${iam_role_arn} \
+    --input-data-config "[{ \"ChannelName\": \"train\", \"DataSource\": { \"S3DataSource\": { \"S3DataType\": \"S3Prefix\", \"S3Uri\": \"${DATA_BUCKET}\", \"S3DataDistributionType\": \"FullyReplicated\" }} }]" \
+    --output-data-config S3OutputPath=${DATA_OUTPUT} \
     --resource-config InstanceType=ml.p2.xlarge,InstanceCount=1,VolumeSizeInGB=1 \
     --stopping-condition MaxRuntimeInSeconds=1800

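The same job can be described with the SageMaker Python SDK instead of the raw `aws sagemaker create-training-job` call, which can be handier when iterating on hyperparameters. A sketch under the assumption that the SageMaker Python SDK (v2) is installed and configured; it mirrors the parameters of the script above but is not part of this repository:

```python
# Hypothetical equivalent of create_job.sh using the SageMaker Python SDK (v2).
# Image URI, role ARN and bucket names mirror the script above and must be
# replaced with real values for another account.
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="117617958416.dkr.ecr.eu-west-1.amazonaws.com/robocars:tensorflow",
    role="arn:aws:iam::117617958416:role/robocar-training",
    instance_count=1,
    instance_type="ml.p2.xlarge",
    volume_size=1,   # GB, matches --resource-config
    max_run=1800,    # seconds, matches --stopping-condition
    output_path="s3://robocars-cyrilix-learning/output",
    hyperparameters={
        "sagemaker_region": "eu-west-1",
        "with_slide": "true",
        "img_height": "120",
        "img_width": "160",
    },
)

# training job names must be unique per account/region, as with the CLI version
estimator.fit({"train": "s3://robocars-cyrilix-learning/input"}, job_name="robocars-training-example")
```
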
requirements.txt (new file, 4 lines)

@@ -0,0 +1,4 @@
sagemaker-training==3.9.2
tensorflow==2.6.0
numpy==1.19.5
pillow==8.3.2

setup.py (modified)

@@ -1,8 +1,8 @@
 import os
-from glob import glob
 from os.path import basename
 from os.path import splitext

+from glob import glob
 from setuptools import setup, find_packages


@@ -19,9 +19,13 @@ setup(
     py_modules=[splitext(basename(path))[0] for path in glob('src/*.py')],

     classifiers=[
-        'Programming Language :: Python :: 3.5',
+        'Programming Language :: Python :: 3.7',
     ],
-
+    entry_points={
+        'console_scripts': [
+            'train=tf_container.train_entry_point:train',
+        ]
+    },
     install_requires=['sagemaker-container-support'],
     extras_require={},
 )

src/tf_container/train.py (new file, 292 lines)

@@ -0,0 +1,292 @@
#!/usr/bin/env python3

import os

# import container_support as cs
import argparse
import json

import numpy as np
import re
import tensorflow as tf
import zipfile
# from tensorflow.keras import backend as K
from tensorflow.keras import callbacks
from tensorflow.keras.layers import Convolution2D
from tensorflow.keras.layers import Dropout, Flatten, Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.python.client import device_lib


def linear_bin(a: float, N: int = 15, offset: int = 1, R: float = 2.0):
    """
    create a bin of length N
    map val A to range R
    offset one hot bin by offset, commonly R/2
    """
    a = a + offset
    b = round(a / (R / (N - offset)))
    arr = np.zeros(N)
    b = clamp(b, 0, N - 1)
    arr[int(b)] = 1
    return arr


def clamp(n, min, max):
    if n <= min:
        return min
    if n >= max:
        return max
    return n


def get_data(root_dir, filename):
    print('load data from file ' + filename)
    d = json.load(open(os.path.join(root_dir, filename)))
    return [(d['user/angle']), root_dir, d['cam/image_array']]


numbers = re.compile(r'(\d+)')


def unzip_file(root, f):
    zip_ref = zipfile.ZipFile(os.path.join(root, f), 'r')
    zip_ref.extractall(root)
    zip_ref.close()


def train(batch_size: int, slide_size: int, img_height: int, img_width: int, img_depth: int, horizon: int, drop: float):
    # env = cs.TrainingEnvironment()

    print(device_lib.list_local_devices())
    os.system('mkdir -p logs')

    # ### Loading the files ###
    # ** You need to copy all your files to the directory where you are running this notebook **
    # ** into a folder named "data" **

    data = []

    for root, dirs, files in os.walk('/opt/ml/input/data/train'):
        for f in files:
            if f.endswith('.zip'):
                unzip_file(root, f)

    for root, dirs, files in os.walk('/opt/ml/input/data/train'):
        data.extend(
            [get_data(root, f) for f in sorted(files, key=str.lower) if f.startswith('record') and f.endswith('.json')])

    # ### Loading throttle and angle ###

    angle = [d[0] for d in data]
    angle_array = np.array(angle)

    # ### Loading images ###
    if horizon > 0:
        images = np.array([img_to_array(load_img(os.path.join(d[1], d[2])).crop((0, horizon, img_width, img_height))) for d in data], 'f')
    else:
        images = np.array([img_to_array(load_img(os.path.join(d[1], d[2]))) for d in data], 'f')

    # slide images vs orders
    if slide_size > 0:
        images = images[:len(images) - slide_size]
        angle_array = angle_array[slide_size:]

    # ### Start training ###
    from datetime import datetime
    logdir = '/opt/ml/model/logs/' + datetime.now().strftime("%Y%m%d-%H%M%S")
    logs = callbacks.TensorBoard(log_dir=logdir, histogram_freq=0, write_graph=True, write_images=True)

    # Creates a file writer for the log directory.
    # file_writer = tf.summary.create_file_writer(logdir)

    # Using the file writer, log the reshaped image.
    # with file_writer.as_default():
    #     # Don't forget to reshape.
    #     imgs = np.reshape(images[0:25], (-1, img_height, img_width, img_depth))
    #     tf.summary.image("25 training data examples", imgs, max_outputs=25, step=0)

    save_best = callbacks.ModelCheckpoint('/opt/ml/model/model_cat', monitor='val_loss', verbose=1,
                                          save_best_only=True, mode='min')
    early_stop = callbacks.EarlyStopping(monitor='val_loss',
                                         min_delta=.0005,
                                         patience=5,
                                         verbose=1,
                                         mode='auto')

    # categorical output of the angle
    callbacks_list = [save_best, early_stop, logs]

    angle_cat_array = np.array([linear_bin(float(a)) for a in angle_array])

    model = default_model(input_shape=(img_height - horizon, img_width, img_depth), drop=drop)
    # model = default_categorical(input_shape=(img_height - horizon, img_width, img_depth), drop=drop)

    model.compile(optimizer='adam',
                  loss={'angle_out': 'categorical_crossentropy', },
                  loss_weights={'angle_out': 0.9})
    model.fit({'img_in': images}, {'angle_out': angle_cat_array, }, batch_size=batch_size,
              epochs=100, verbose=1, validation_split=0.2, shuffle=True, callbacks=callbacks_list)

    # Save model in tensorflow format
    model.save("/opt/ml/model/tfModel", save_format="tf")

    def representative_dataset():
        for d in tf.data.Dataset.from_tensor_slices(images).batch(1).take(100):
            yield [tf.dtypes.cast(d, tf.float32)]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)

    # full quantization for edgeTpu
    # https://www.tensorflow.org/lite/performance/post_training_quantization#full_integer_quantization
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8  # or tf.int8
    converter.inference_output_type = tf.uint8  # or tf.int8

    tflite_model = converter.convert()

    # Save the model.
    with open('/opt/ml/model/model_' + str(img_width) + 'x' + str(img_height) + 'h' + str(horizon) + '.tflite',
              'wb') as f:
        f.write(tflite_model)


def conv2d(filters, kernel, strides, layer_num, activation='relu'):
    """
    Helper function to create a standard valid-padded convolutional layer
    with square kernel and strides and unified naming convention

    :param filters: channel dimension of the layer
    :param kernel: creates (kernel, kernel) kernel matrix dimension
    :param strides: creates (strides, strides) stride
    :param layer_num: used in labelling the layer
    :param activation: activation, defaults to relu
    :return: tf.keras Convolution2D layer
    """
    return Convolution2D(filters=filters,
                         kernel_size=(kernel, kernel),
                         strides=(strides, strides),
                         activation=activation,
                         name='conv2d_' + str(layer_num))


def core_cnn_layers(img_in: Input, img_height: int, img_width: int, drop: float, l4_stride: int = 1):
    """
    Returns the core CNN layers that are shared among the different models,
    like linear, imu, behavioural

    :param img_in: input layer of network
    :param drop: dropout rate
    :param l4_stride: 4-th layer stride, default 1
    :return: stack of CNN layers
    """
    x = img_in
    x = conv2d(img_height / 5, 5, 2, 1)(x)
    x = Dropout(drop)(x)
    x = conv2d(img_width / 5, 5, 2, 2)(x)
    x = Dropout(drop)(x)
    x = conv2d(64, 5, 2, 3)(x)
    x = Dropout(drop)(x)
    x = conv2d(64, 3, l4_stride, 4)(x)
    x = Dropout(drop)(x)
    x = conv2d(64, 3, 1, 5)(x)
    x = Dropout(drop)(x)
    x = Flatten(name='flattened')(x)
    return x


def default_model(input_shape, drop):
    # First layer, input layer, Shape comes from camera.py resolution, RGB
    img_in = Input(shape=input_shape, name='img_in')
    kernel_size = 5

    x = img_in
    # 24 features, 5 pixel x 5 pixel kernel (convolution, feature) window, 2wx2h stride, relu activation
    x = Convolution2D(input_shape[1] / kernel_size, (kernel_size, kernel_size), strides=(2, 2), activation='relu')(x)
    x = Dropout(drop)(x)
    # 32 features, 5px5p kernel window, 2wx2h stride, relu activation
    x = Convolution2D(input_shape[0] / kernel_size, (kernel_size, kernel_size), strides=(2, 2), activation='relu')(x)
    x = Dropout(drop)(x)
    # 64 features, 5px5p kernel window, 2wx2h stride, relu
    x = Convolution2D(64, (kernel_size, kernel_size), strides=(2, 2), activation='relu')(x)
    x = Dropout(drop)(x)
    # 64 features, 3px3p kernel window, 2wx2h stride, relu
    x = Convolution2D(64, (3, 3), strides=(2, 2), activation='relu')(x)
    x = Dropout(drop)(x)
    # 64 features, 3px3p kernel window, 1wx1h stride, relu
    x = Convolution2D(64, (3, 3), strides=(1, 1), activation='relu')(x)
    x = Dropout(drop)(x)

    # Possibly add MaxPooling (will make it less sensitive to position in image).
    # Camera angle fixed, so may not be needed

    x = Flatten(name='flattened')(x)  # Flatten to 1D (Fully connected)
    x = Dense(100, activation='relu')(x)  # Classify the data into 100 features, make all negatives 0
    x = Dropout(drop)(x)
    x = Dense(50, activation='relu')(x)
    x = Dropout(drop)(x)
    # Connect every input with every output and output 15 hidden units. Use Softmax to give percentage.
    # 15 categories and find best one based off percentage 0.0-1.0
    angle_out = Dense(15, activation='softmax', name='angle_out')(x)

    model = Model(inputs=[img_in], outputs=[angle_out])

    return model


def default_n_linear(num_outputs, input_shape=(120, 160, 3), drop=0.2):
    img_in = Input(shape=input_shape, name='img_in')
    x = core_cnn_layers(img_in, img_width=input_shape[1], img_height=input_shape[0], drop=drop)
    x = Dense(100, activation='relu', name='dense_1')(x)
    x = Dropout(drop)(x)
    x = Dense(50, activation='relu', name='dense_2')(x)
    x = Dropout(drop)(x)

    outputs = []
    for i in range(num_outputs):
        outputs.append(
            Dense(1, activation='linear', name='n_outputs' + str(i))(x))

    model = Model(inputs=[img_in], outputs=outputs, name='linear')
    return model


def default_categorical(input_shape=(120, 160, 3), drop=0.2):
    img_in = Input(shape=input_shape, name='img_in')
    x = core_cnn_layers(img_in, img_width=input_shape[1], img_height=input_shape[0], drop=drop, l4_stride=2)
    x = Dense(100, activation='relu', name="dense_1")(x)
    x = Dropout(drop)(x)
    x = Dense(50, activation='relu', name="dense_2")(x)
    x = Dropout(drop)(x)
    # Categorical output of the angle into 15 bins
    angle_out = Dense(15, activation='softmax', name='angle_out')(x)

    model = Model(inputs=[img_in], outputs=[angle_out],
                  name='categorical')
    return model


if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    parser.add_argument("--slide_size", type=int, default=0)
    parser.add_argument("--img_height", type=int, default=120)
    parser.add_argument("--img_width", type=int, default=160)
    parser.add_argument("--img_depth", type=int, default=3)
    parser.add_argument("--horizon", type=int, default=0)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--drop", type=float, default=0.2)

    args = parser.parse_args()
    params = vars(args)
    train(
        batch_size=params["batch_size"],
        slide_size=params["slide_size"],
        img_height=params["img_height"],
        img_width=params["img_width"],
        img_depth=params["img_depth"],
        horizon=params["horizon"],
        drop=params["drop"],
    )

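The 15-way one-hot encoding produced by `linear_bin` is the link between the continuous steering angle stored in the records and the `angle_out` softmax layer. A quick worked example, with the functions condensed from the listing above so the snippet is self-contained:

```python
# Worked example of the angle binning used by train.py: angles in [-1, 1]
# are mapped to one of 15 one-hot bins (-1 -> bin 0, 0 -> bin 7, 1 -> bin 14).
import numpy as np


def clamp(n, lo, hi):
    return max(lo, min(n, hi))


def linear_bin(a: float, N: int = 15, offset: int = 1, R: float = 2.0):
    b = clamp(round((a + offset) / (R / (N - offset))), 0, N - 1)
    arr = np.zeros(N)
    arr[int(b)] = 1
    return arr


for angle in (-1.0, 0.0, 1.0):
    print(angle, int(np.argmax(linear_bin(angle))))
# expected output:
# -1.0 0
#  0.0 7
#  1.0 14
```
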
Previous training script (file deleted)

@@ -1,126 +0,0 @@
#!/usr/bin/env python3

import container_support as cs

import os
import json
import re
import zipfile
from keras.preprocessing.image import load_img, img_to_array
import numpy as np

from keras.layers import Input, Dense, merge
from keras.models import Model
from keras.layers import Convolution2D, MaxPooling2D, Reshape, BatchNormalization
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import callbacks
from tensorflow.python.client import device_lib


def train():
    env = cs.TrainingEnvironment()

    print(device_lib.list_local_devices())
    os.system('mkdir -p logs')

    # ### Loading the files ###
    # ** You need to copy all your files to the directory where you are runing this notebook into a folder named "data" **

    numbers = re.compile(r'(\d+)')
    data = []

    def get_data(root, f):
        d = json.load(open(os.path.join(root, f)))
        if ('pilot/throttle' in d):
            return [d['user/mode'], d['user/throttle'], d['user/angle'], root, d['cam/image_array'], d['pilot/throttle'], d['pilot/angle']]
        else:
            return [d['user/mode'], d['user/throttle'], d['user/angle'], root, d['cam/image_array']]

    def numericalSort(value):
        parts = numbers.split(value)
        parts[1::2] = map(int, parts[1::2])
        return parts

    def unzip_file(root, f):
        zip_ref = zipfile.ZipFile(os.path.join(root, f), 'r')
        zip_ref.extractall(root)
        zip_ref.close()

    for root, dirs, files in os.walk('/opt/ml/input/data/train'):
        for f in files:
            if f.endswith('.zip'):
                unzip_file(root, f)

    for root, dirs, files in os.walk('/opt/ml/input/data/train'):
        data.extend([get_data(root, f) for f in sorted(files, key=numericalSort) if f.startswith('record') and f.endswith('.json')])

    # Normalize / correct data
    data = [d for d in data if d[1] > 0.1]
    for d in data:
        if d[1] < 0.2:
            d[1] = 0.2

    # ### Loading throttle and angle ###

    angle = [d[2] for d in data]
    throttle = [d[1] for d in data]
    angle_array = np.array(angle)
    throttle_array = np.array(throttle)
    if (len(data[0]) > 5):
        pilot_angle = [d[6] for d in data]
        pilot_throttle = [d[5] for d in data]
        pilot_angle_array = np.array(pilot_angle)
        pilot_throttle_array = np.array(pilot_throttle)
    else:
        pilot_angle = []
        pilot_throttle = []

    # ### Loading images ###
    images = np.array([img_to_array(load_img(os.path.join(d[3], d[4]))) for d in data], 'f')

    # slide images vs orders
    if env.hyperparameters.get('with_slide', False):
        images = images[:len(images) - 2]
        angle_array = angle_array[2:]
        throttle_array = throttle_array[2:]

    # ### Start training ###
    def linear_bin(a):
        a = a + 1
        b = round(a / (2 / 14))
        arr = np.zeros(15)
        arr[int(b)] = 1
        return arr

    logs = callbacks.TensorBoard(log_dir='logs', histogram_freq=0, write_graph=True, write_images=True)
    save_best = callbacks.ModelCheckpoint('/opt/ml/model/model_cat', monitor='angle_out_loss', verbose=1, save_best_only=True, mode='min')
    early_stop = callbacks.EarlyStopping(monitor='angle_out_loss',
                                         min_delta=.0005,
                                         patience=10,
                                         verbose=1,
                                         mode='auto')
    img_in = Input(shape=(120, 160, 3), name='img_in')  # First layer, input layer, Shape comes from camera.py resolution, RGB
    x = img_in
    x = Convolution2D(24, (5, 5), strides=(2, 2), activation='relu')(x)  # 24 features, 5 pixel x 5 pixel kernel (convolution, feauture) window, 2wx2h stride, relu activation
    x = Convolution2D(32, (5, 5), strides=(2, 2), activation='relu')(x)  # 32 features, 5px5p kernel window, 2wx2h stride, relu activatiion
    x = Convolution2D(64, (5, 5), strides=(2, 2), activation='relu')(x)  # 64 features, 5px5p kernal window, 2wx2h stride, relu
    x = Convolution2D(64, (3, 3), strides=(2, 2), activation='relu')(x)  # 64 features, 3px3p kernal window, 2wx2h stride, relu
    x = Convolution2D(64, (3, 3), strides=(1, 1), activation='relu')(x)  # 64 features, 3px3p kernal window, 1wx1h stride, relu

    # Possibly add MaxPooling (will make it less sensitive to position in image). Camera angle fixed, so may not to be needed

    x = Flatten(name='flattened')(x)  # Flatten to 1D (Fully connected)
    x = Dense(100, activation='relu')(x)  # Classify the data into 100 features, make all negatives 0
    x = Dropout(.1)(x)
    x = Dense(50, activation='relu')(x)
    x = Dropout(.1)(x)  # Randomly drop out 10% of the neurons (Prevent overfitting)
    # categorical output of the angle
    callbacks_list = [save_best, early_stop, logs]
    angle_out = Dense(15, activation='softmax', name='angle_out')(x)  # Connect every input with every output and output 15 hidden units. Use Softmax to give percentage. 15 categories and find best one based off percentage 0.0-1.0

    # continous output of throttle
    throttle_out = Dense(1, activation='relu', name='throttle_out')(x)  # Reduce to 1 number, Positive number only
    angle_cat_array = np.array([linear_bin(a) for a in angle_array])
    model = Model(inputs=[img_in], outputs=[angle_out, throttle_out])
    model.compile(optimizer='adam',
                  loss={'angle_out': 'categorical_crossentropy',
                        'throttle_out': 'mean_absolute_error'},
                  loss_weights={'angle_out': 0.9, 'throttle_out': .001})
    model.fit({'img_in': images}, {'angle_out': angle_cat_array, 'throttle_out': throttle_array}, batch_size=32, epochs=100, verbose=1, validation_split=0.2, shuffle=True, callbacks=callbacks_list)
