In the first two rounds of the hyperparameter search, the stride of the first convolutional layer was fixed at 2. This was a necessity given the time required to train a network with a stride of 1 on a CPU (Google Compute Engine, here I come!). In this third round the stride of the first layer is reduced from 2 to 1.
After analysing the performance of 50 CNNs with randomly chosen hyperparameters, values that tended to result in poor performance were eliminated from the search ranges.
The code in this notebook is virtually identical to that used for the first two rounds. The main differences are the range of hyperparameter values sampled and the training logs at the end.
The analysis for all rounds is carried out in a separate notebook.
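The elimination of poorly performing values was based on that analysis notebook; as a rough illustration only, a summary along the following lines shows which values tended to do badly. This is a sketch and not code from either notebook: it assumes the round-2 results were saved to './log/model_hyperparam_2.csv' (a name patterned on this round's 'model_hyperparam_3.csv') with a 'best_acc' column recorded for each run.
# Sketch only (not from the original notebooks): summarise a previous round's results
# to see which hyperparameter values tended to perform poorly.
# Assumes './log/model_hyperparam_2.csv' exists with hyperparameter columns plus 'best_acc'.
import pandas as pd
prev_results = pd.read_csv('./log/model_hyperparam_2.csv')
for hp in ['filters1', 'ksize1', 'learning_rate', 'optimizer']:
    if hp in prev_results.columns:
        print(prev_results.groupby(hp)['best_acc'].agg(['mean', 'count']), '\n')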
# Provided on https://www.cs.toronto.edu/~kriz/cifar.html
# Given a "pickled" file, returns a dictionary containing the image data
def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict
The training data is read from four of the CIFAR-10 batch files and merged into a single array.
import numpy as np

for file_no in range(4):
    # Obtain data dictionary from each file
    filename = "cifar-10-batches-py/data_batch_" + str(file_no + 1)
    image_batch = unpickle(filename)
    if file_no == 0:
        # First file: create numpy arrays containing data & labels.
        # Reshape to 32x32 images with 3 channels (RGB), moving the channels to the last axis
        image_data = image_batch[b'data'].reshape((-1, 3, 32, 32)).transpose((0, 2, 3, 1))
        image_labels = image_batch[b'labels']
    else:
        # Subsequent files: concatenate onto the existing arrays
        new_data = image_batch[b'data'].reshape((-1, 3, 32, 32)).transpose((0, 2, 3, 1))
        image_data = np.concatenate([image_data, new_data])
        image_labels = np.concatenate([image_labels, image_batch[b'labels']])

print("Training data shape: ", image_data.shape)
# Create numpy array containing test data
test_batch = unpickle("cifar-10-batches-py/test_batch")
test_data = test_batch[b'data'].reshape((-1,3,32,32)).transpose((0,2,3,1))
test_labels = test_batch[b'labels']
# Obtain label names from the meta data
label_names = unpickle("cifar-10-batches-py/batches.meta")[b'label_names']
label_names = [l.decode('UTF-8') for l in label_names]
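As a quick sanity check (not in the original notebook), the shapes of the loaded arrays can be printed: each CIFAR-10 batch file holds 10,000 images, so the four merged training batches give 40,000 examples and the test batch a further 10,000.
# Sanity check of the loaded arrays (each CIFAR-10 batch file contains 10,000 images)
print("Training data:", image_data.shape, "labels:", len(image_labels))  # (40000, 32, 32, 3) 40000
print("Test data:    ", test_data.shape, "labels:", len(test_labels))    # (10000, 32, 32, 3) 10000
print("Label names:  ", label_names)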
import tensorflow as tf
# Fixed hyperparameters
height = 32
width = 32
channels = 3
outputs = 10
n_epochs = 500
# number of epochs to wait for a lower test loss before stopping early
early_stop_rounds = 4
# check training accuracy every acc_check iterations within an epoch
acc_check = 25
# Variable Hyperparameters
hyperparam_range = {'filters1':[64, 96],
'ksize1':[4, 5],
'filters2':[96, 128],
'ksize2':[4, 5],
'filters3':[96, 128],
'ksize3':[4, 5],
'full_hidd1':[100, 125],
'full_hidd2':[100, 125],
'activation':['lrelu'],
'learning_rate':[0.001, 0.0015, 0.002, 0.003],
'batch_size':[64],
'momentum':[0.9, 0.95, 0.99],
'patch_reduction':[0],
'optimizer':['adam']
}
# Calculate the number of hyperparameter grid points
first_item = True
for key, values in hyperparam_range.items():
    count = len(values)
    if first_item:
        display = str(count)
        total = count
        first_item = False
    else:
        display = display + ' X ' + str(count)
        total *= count

print('The total number of possible hyperparameter combinations is ' + display + ' = ' + "{:,}".format(total))
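With the ranges above, eight of the hyperparameters have two candidate values, the learning rate has four, the momentum has three and the rest are fixed, so the grid works out to 2^8 × 4 × 3 = 3,072 combinations. The random search below samples only a handful of these.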
def conv_layer(tensor_input, layer_no, filters, ksize, kstride, activation_unit, momentum, phase_train):
    # Convolutional layer with batch normalisation, max pooling and dropout
    with tf.name_scope("conv_layer" + str(layer_no)):
        conv = tf.layers.conv2d(
            tensor_input,
            filters=filters,
            kernel_size=ksize,
            strides=[kstride, kstride],
            padding="SAME",
            activation=None
        )
        conv_bn = tf.layers.batch_normalization(
            inputs=conv,
            axis=-1,
            momentum=momentum,  # batch norm momentum (sampled hyperparameter)
            epsilon=0.001,
            center=True,
            scale=True,
            trainable=True,
            training=phase_train
        )
        # Apply activation unit
        conv_bn_relu = activation_unit(conv_bn)
        # 2x2 max pooling halves the spatial dimensions
        max_pool = tf.nn.max_pool(
            conv_bn_relu,
            ksize=[1, 2, 2, 1],
            strides=[1, 2, 2, 1],
            padding="VALID"
        )
        # Dropout (default rate of 0.5), only active during training
        dropout = tf.layers.dropout(
            max_pool,
            training=phase_train
        )
    return dropout
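With patch_reduction fixed at 0 and a convolution stride of 1 (the change made in this round), the SAME-padded convolutions keep each feature map at the input size, so only the 2×2 max pooling shrinks it: 32×32 → 16×16 → 8×8 → 4×4 across the three layers, leaving a 4×4×filters3 tensor for the fully connected layers.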
def build_graph(hyperparam):
    # Retrieve hyperparameters from dictionary
    filters1 = hyperparam['filters1']
    ksize1 = hyperparam['ksize1']
    filters2 = hyperparam['filters2']
    ksize2 = hyperparam['ksize2']
    filters3 = hyperparam['filters3']
    ksize3 = hyperparam['ksize3']
    full_hidd1 = hyperparam['full_hidd1']
    full_hidd2 = hyperparam['full_hidd2']
    activation = hyperparam['activation']
    learning_rate = hyperparam['learning_rate']
    momentum = hyperparam['momentum']
    patch_reduction = hyperparam['patch_reduction']
    optimizer_method = hyperparam['optimizer']
    patch_height = height - 2 * patch_reduction
    patch_width = width - 2 * patch_reduction
    # Select the activation unit
    if activation == 'elu':
        activation_unit = tf.nn.elu
    elif activation == 'lrelu':
        activation_unit = tf.nn.leaky_relu
    else:
        activation_unit = tf.nn.relu
    graph = tf.Graph()
    with graph.as_default():
        X = tf.placeholder(shape=(None, patch_height, patch_width, channels), dtype=tf.float32)
        y = tf.placeholder(shape=(None), dtype=tf.int32)
        phase_train = tf.placeholder(tf.bool, name='phase_train')
        # 1st convolutional layer (stride 1 in this round)
        conv1 = conv_layer(X, 1, filters1, ksize1, 1, activation_unit, momentum, phase_train)
        # 2nd convolutional layer
        conv2 = conv_layer(conv1, 2, filters2, ksize2, 1, activation_unit, momentum, phase_train)
        # 3rd convolutional layer
        conv3 = conv_layer(conv2, 3, filters3, ksize3, 1, activation_unit, momentum, phase_train)
        # 1st fully connected (dense) layer, applied to the last axis of the 4-D conv output before flattening
        fully_conn1 = tf.layers.dense(conv3, full_hidd1, name="fully_conn1", activation=activation_unit)
        flat = tf.contrib.layers.flatten(fully_conn1)
        # 2nd fully connected layer
        fully_conn2 = tf.layers.dense(flat, full_hidd2, name="fully_conn2", activation=activation_unit)
        # Output layer (no activation function, as this is built into the cross-entropy)
        logits = tf.layers.dense(fully_conn2, outputs, name="logits")
        # Cross-entropy loss
        with tf.name_scope("loss"):
            xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
            loss = tf.reduce_mean(xentropy, name="loss")
        # Training
        with tf.name_scope("train"):
            if optimizer_method == 'rmsprop':
                optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate, momentum=0.9)
            elif optimizer_method == 'nesterov':
                optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9, use_nesterov=True)
            else:
                optimizer = tf.train.AdamOptimizer(learning_rate)
            training_op = optimizer.minimize(loss)
        # Initialization & saver
        init = tf.global_variables_initializer()
        saver = tf.train.Saver()
    return graph, X, y, phase_train, logits, loss, training_op, init, saver
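As a quick check (not part of the original notebook), a single graph can be built from one randomly sampled combination before launching the full search; the underscore names are used so these throwaway tensors do not clash with the globals the training loop relies on. A logits shape of (?, 10) confirms the network ends in one output per CIFAR-10 class.
# Sketch only: build one graph from a randomly sampled hyperparameter combination
import random
sample_hp = {key: random.choice(values) for (key, values) in hyperparam_range.items()}
_graph, _X, _y, _phase, _logits, _loss, _train_op, _init, _saver = build_graph(hyperparam=sample_hp)
print(_logits.shape)   # expected: (?, 10)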
# Function to train model, returns minimised loss
def train_model(batch_size=128):
    training_size = image_data.shape[0]
    no_batches = training_size // batch_size
    with graph.as_default():
        # Ensure batch normalisation statistics get updated during training
        extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        # Add evaluation metrics
        with tf.name_scope("eval"):
            correct = tf.nn.in_top_k(logits, y, 1)
            accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
            acc_summary = tf.summary.scalar('Accuracy', accuracy)
            loss_summary = tf.summary.scalar('Loss', loss)
    # Initialise best loss and accuracy for early stopping
    best_loss = 1e9
    best_acc = 0.0
    early_stopping = False
    # Run training
    with tf.Session(graph=graph) as sess:
        init.run()
        epoch = 0
        early_stop_count = 0
        # Initialise running totals for training accuracy and loss
        sum_acc_train = 0.0
        sum_loss_train = 0.0
        count_train = 0
        while (epoch < n_epochs) and (not early_stopping):
            epoch += 1
            # Shuffle training data
            p = np.random.permutation(training_size)
            image_data_shuffle = image_data_patch[p]
            image_labels_shuffle = image_labels[p]
            for iteration in range(no_batches):
                # Train on mini-batch
                X_batch = image_data_shuffle[iteration * batch_size:(iteration + 1) * batch_size]
                y_batch = image_labels_shuffle[iteration * batch_size:(iteration + 1) * batch_size]
                sess.run([training_op, extra_update_ops], feed_dict={X: X_batch, y: y_batch, phase_train: True})
                # Evaluate model on current mini-batch
                if iteration % acc_check == 0:
                    acc_train, loss_train = sess.run([accuracy, loss],
                                                     feed_dict={X: X_batch, y: y_batch, phase_train: True})
                    # Update training averages for current epoch
                    sum_acc_train += acc_train
                    sum_loss_train += loss_train
                    count_train += 1
                    mean_acc_train = sum_acc_train / count_train
                    mean_loss_train = sum_loss_train / count_train
                    print(epoch, iteration, "Train accuracy:", mean_acc_train, " Train loss:", mean_loss_train, end='\r')
            # Print mean train accuracy and log it to TensorBoard
            print(epoch, iteration, "Train accuracy:", mean_acc_train, " Train loss:", mean_loss_train)
            train_summary = tf.Summary()
            train_summary.value.add(tag='eval/Accuracy', simple_value=mean_acc_train)
            train_summary.value.add(tag='eval/Loss', simple_value=mean_loss_train)
            file_writer_train.add_summary(train_summary, epoch)
            # Evaluate model on test data every epoch
            acc_test, loss_test, summary_str_acc_test, summary_str_loss_test = sess.run(
                [accuracy, loss, acc_summary, loss_summary],
                feed_dict={X: test_data_patch, y: test_labels, phase_train: False})
            # Check for minimum loss
            if loss_test < best_loss:
                # Reset early stopping best loss and count
                best_loss = loss_test
                best_acc = acc_test
                best_train_acc = mean_acc_train
                best_train_loss = mean_loss_train
                early_stop_count = 0
                # Save model
                save_path = saver.save(sess, ckptfile)
            else:
                # Increment early stopping count
                early_stop_count += 1
                # Check if sufficient rounds (epochs) have passed without improvement
                if early_stop_count > early_stop_rounds:
                    # Flag early stopping, so the training loop will stop
                    early_stopping = True
                    print("Early stopping, best loss: ", best_loss)
                else:
                    # Record and display test loss and accuracy
                    file_writer_test.add_summary(summary_str_acc_test, epoch)
                    file_writer_test.add_summary(summary_str_loss_test, epoch)
                    print(epoch, "Test accuracy:", acc_test, " Test loss:", loss_test)
    # Return best loss and accuracy obtained and the number of epochs trained
    return best_loss, best_acc, best_train_acc, best_train_loss, epoch
import pandas as pd
import random
random.seed(123)
import os.path

# Create log and graph directories if they do not already exist
if not os.path.exists('./log'):
    os.makedirs('./log')
if not os.path.exists('./graphs'):
    os.makedirs('./graphs')

# Load previous results for this round if they exist, otherwise start a new results file
if os.path.isfile('./log/model_hyperparam_3.csv'):
    hyperparam_df = pd.read_csv('./log/model_hyperparam_3.csv')
    new_results_file = False
else:
    new_results_file = True
# Random model search
for n in range(5):
    # Create log directory using current timestamp
    from datetime import datetime
    now = datetime.now().strftime("%Y%m%dT%H%M%S")
    root_logdir = "./log"
    root_graphdir = "./graphs"
    logdir = "{}/run-{}/".format(root_logdir, now)
    graphdir = "{}/run-{}/".format(root_graphdir, now)
    #root_modeldir = "./model"
    graphfile = graphdir
    ckptfile = graphdir + "checkpoint.ckpt"
    # Note that the graph is not written using the same filewriter as the logging data.
    # This allows the logs to be viewed during training, before the filewriter is closed.
    file_writer_train = tf.summary.FileWriter(logdir + '/train')
    file_writer_test = tf.summary.FileWriter(logdir + '/test')
    # tensorboard --logdir e:\Programming\TensorFlow\CIFAR-10\log
    # The command must be executed from the same drive (E:) as the logdir
    tf.reset_default_graph()
    # Randomly sample one value for each hyperparameter
    hyperparam_dict = {key: random.choice(values) for (key, values) in hyperparam_range.items()}
    print("Training model with the following hyperparameters:")
    print(hyperparam_dict)
    # Obtain graph and nodes required for training
    graph, X, y, phase_train, logits, loss, training_op, init, saver = build_graph(hyperparam=hyperparam_dict)
    # Write graph to file and close to avoid a TensorBoard conflict
    file_writer_graph = tf.summary.FileWriter(graphfile, graph)
    file_writer_graph.close()
    # Extract central image patches if patch_reduction > 0
    patch_reduction = hyperparam_dict['patch_reduction']
    if patch_reduction == 0:
        image_data_patch = image_data
        test_data_patch = test_data
    else:
        image_data_patch = image_data[:, patch_reduction:-patch_reduction, patch_reduction:-patch_reduction, :]
        test_data_patch = test_data[:, patch_reduction:-patch_reduction, patch_reduction:-patch_reduction, :]
    # Train the model
    best_loss, best_acc, best_train_acc, best_train_loss, no_epochs = train_model(batch_size=hyperparam_dict['batch_size'])
    # Add results to dictionary
    hyperparam_dict['best_loss'] = best_loss
    hyperparam_dict['best_acc'] = best_acc
    hyperparam_dict['best_train_acc'] = best_train_acc
    hyperparam_dict['best_train_loss'] = best_train_loss
    hyperparam_dict['no_epochs'] = no_epochs
    hyperparam_dict['logdir'] = logdir
    # Add results to dataframe
    if new_results_file:
        hyperparam_df = pd.DataFrame(hyperparam_dict, index=[0])
        new_results_file = False
    else:
        hyperparam_df = hyperparam_df.append(hyperparam_dict, ignore_index=True)
    # Write results file after every model (don't wait until the end, as the next model may be interrupted)
    hyperparam_df.to_csv('./log/model_hyperparam_3.csv', index=False)
    # Close filewriters
    file_writer_train.close()
    file_writer_test.close()
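The detailed analysis is in the separate notebook noted below, but once the search loop has finished, the results file written above can be given a quick look, for example (a sketch, not part of the original run):
# Quick look at this round's results, best (lowest) test loss first
results = pd.read_csv('./log/model_hyperparam_3.csv')
print(results.sort_values('best_loss')[['best_loss', 'best_acc', 'no_epochs', 'logdir']].head())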
# NOTE - From this point onwards there are only training logs.
# Results and conclusions are in a separate notebook.