This blog post has recent publications about the use of Deep Learning in an Energy Production context (wind, gas and oil), e.g. wind power prediction, turbine risk assessment, reservoir discovery and price forecasting.
Making self-driving cars work requires several technologies and methods to pull in the same direction (e.g. radar/lidar, cameras, control theory and Deep Learning). The online Self-Driving Car Nanodegree from Udacity (divided into 3 terms) is probably the best way to learn more about the topic (see [Term 1], [Term 2] and [Term 3] for more details about each term). The coolest part is that you can run your code on an actual self-driving car towards the end of term 3 (I am currently in the middle of term 1; highly recommended course!).
Note: before taking this course I recommend taking Udacity’s Deep Learning Nanodegree Foundations, since most of the (term 1) projects require some hands-on experience with Deep Learning.
Traffic Sign Detection with Convolutional Neural Networks
This blog post is a writeup of my (non-perfect) approach to German traffic sign detection (a project in the course) with Convolutional Neural Networks (in TensorFlow), a variant of LeNet with Dropout and (the new) SELU – Self-Normalizing Neural Networks. The main effect of SELU was that it gained classification accuracy quickly (even in the first epoch), but it didn’t lead to higher final accuracy than batch normalization + RELU. (Details at: github.com/atveit/TrafficSignClassification). I believe data augmentation in particular, and perhaps a deeper network, could have improved the performance.
I used numpy’s shape to calculate summary statistics of the traffic signs data set:
The size of the training set is 34799
The size of the validation set is 4410
The size of the test set is 12630
The shape of a traffic sign image is 32x32x3 (3 color channels, RGB)
The number of unique classes/labels in the data set is 43
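As a minimal sketch, the statistics above can be computed directly from the array shapes (the arrays below are placeholders standing in for the loaded dataset splits; in the project they come from the pickled German traffic sign data):

```python
import numpy as np

# Placeholder arrays with the same shapes as the real dataset splits
X_train = np.zeros((34799, 32, 32, 3), dtype=np.uint8)
X_valid = np.zeros((4410, 32, 32, 3), dtype=np.uint8)
X_test = np.zeros((12630, 32, 32, 3), dtype=np.uint8)
y_train = np.arange(34799) % 43  # labels in 0..42

n_train = X_train.shape[0]            # size of training set
n_validation = X_valid.shape[0]       # size of validation set
n_test = X_test.shape[0]              # size of test set
image_shape = X_train.shape[1:]       # (32, 32, 3)
n_classes = len(np.unique(y_train))   # number of unique labels
```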
2. Visualization of the training, validation and test datasets.
Here is an exploratory visualization of the data set. It is a bar chart showing the normalized distribution of data for the 43 traffic sign classes. The key takeaway is that the relative number of data points varies quite a bit between classes, e.g. from around 6.5% (e.g. class 1) to 0.5% (e.g. class 37), i.e. a factor of at least 12 difference (6.5% / 0.5%), which can potentially impact classification performance.
3 Design of Architecture
3.1 Preprocessing of images
I did no grayscale or other conversion of the training/test/validation images (they came preprocessed). The images from the Internet were read using PIL, converted to RGB (from RGBA), resized to 32×32 and converted to numpy arrays before normalization.
All images had the pixels in each color channel (RGB, 3 channels with values between 0 and 255) normalized to be between -0.5 and 0.5 by computing (value - 128) / 255. I did no data augmentation.
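A minimal sketch of this normalization (the function name is mine, not from the project code):

```python
import numpy as np

def normalize(images):
    """Map uint8 RGB pixel values (0..255) to roughly [-0.5, 0.5)."""
    return (images.astype(np.float32) - 128.0) / 255.0

batch = np.array([[0, 128, 255]], dtype=np.uint8)
normalized = normalize(batch)  # 0 -> ~-0.502, 128 -> 0.0, 255 -> ~0.498
```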
Here are sample images from the training set
3.2 Model Architecture
Given the relatively low resolution of the images I started with the LeNet example provided in the lectures, but to improve training I added Dropout (in the early layers) together with RELU rectifier functions. I had recently read about the self-normalizing rectifier function, SELU, so I decided to try it instead of RELU. It gave no better end result after many epochs, but trained much faster (reaching > 90% accuracy in one epoch), so I kept SELU in the final model. For more information about SELU, check out the paper Self-Normalizing Neural Networks from Johannes Kepler University in Linz, Austria.
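As a sketch, SELU is straightforward to implement in numpy (the constants come from the Self-Normalizing Neural Networks paper):

```python
import numpy as np

# Constants from Klambauer et al., "Self-Normalizing Neural Networks" (2017)
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x):
    """Scaled exponential linear unit: lambda * x for x > 0,
    lambda * alpha * (exp(x) - 1) for x <= 0."""
    return LAMBDA * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))
```

Unlike RELU, SELU pushes activations towards zero mean and unit variance, which is what lets training converge quickly without explicit batch normalization.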
My final model consisted of the following layers:
Input: 32x32x3 RGB image
Convolution 5x5: 1×1 stride, valid padding, outputs 28x28x6 (SELU activation)
Dropout: keep_prob = 0.9
Max pooling: 2×2 stride, outputs 14x14x6
Convolution 5x5: 1×1 stride, valid padding, outputs 10x10x16 (SELU activation)
Dropout: keep_prob = 0.9
Max pooling: 2×2 stride, outputs 5x5x16
Flatten: output dimension 400
Fully connected: output dimension 120
Fully connected: output dimension 84
Fully connected: output dimension 84
Fully connected (logits): output dimension 43
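The spatial dimensions in the table follow from the standard output-size formulas for valid-padding convolutions and pooling; a quick sanity check (helper names are mine):

```python
def conv_out(size, kernel, stride=1):
    """Output size of a 'valid'-padding convolution along one dimension."""
    return (size - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output size of max pooling along one dimension."""
    return (size - kernel) // stride + 1

s = conv_out(32, 5)   # 32x32 -> 28x28 after the first 5x5 convolution
s = pool_out(s)       # 28x28 -> 14x14 after 2x2 max pooling
s = conv_out(s, 5)    # 14x14 -> 10x10 after the second 5x5 convolution
s = pool_out(s)       # 10x10 -> 5x5 after 2x2 max pooling
flat = s * s * 16     # 5 * 5 * 16 = 400, the flattened dimension
```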
3.3 Training of Model
To train the model, I used the Adam optimizer with a learning rate of 0.002, 20 epochs (it converged fast with SELU) and a batch size of 256 (run on a GTX 1070 with 8GB of GPU RAM).
3.4 Approach to finding a solution with accuracy > 0.93
Adding dropout to LeNet improved test accuracy, and SELU improved training speed. The originally partitioned data sets were quite unbalanced (as plotting showed), so reading all the data, shuffling it and creating new training/validation/test sets also helped. I considered using Keras to fine-tune a pretrained model (e.g. Inception v3), but a big model on such small images might lead to overfitting (not entirely sure about that though), and reducing the input size might lead to long training times (fine-tuning seems to work best when you keep the same input size and only change the output classes).
My final model results were:
validation set accuracy of 0.976 (between 0.975-0.982)
test set accuracy of 0.975
If an iterative approach was chosen:
What was the first architecture that was tried and why was it chosen?
Started with LeNet and incrementally added dropout and then several SELU layers. Also added one more fully connected layer.
What were some problems with the initial architecture?
The results were not great before adding dropout (to avoid overfitting).
Which parameters were tuned? How were they adjusted and why?
Tried several combinations of learning rates. Could reduce the number of epochs after adding SELU. Used the same dropout keep rate throughout.
Since the difference between validation accuracy and test accuracy is very low, the model seems to generalize well. The loss is also quite low (0.02), so there is most likely little to gain, at least without changing the model substantially.
4 Test a Model on New Images
4.1. Choose five German traffic signs found on the web
Here are five German traffic signs that I found on the web:
In the first pick of images I didn’t check that the signs actually were among the 43 classes the model was built for, and some of them weren’t, making them impossible to classify correctly. But I got interesting results (regarding finding similar signs) for the wrongly classified ones, so I replaced only 2 of them with sign images that actually were covered by the model, leaving 3 of them still impossible to classify.
Here are the results of the prediction:
Speed limit (50km/h)
Adult and child on road
Turn left ahead
Two way traffic ahead
Beware of ice/snow
Speed limit (60km/h)
Speed limit (60km/h)
The model was able to correctly guess 2 of the 5 traffic signs, which gives an accuracy of 40%. The others it can’t classify correctly, but the 2nd prediction for sign 3 – “adult and child on road” – is interesting since it suggests “Go straight or right”, which is visually quite similar (if you blur the innermost part of each sign you get almost the same image).
Magnetic Resonance Imaging (MRI) can be used in many types of diagnosis, e.g. cancer, Alzheimer’s, cardiac and muscle/skeleton issues. This blog post has recent publications of Deep Learning applied to MRI (health-related) data, e.g. for segmentation, detection, denoising and classification.
Magnetic resonance imaging (MRI) is a medical imaging technique used in radiology to form pictures of the anatomy and the physiological processes of the body in both health and disease. MRI scanners use strong magnetic fields, radio waves, and field gradients to generate images of the organs in the body.
Scaling down images is a craft, scaling up images is an art
Scaling down to a lower resolution typically means removing pixels, but scaling up means inventing new pixels. Some Deep Learning models with Convolutional Neural Networks (and frequently deconvolutional layers) have shown success at scaling up images; this is called Image Super-Resolution. These models are typically trained by taking high-resolution images, reducing them to lower resolution, and then training in the opposite direction. Partially related: I also recommend checking out Odena et al.’s Distill.pub publication Deconvolution and Checkerboard Artifacts, which goes into more detail about one of the core operators used in Image Super-Resolution.
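A minimal sketch of how such training pairs are created: take a high-resolution image as the target and derive the low-resolution input from it (here via simple average pooling; real pipelines typically use bicubic downsampling, and the function name is mine):

```python
import numpy as np

def downscale(img, factor=2):
    """Average-pool downscaling: derives a low-resolution input image
    from a high-resolution target, forming a super-resolution training pair."""
    h, w, c = img.shape
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

high_res = np.random.rand(64, 64, 3)   # target the model should reconstruct
low_res = downscale(high_res)          # (32, 32, 3) input to the model
```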
This blog post has an overview of papers related to acoustic modelling, primarily for speech recognition but also speech generation (synthesis). See also ai.amundtveit.com/keyword/acoustic for a broader set of (at the time of writing 73) recent Deep Learning papers related to acoustics for speech recognition and other applications of acoustics.
Acoustic Modelling is described in Wikipedia as: “An acoustic model is used in Automatic Speech Recognition to represent the relationship between an audio signal and the phonemes or other linguistic units that make up speech. The model is learned from a set of audio recordings and their corresponding transcripts”.
For the last couple of months I’ve been creating bibliographies of recent academic publications in various subfields of Deep Learning on this blog. This posting gives an overview of the last 25 bibliographies posted.
This posting presents recent publications related to Deep Learning for Question Answering. Question Answering is described as “a computer science discipline within the fields of information retrieval and natural language processing (NLP), which is concerned with building systems that automatically answer questions posed by humans in a natural language”. I’ll also publish postings about Deep Learning for Information Retrieval and Learning to Rank today.
Ensemble-based Machine Learning has been used with success in several Kaggle competitions, and this year the ImageNet competition was also dominated by ensembles in Deep Learning; e.g. the Trimps-Soushen team from the 3rd Research Institute of the Ministry of Public Security (China) used a combination of Inception, Inception-ResNet, ResNet and Wide Residual Networks to win the object classification/localization challenge. This blog post has recent papers related to ensembles in Deep Learning.
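The simplest form of such an ensemble is averaging the class probabilities from each model and taking the argmax (a sketch with made-up numbers; the function name is mine, and winning entries typically use weighted or more elaborate combinations):

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average per-class probabilities from several models, then pick
    the highest-scoring class for each example."""
    avg = np.mean(prob_list, axis=0)
    return avg.argmax(axis=-1)

# Three hypothetical models scoring one example over three classes
model_a = np.array([[0.6, 0.3, 0.1]])
model_b = np.array([[0.2, 0.5, 0.3]])
model_c = np.array([[0.3, 0.5, 0.2]])
prediction = ensemble_predict([model_a, model_b, model_c])  # class 1 wins on average
```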
EEG (Electroencephalography) is the measurement of electrical signals in the brain. It has long been used for medical purposes (e.g. diagnosis of epilepsy), and has in more recent years also been used in Brain Computer Interfaces (BCI) — note: if BCI is new to you, don’t get overly excited about it, since these interfaces are in my opinion still quite premature. But they are definitely interesting in a longer-term perspective.
This blog post gives an overview of recent research on Deep Learning in combination with EEG, e.g. for classification, feature representation, diagnosis, safety (cognitive state of drivers) and hybrid methods (Computer Vision or Speech Recognition together with EEG and Deep Learning).
This blog post has recent papers related to embeddings for Natural Language Processing with Deep Learning. Example application areas where embeddings are used in the papers include finance (stock market prediction), biomedical text analysis, part-of-speech tagging, sentiment analysis and pharmacology (drug adverse effects).
Alzheimer’s Disease is the cause of 60–70% of dementia cases, and the costs associated with diagnosis, treatment and care of patients with it are estimated to be in the range of a hundred billion dollars in the USA. This blog post has some recent papers related to using Deep Learning for diagnostics and decision support related to Alzheimer’s disease.
Ultrasound (also called sonography) is sound waves with a higher frequency than humans can hear. It is frequently used in medical settings, e.g. for checking that a pregnancy is going well with fetal ultrasound. For more about ultrasound data formats, check out the Ultrasound Research Interface. This blog post has recent publications about applying Deep Learning to analyzing ultrasound data.
Deep Learning (creative AI) might potentially be used for music analysis and music creation. DeepMind’s WaveNet is a step in that direction. This blog post presents recent papers on Deep Learning for Music.
This blog post gives an overview of papers related to using Regularization in Deep Learning submitted to ICLR 2017, see underneath for the list of papers. If you want to learn about Regularization in Deep Learning check out: www.deeplearningbook.org/contents/regularization.html
This blog post gives an overview of papers related to Unsupervised Deep Learning submitted to ICLR 2017, see underneath for the list of papers. If you want to learn about Unsupervised Deep Learning check out: Ruslan Salkhutdinov’s video Foundations of Unsupervised Deep Learning.
This blog post gives an overview of Natural Language Processing related papers submitted to ICLR 2017, see underneath for the list of papers. If you want to learn about Deep Learning with NLP check out Stanford’s CS224d: Deep Learning for Natural Language Processing