Paper References from "Deep Learning School 2016"

[ deep-learning easi ]

I’ve been watching through the video lectures of the “Deep Learning School 2016” playlist on Lex Fridman’s YouTube account. While doing so, I found it useful to collect and collate all the references in each lecture (or as many as I could distinguish and find).

2. Deep Learning for Computer Vision (Andrej Karpathy, OpenAI)

1998: LeCun et al: Gradient-Based Learning Applied to Document Recognition: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
2012: Krizhevsky et al: ImageNet Classification with Deep Convolutional Neural Networks (AlexNet): http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
2013: Donahue et al: DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition: http://proceedings.mlr.press/v32/donahue14.pdf
2013: Zeiler & Fergus: Stochastic Pooling for Regularization of Deep Convolutional Neural Networks: https://arxiv.org/pdf/1301.3557.pdf
2014: Razavian et al: CNN Features off-the-shelf: an Astounding Baseline for Recognition: https://www.cv-foundation.org/openaccess/content_cvpr_workshops_2014/W15/papers/Razavian_CNN_Features_Off-the-Shelf_2014_CVPR_paper.pdf
2014: Cadieu et al: Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003963
2014: Simonyan & Zisserman: Very Deep Convolutional Networks for Large-Scale Image Recognition (VGGNet): https://arxiv.org/pdf/1409.1556.pdf%20http://arxiv.org/abs/1409.1556.pdf
2015: Szegedy et al: Going Deeper with Convolutions (Inception): https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf
2015: Yosinski et al: Understanding Neural Networks Through Deep Visualization: https://arxiv.org/pdf/1506.06579.pdf
2015: He et al: Deep Residual Learning for Image Recognition (ResNet): https://arxiv.org/abs/1512.03385
2016: He et al: Identity Mappings in Deep Residual Networks: https://arxiv.org/pdf/1603.05027.pdf
2016: Huang et al: Deep Networks with Stochastic Depth: https://arxiv.org/pdf/1603.09382.pdf
2016: Oord et al: WaveNet: a Generative Model for Raw Audio: https://arxiv.org/pdf/1609.03499.pdf
2016: Targ et al: ResNet in ResNet: Generalizing Residual Architectures: https://arxiv.org/pdf/1603.08029.pdf
2016: Wang et al: Deeply-Fused Nets: https://arxiv.org/pdf/1605.07716.pdf
2016: Shen et al: Weighted Residuals for Very Deep Networks: https://arxiv.org/pdf/1605.08831.pdf
2016: Zhang et al: Residual Networks of Residual Networks: Multilevel Residual Networks: https://arxiv.org/pdf/1608.02908.pdf
2016: Redmon et al: You Only Look Once: Unified, Real-Time Object Detection: https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf
2016: Singh et al: Swapout: Learning an ensemble of deep architectures: https://papers.nips.cc/paper/6205-swapout-learning-an-ensemble-of-deep-architectures.pdf
2016: Johnson et al: DenseCap: Fully Convolutional Localization Networks for Dense Captioning: http://openaccess.thecvf.com/content_cvpr_2016/papers/Johnson_DenseCap_Fully_Convolutional_CVPR_2016_paper.pdf
2017: Zagoruyko & Komodakis: Wide Residual Networks: https://arxiv.org/pdf/1605.07146.pdf
2017: Huang et al: Densely Connected Convolutional Networks: http://openaccess.thecvf.com/content_cvpr_2017/papers/Huang_Densely_Connected_Convolutional_CVPR_2017_paper.pdf
2017: Larsson et al: FractalNet: Ultra-Deep Neural Networks without Residuals: https://arxiv.org/pdf/1605.07648.pdf

3. Deep Learning for Natural Language Processing (Richard Socher, Salesforce)

1997: Hochreiter & Schmidhuber: Long Short-Term Memory: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.676.4320&rep=rep1&type=pdf
2010: Mikolov et al: Recurrent Neural Network Based Language Model: https://www.isca-speech.org/archive/archive_papers/interspeech_2010/i10_1045.pdf
2013: Mikolov et al (word2vec ref): Efficient Estimation of Word Representations in Vector Space: https://arxiv.org/pdf/1301.3781.pdf
2013: Mikolov et al (word2vec ref): Distributed Representations of Words and Phrases and their Compositionality: https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
2013: Mikolov et al (word2vec ref): Linguistic Regularities in Continuous Space Word Representations: https://www.microsoft.com/en-us/research/publication/linguistic-regularities-in-continuous-space-word-representations/?from=http%3A%2F%2Fresearch.microsoft.com%2Fpubs%2F189726%2Frvecs.pdf
2013: Mikolov et al: Exploiting Similarities among Languages for Machine Translation: https://arxiv.org/pdf/1309.4168.pdf
2013: Socher et al: Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank: https://www.aclweb.org/anthology/D13-1170/
2014: Cho et al: On the Properties of Neural Machine Translation: Encoder-Decoder Approaches: https://arxiv.org/abs/1409.1259
2014: Chung et al: Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling: https://arxiv.org/abs/1412.3555
2014: Graves et al: Neural Turing Machines: https://arxiv.org/abs/1410.5401
2014: Irsoy & Cardie: Opinion Mining with Deep Recurrent Neural Networks: https://www.aclweb.org/anthology/D14-1080/
2014: Irsoy & Cardie: Deep Recursive Neural Networks for Compositionality in Language: http://papers.nips.cc/paper/5551-deep-recursive-neural-networks-for-compositionality-in-language
2014: Kalchbrenner et al: A Convolutional Neural Network for Modelling Sentences: https://arxiv.org/abs/1404.2188
2014: Kim: Convolutional Neural Networks for Sentence Classification: https://arxiv.org/abs/1408.5882
2014: Le & Mikolov: Distributed Representations of Sentences and Documents: http://proceedings.mlr.press/v32/le14.pdf
2014: Pennington et al (GloVe ref): Glove: Global Vectors for Word Representation: https://www.aclweb.org/anthology/D14-1162/
2014: Sutskever et al: Sequence to Sequence Learning with Neural Networks: http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks
2014: Weston et al: Memory Networks: https://arxiv.org/abs/1410.3916
2014: Zaremba et al: Recurrent Neural Network Regularization: https://arxiv.org/abs/1409.2329
2014: Zaremba & Sutskever: Learning to Execute: https://arxiv.org/abs/1410.4615
2015: Antol et al: VQA: Visual Question Answering: http://openaccess.thecvf.com/content_iccv_2015/html/Antol_VQA_Visual_Question_ICCV_2015_paper.html
2015: Gal & Ghahramani: A Theoretically Grounded Application of Dropout in Recurrent Neural Networks: http://papers.nips.cc/paper/6241-a-theoretically-grounded-application-of-dropout-in-recurren
2015: Grefenstette et al: Learning to Transduce with Unbounded Memory: http://papers.nips.cc/paper/5648-learning-to-transduce-with-unbounded-memory
2015: Hermann et al: Teaching Machines to Read and Comprehend: http://papers.nips.cc/paper/5945-teaching-machines-to-read-and-comprehend
2015: Huang et al: Bidirectional LSTM-CRF Models for Sequence Tagging: https://arxiv.org/abs/1508.01991
2015: Sukhbaatar et al: End-To-End Memory Networks: http://papers.nips.cc/paper/5846-end-to-end-memorynetworks
2015: Tai et al: Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks: https://arxiv.org/abs/1503.00075
2015: Weston et al: Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks: https://arxiv.org/abs/1502.05698
2015: Zhang et al: Structured Memory for Neural Turing Machines: https://arxiv.org/abs/1510.03931
2015: Zhou et al: Simple Baseline for Visual Question Answering: https://arxiv.org/abs/1512.02167
2016: Andreas et al: Neural Module Networks: http://openaccess.thecvf.com/content_cvpr_2016/html/Andreas_Neural_Module_Networks_CVPR_2016_paper.html
2016: Andreas et al: Learning to Compose Neural Networks for Question Answering: https://arxiv.org/abs/1601.01705
2016: Kumar et al: Ask Me Anything: Dynamic Memory Networks for Natural Language Processing: http://proceedings.mlr.press/v48/kumar16.pdf
2016: Merity et al: Pointer Sentinel Mixture Models: https://arxiv.org/abs/1609.07843
2016: Noh et al: Image Question Answering Using Convolutional Neural Network With Dynamic Parameter Prediction: http://openaccess.thecvf.com/content_cvpr_2016/html/Noh_Image_Question_Answering_CVPR_2016_paper.html
2016: Yang et al: Stacked Attention Networks for Image Question Answering: http://openaccess.thecvf.com/content_cvpr_2016/html/Yang_Stacked_Attention_Networks_CVPR_2016_paper.html
2017: Zilly et al: Recurrent Highway Networks: https://arxiv.org/pdf/1607.03474.pdf

4. TensorFlow Tutorial (Sherry Moore, Google Brain)

NOTE: this video is old and TF has changed A LOT. Some of the code in Sherry’s TF tutorial (link below) will likely cause a lot of warnings to be issued, while some of it might not work. A good exercise might be to get it to work TF 2.0.

Sherry’s TF Tutorial: https://github.com/sherrym/tf-tutorial/
Train your own image classifier with Inception in TensorFlow: https://ai.googleblog.com/2016/03/train-your-own-image-classifier-with.html
Models on TensorFlow (NOTE - many links in video no longer work; UPDATED links below):
- Models Page: https://github.com/tensorflow/models
- Inception: https://github.com/tensorflow/models/tree/master/inception
- A Neural Image Caption Generator: https://github.com/tensorflow/models/tree/master/research/im2txt
- Language Model (1B words): https://github.com/tensorflow/models/tree/master/research/lm_1b
- SyntaxNet: https://github.com/tensorflow/models/tree/master/research/syntaxnet
- ResNet: https://github.com/tensorflow/models/tree/master/official/resnet
- Seq2Seq w/ Attention for Text Summarization: https://github.com/tensorflow/models/tree/master/research/textsum
- Image Compression: https://github.com/tensorflow/models/tree/master/research/compression
- Autoencoder: https://github.com/tensorflow/models/tree/master/research/autoencoder
- Spatial Transformer Network: https://github.com/tensorflow/models/tree/master/official/transformer

5. Foundations of Unsupervised Deep Learning (Ruslan Salakhutdinov, CMU)

1995: Hinton et al: The wake-sleep algorithm for unsupervised neural networks: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.51.215&rep=rep1&type=pdf
1996: Olshausen & Field: Natural image statistics and efficient coding: https://pdfs.semanticscholar.org/4435/2b35791ceaad3439b8ccf165cc9b4978d801.pdf
1996: Olshausen & Field: Sparse coding of natural images produces localized, oriented, bandpass receptive fields: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.6079&rep=rep1&type=pdf
2002: Hinton: Training Products of Experts by Minimizing Contrastive Divergence: http://www.cs.utoronto.ca/~hinton/absps/nccd.pdf
2006: Hinton et al: A Fast Learning Algorithm for Deep Belief Nets: https://www.mitpressjournals.org/doi/pdfplus/10.1162/neco.2006.18.7.1527
2006: Hinton & Salakhutdinov: Reducing the dimensionality of data with neural networks: https://www.semanticscholar.org/paper/Reducing-the-dimensionality-of-data-with-neural-Hinton-Salakhutdinov/46eb79e5eec8a4e2b2f5652b66441e8a4c921c3e
2006: Lee et al: Efficient sparse coding algorithms: http://papers.nips.cc/paper/2979-efficient-sparse-coding-algorithms.pdf
2007: Bengio et al: Greedy Layer-Wise Training of Deep Networks: http://papers.nips.cc/paper/3048-greedy-layer-wise-training-of-deep-networks.pdf
2007: Salakhutdinov et al: Restricted Boltzmann Machines for Collaborative Filtering: http://www.utstat.toronto.edu/~rsalakhu/papers/rbmcf.pdf
2008: Salakhutdinov: Learning and Evaluating Boltzmann Machines: http://www.cs.toronto.edu/~rsalakhu/papers/bm.pdf
2008: Torralba et al: Small Codes and Large Image Databases for Recognition: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.229.3256&rep=rep1&type=pdf
2009: Bengio: Learning Deep Architectures for AI: http://axon.cs.byu.edu/~martinez/classes/678/Papers/ftml.pdf
2009: Kavukcuoglu et al: Learning Invariant Features through Topographic Filter Maps: http://yann.lecun.org/exdb/publis/pdf/koray-cvpr-09.pdf
2009: Kulis & Darrell: Learning to Hash with Binary Reconstructive Embeddings: http://papers.nips.cc/paper/3667-learning-to-hash-with-binary-reconstructive-embeddings.pdf
2009: Larochelle et al: Exploring Strategies for Training Deep Neural Networks: http://www.jmlr.org/papers/volume10/larochelle09a/larochelle09a.pdf
2009: Weiss et al: Spectral Hashing: https://papers.nips.cc/paper/3383-spectral-hashing.pdf
2010: Salakhutdinov & Hinton: Deep Boltzmann Machines: http://proceedings.mlr.press/v5/salakhutdinov09a/salakhutdinov09a.pdf
2010: Salakhutdinov & Larochelle: Efficient Learning of Deep Boltzmann Machines: http://proceedings.mlr.press/v9/salakhutdinov10a/salakhutdinov10a.pdf
2011: Larochelle et al: The Neural Autoregressive Distribution Estimator (NADE): http://proceedings.mlr.press/v15/larochelle11a/larochelle11a.pdf
2012: Hinton & Salakhutdinov: A Better Way to Pretrain Deep Boltzmann Machines: http://papers.nips.cc/paper/4610-a-better-way-to-pretrain-deep-boltzmann-machines
2012: Srivastava & Salakhutdinov: Multimodal Learning with Deep Boltzmann Machines: https://papers.nips.cc/paper/4683-multimodal-learning-with-deep-boltzmann-machines.pdf
2012: Srivastava & Salakhutdinov: Learning Representations for Multimodal Data with Deep Belief Nets: https://pdfs.semanticscholar.org/5555/b28607cada5474bca772e1cc553b624415c9.pdf
2013: Tang & Salakhutdinov: Learning Stochastic Feedforward Neural Networks: http://papers.nips.cc/paper/5026-learning-stochastic-feedforward-neural-networks
2013: Uria et al: RNADE: The real-valued neural autoregressive density-estimator: http://papers.nips.cc/paper/5060-rnade-the-real-valued-neural-autoregressive-density-estimator
2014: Bornschein & Bengio: Reweighted Wake-Sleep: https://arxiv.org/abs/1406.2751
2014: Goodfellow et al: Generative Adversarial Nets: http://papers.nips.cc/paper/5423-generative-adversarial-nets
2014: Kingma & Welling: Stochastic Gradient VB and the Variational Auto-Encoder (“reparameterization trick”): https://pdfs.semanticscholar.org/eaa6/bf5334bc647153518d0205dca2f73aea971e.pdf
2014: Kiros et al: Multimodal Neural Language Models: http://proceedings.mlr.press/v32/kiros14.pdf
2014: Kiros et al: Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models: https://arxiv.org/abs/1411.2539
2014: Mnih & Gregor: Neural Variational Inference and Learning in Belief Networks: https://arxiv.org/abs/1402.0030
2014: Rezende et al: Stochastic Backpropagation and Approximate Inference in Deep Generative Models: https://arxiv.org/abs/1401.4082
2014: Uria et al: A Deep and Tractable Density Estimator: http://proceedings.mlr.press/v32/uria14.pdf
2015: Burda et al: Importance Weighted Autoencoders: https://arxiv.org/abs/1509.00519
2015: Denton et al: Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks (LaPGAN): http://papers.nips.cc/paper/5773-deep-generative-image-models-using-a-5
2015: Gregor et al: DRAW: A Recurrent Neural Network For Image Generation: https://arxiv.org/abs/1502.04623
2015: Lake et al: Human-Level Concept Learning through Probabilistic Program Induction: https://www.sas.upenn.edu/~astocker/lab/teaching-files/PSYC739-2016/Lake_etal2015.pdf
2015: Mansimov et al: Generating Images from Captions with Attention: https://arxiv.org/abs/1511.02793
2015: Radford et al: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks: https://arxiv.org/abs/1511.06434
2016: Salimans et al: Improved Techniques for Training GANs: http://papers.nips.cc/paper/6124-improved-techniques-for-training-gans
2016: van den Oord: Conditional Image Generation with PixelCNN Decoders: http://papers.nips.cc/paper/6527-conditional-image-generation-with-pixelcnn-decoders
2016: van den Oord: Pixel Recurrent Neural Networks: https://arxiv.org/abs/1601.06759

6. Nuts and Bolts of Applying Deep Learning (Andrew Ng)

Video Lecture

Ng talks about a typical, decent ML workflow, but notes one can do a better job at error analysis (i.e., understanding how model bias and model variance are affecting your results).

          *---------------------------------------------------------------*
          v                                                               |
[Training Error High?] ---YES---> [Bigger Model | Train Longer | New Model Architecture]
          |        ^
          NO        \_______
          |                 *------*
          V                         \
[Dev Set Error High?] ---YES---> [More Data | Regularization | New Model Architecture ]
          |
          NO
          |
          V
         DONE

He actually had the Dev Set change point back to the asking whether the dev set error was high, but I pointed it back to asking whether the training set error was high. Makes more sense, especially if you change the model architecture!

He showed you can do better than this work flow, and in fact it is here I learned a new trick. Basically, in addition to looking and training and dev errors, you should look at the combined set error. Hopefully this next graphic helps that make more sense:

Human Error Rate: 1%
Training Set Error: 10%        
Training-Dev Set Error:  10.1%
Dev Set Error: 10.1%
Test Set Error: 10.2%

An intuitive sense of model bias is found by comparing the human error rate to the training set error: here, we have large model bias.
Model variance can be thought of as the difference in error rate between training and training-dev (or between training and dev, as is usually done).
Ng shows Training/Dev mismatch as the difference between error on training-dev and dev…
Finally, the difference between dev and test error is a sign of overfitting the dev set

I have to give some of this more thought… But anyway, this analysis then informs Ng’s updated ML workflow:

# NOTE: oftentimes, a change made anywhere in the workflow indicates to start back
#   at the beginning (especially if an architectural change is made).

          *---------------------------------------------------------------*
          v                                                               |
[Training Error High?] ---YES---> [Bigger Model | Train Longer | New Model Architecture]
          |        ^
          NO        \_______
          |                 *------*
          V                         \
[Train-Dev Set Error High?] ---YES---> [More Training Data | Regularization | New Model Architecture]
          |
          NO
          |
          V
[Dev Set Error High?] ---YES---> [More Training and Dev Data | Data Synthesis | New Model Architecture]
          |
          NO
          |
          V
[Test Set Error High?] ---YES---> [More Dev Data]
          |
          NO
          |
          V
         DONE

Ng finds that progress in an area can be rapidly made until the human error rate is surpassed, then things get tricky. This is in part due to there being an upper bound on error rate, called the optimal error rate (or Bayes rate) – and the fact that humans are actually pretty good at many of the tasks we try to automate with ML/DL (i.e., oftentimes, the human error rate is already fairly close to the optimal rate). Also, once you surpass the human error rate you can run into some fundamental issues, e.g., are humans the ones labeling your data?

Ng brings up a toy medical example where he lists the typical human error rate as 3%, the typical doctor error rate as 1%, an expert doctor’s error rate as 0.7%, and the error rate of a team of expert doctors as 0.5%. The point is: which one should you consider the “human error rate” for this problem? Or more importantly: what type of data should you be using? Disregarding data collection costs and complications, the answer should be obvious: you want to train your model based on the team of expert doctors. This is one way to improve model performance: identify and/or insist upon high quality data.

If you are looking to make rapdid progress on something, Ng’s advice is to identify an area in which ML/DL has not yet surpassed the human error rate. This need not be something entirely new: for example, in speech recognition certain accents are in need of improvement.

How to get good? Simple: read a lot of papers, replicate results, and get comfortable with all the dirty work. In other words, get serious, set expectations consistent with reality, and put in the time!

7. Deep Reinforcement Learning (John Schulman, OpenAI)

1994: Jaakkola et al: Convergence of Stochastic Iterative Dynamic Programming Algorithms: http://papers.nips.cc/paper/764-convergence-of-stochastic-iterative-dynamic-programming-algorithms.pdf
2002: Kakade: A Natural Policy Gradient: http://papers.nips.cc/paper/2073-a-natural-policy-gradient.pdf
2003: Bagnell & Scheider: Covariant Policy Search: https://kilthub.cmu.edu/articles/Covariant_Policy_Search/6552458
2005: Riedmiller: Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method: https://link.springer.com/content/pdf/10.1007/11564096_32.pdf
2007: Powell: Approximate Dynamic Programming: Solving the Curse of Dimensionality: «could not find shareable link, so include a later review paper by same author on same topic below»>
2008: Peters et al: Natural Actor Critic: http://www.cs.cmu.edu/~nickr/nips_workshop/jpeters.abstract.pdf
2009: Daume et al: Search-Based Structured Prediction: https://link.springer.com/content/pdf/10.1007%2Fs10994-009-5106-x.pdf
2009: Powell: What you should know about Approximate Dynamic Programming: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.150.1854&rep=rep1&type=pdf
2010: Jie & Abbeel: On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient: http://papers.nips.cc/paper/3922-on-a-connection-between-importance-sampling-and-the-likelihood-ratio-policy-gradient
2011: Ross et al: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning: http://proceedings.mlr.press/v15/ross11a/ross11a.pdf
2013: Mnih et al: Playing Atari with Deep Reinforcement Learning: https://arxiv.org/abs/1312.5602
2014: Guo et al: Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning: http://papers.nips.cc/paper/5421-deep-learning-for-real-time-atari-game-play-using-offline-monte-carlo-tree-search-planning
2014: Mnih et al: Recurrent Models of Visual Attention: http://papers.nips.cc/paper/5542-recurrent-models-of-visual-attention
2014: Silver et al: Deterministic Policy Gradient Algorithms: http://proceedings.mlr.press/v32/silver14.pdf
2015: Hausknecht & Stone: Deep Recurrent Q-Learning for Partially Observable MDPs: https://www.aaai.org/ocs/index.php/FSS/FSS15/paper/viewPaper/11673
2015: Heess et al: Learning Continuous Control Policies by Stochastic Value Gradients: http://papers.nips.cc/paper/5796-learning-continuous-control-policies-by-stochastic-value-gradients
2015: Ranzato et al: Sequence Level Training with Recurrent Neural Networks: https://arxiv.org/abs/1511.06732
2015: Schaul et al: Prioritized Experience Replay: https://arxiv.org/abs/1511.05952
2015: Schulman et al: High-Dimensional Continuous Control Using Generalized Advantage Estimation: https://arxiv.org/abs/1506.02438
2015: Schulman et al: Trust Region Policy Optimization: http://proceedings.mlr.press/v37/schulman15.pdf
2016: Levine et al: End-to-End Training of Deep Visuomotor Policies: http://www.jmlr.org/papers/volume17/15-522/15-522.pdf
2016: Mnih et al: Asynchronous Methods for Deep Reinforcement Learning: http://proceedings.mlr.press/v48/mniha16.pdf
2016: Silver et al: Mastering the game of Go with deep neural networks and tree search: http://web.iitd.ac.in/~sumeet/Silver16.pdf
2016: van Hasselt et al: Deep Reinforcement Learning with Double Q-Learning: https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389
2016: Wang et al: Dueling Network Architectures for Deep Reinforcement Learning: https://arxiv.org/abs/1511.06581

8. Theano Tutorial

Honestly, I already use TensorFlow/Keras, so I don’t have any interest in learning Theano at the moment – SKIP!

9. Deep Learning for Speech Recognition (Adam Coates, Baidu)

DeepSpeech Starter Code
Connectionist Temporal Classification (CTC) packages
LibriSpeech ASR Corpus
KenLM
2006: Graves et al: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks
2009: Bengio et al: Curriculum Learning
2014: Hannun et al: Deep Speech: Scaling up end-to-end speech recognition
2015: Chan & Lane: Deep Recurrent Neural Networks for Acoustic Modelling
2015: Chan et al: Listen, Attend and Spell
2015: Chorowski et al: Attention-Based Models for Speech Recognition
2015: Loffe & Szegedy: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
2015: Panayotov et al: LibriSpeech: An ASR Corpus Based on Public Domain Audio Books
- - LibriSpeech ASR Corpus
2016: Bahdanau et al: End-to-End Attention-based Large Vocabulary Speech Recognition

11. Sequence to Sequence Deep Learning (Quoc Le, Google)

Chris Olah’s Blog: Attention and Augmented RNNs
Quoc Le’s [Tutorial http://ai.stanford.edu/~quocle/tutorial2.pdf]
Seq2Seq in TensorFlow
- See also: Neural machine translation with attention

2014: Graves et al: Neural Turing Machines
2014: Sutskever et al: Sequence to Sequence with Neural Networks
2015: Bahdanau et al: Neural Machine Translation by Jointly Learning to Align and Translate
2015: Bengio et al: Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
2015: Chorowski et al: Attention-Based Models for Speech Recognition
2015: Joulin & Mikolov: Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets
2015: Sordoni et al: A Neural Network Approach to Context-Sensitive Generation of Conversational Responses
2015: Shang et al: Neural Responding Machine for Short-Text Conversation
2015: Sukhbaatar et al: End-to-End Memory Networks
2015: Vinyal & Le: A Neural Conversational Model
2015: Vinyals et al: Grammar as a Foreign Language
2015: Vinyals et al: Show and Tell: a Neural Image Caption Generator
2016: Chan et al: Listen, Attend, and Spell

12. Foundations and Challenges of Deep Learning (Yoshua Bengio)

1991: Hastad & Goldmann: On the Power of Small-Depth Threshold Circuits
1994: Bengio et al: Learning Long-Term Dependencies with Gradient Descent is Difficult
2007: Bengio & LeCun: Scaling Learning Algorithms towards AI
2009: Bengio: Learning Deep Architectures for AI
2011: Bengio & Delalleau: On the Expressive Power of Deep Architectures
2011: Delalleau & Bengio: Shallow vs. Deep Sum-Product Networks
2013: Larochelle & Hinton: Learning to combine foveal glimpses with a third-order Boltzmann machine
2013: Pascanu et al: On the number of response regions of deep feed forward networks with piece-wise linear activations
2014: Bahdanau et al: Neural Machine Translation by Jointly Learning to Align and Translate
2014: Dauphin et al: Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
2014: Graves et al: Neural Turing Machines
2014: Koutnik et al: A Clockwork RNN
2014: Montufar & Morton: When Does a Mixture of Products Contain a Product of Mixtures?
2014: Montufar et al: On the Number of Linear Regions of Deep Neural Networks
2014: Pascanu et al: On the saddle point problem for non-convex optimization
2014: Weston et al: Memory Networks
2015: Bengio et al: STDP as presynaptic activity times rate of change of postsynaptic activity
2015: Bengio et al: Towards Biologically Plausible Deep Learning
2015: Choromanska et al: Open Problem: The landscape of the loss surfaces of multilayer networks
2015: Choromanska et al: The Loss Surfaces of Multilayer Networks
2015: Lee et al: Difference Target Propagation
2015: Sordoni et al: A Hierarchical Recurrent Encoder-Decoder for Generative Context-Aware Query Suggestion
2015: Zhou et al: Object Detectors Emerge in Deep Scene CNNs
2016: Bahdanau et al: An Actor-Critic Algorithm for Sequence Prediction
2016: Bengio & Fischer: Early Inference in Energy-Based Models Approximates Back-Propagation
2016: Serban et al: Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models
2017: Scellier & Bengio: Equilibrium Propagation: Bridging the Gap between Energy-Based Models and Backpropagation

Written on October 29, 2019