Deep Learning for Acoustic Modelling


This blog post gives an overview of papers related to acoustic modelling, primarily for speech recognition but also for speech generation (synthesis). See also ai.amundtveit.com/keyword/acoustic for a broader set of recent Deep Learning papers (73 at the time of writing) on acoustics, both for speech recognition and for other applications.

Acoustic Modelling is described in Wikipedia as: “An acoustic model is used in Automatic Speech Recognition to represent the relationship between an audio signal and the phonemes or other linguistic units that make up speech. The model is learned from a set of audio recordings and their corresponding transcripts”. 
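To make the definition concrete, here is a toy sketch of the frame-level part of an acoustic model: a single softmax layer (the simplest neural network) that maps each acoustic feature vector to posterior probabilities over phonemes. It assumes the transcripts have already been aligned to frame-level phoneme labels, and it uses synthetic 2-D features in place of real MFCC or filterbank features. The class name `FrameAcousticModel` and the three phoneme labels are hypothetical. The papers listed below use much deeper networks (DNNs, LSTMs, deep belief networks) and typically combine such posteriors with an HMM during decoding.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class FrameAcousticModel:
    """Hypothetical toy acoustic model: one feature frame -> phoneme posteriors."""

    def __init__(self, num_features, phonemes):
        self.phonemes = list(phonemes)
        self.weights = [[0.0] * num_features for _ in self.phonemes]
        self.bias = [0.0] * len(self.phonemes)

    def posteriors(self, frame):
        scores = [sum(w * x for w, x in zip(row, frame)) + b
                  for row, b in zip(self.weights, self.bias)]
        return softmax(scores)

    def train(self, frames, labels, lr=0.1, epochs=200):
        """Stochastic gradient descent on frame-level cross-entropy."""
        for _ in range(epochs):
            for frame, label in zip(frames, labels):
                probs = self.posteriors(frame)
                target = self.phonemes.index(label)
                for k, p in enumerate(probs):
                    # Gradient of cross-entropy w.r.t. the score of phoneme k.
                    grad = p - (1.0 if k == target else 0.0)
                    self.bias[k] -= lr * grad
                    for j, x in enumerate(frame):
                        self.weights[k][j] -= lr * grad * x

    def decode(self, frames):
        """Pick the most probable phoneme for each frame independently."""
        result = []
        for frame in frames:
            probs = self.posteriors(frame)
            result.append(self.phonemes[probs.index(max(probs))])
        return result


# Synthetic "recordings": each phoneme's frames cluster around its own centre.
rng = random.Random(0)
centres = {"a": (3.0, 0.0), "s": (0.0, 3.0), "m": (-3.0, -3.0)}
frames, labels = [], []
for phoneme, (cx, cy) in centres.items():
    for _ in range(30):
        frames.append([rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)])
        labels.append(phoneme)

model = FrameAcousticModel(num_features=2, phonemes=list(centres))
model.train(frames, labels)
accuracy = sum(p == t for p, t in zip(model.decode(frames), labels)) / len(labels)
print(f"frame accuracy: {accuracy:.2f}")
```

Decoding each frame independently is a simplification; in a hybrid DNN-HMM recognizer the HMM adds the temporal constraints between frames.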

Blog Post Illustration Photo Source: Professor Mark Gales’ (University of Cambridge) 2009 presentation “Acoustic Modelling for Speech Recognition: Hidden Markov Models and Beyond?”

Best regards,

Amund Tveit

Year   Title  Authors
2017   Investigation on acoustic modeling with different phoneme set for continuous Lhasa Tibetan recognition based on DNN method  H Wang, K Khyuru, J Li, G Li, J Dang, L Huang
2017   Personalized Acoustic Modeling by Weakly Supervised Multi-Task Deep Learning Using Acoustic Tokens  CK Wei, CT Chung, HY Lee, LS Lee
2017   I-vector estimation as auxiliary task for multi-task learning based acoustic modeling for automatic speech recognition  G Pironkov, S Dupont, T Dutoit
2016   Graph-based Semi-Supervised Learning in Acoustic Modeling for Automatic Speech Recognition  Y Liu
2016   A Comprehensive Study of Deep Bidirectional LSTM RNNs for Acoustic Modeling in Speech Recognition  A Zeyer, P Doetsch, P Voigtlaender, R Schlüter, H Ney
2016   Improvements in IITG Assamese Spoken Query System: Background Noise Suppression and Alternate Acoustic Modeling  S Shahnawazuddin, D Thotappa, A Dey, S Imani
2016   DNN-Based Acoustic Modeling for Russian Speech Recognition Using Kaldi  I Kipyatkova, A Karpov
2015   Doubly Hierarchical Dirichlet Process HMM for Acoustic Modeling  AHHN Torbati, J Picone
2015   Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends  ZH Ling, SY Kang, H Zen, A Senior, M Schuster
2015   Acoustic Modeling in Statistical Parametric Speech Synthesis – from HMM to LSTM-RNN  H Zen
2015   Acoustic Modeling of Bangla Words using Deep Belief Network  M Ahmed, PC Shill, K Islam, MAH Akhand
2015   Unified Acoustic Modeling using Deep Conditional Random Fields  Y Hifny
2015   Exploiting Low-Dimensional Structures to Enhance DNN Based Acoustic Modeling in Speech Recognition  P Dighe, G Luyet, A Asaei, H Bourlard
2015   Ensemble Acoustic Modeling for CD-DNN-HMM Using Random Forests of Phonetic Decision Trees  T Zhao, Y Zhao, X Chen
2015   Deep Neural Networks for Acoustic Modeling  G Hinton, L Deng, D Yu, G Dahl
2015   Integrating Articulatory Data in Deep Neural Network-based Acoustic Modeling  L Badino, C Canevari, L Fadiga, G Metta
2015   Deep learning in acoustic modeling for Automatic Speech Recognition and Understanding – an overview  I Gavat, D Militaru

Deep Learning with Long Short-Term Memory (LSTM)

This blog post lists some recent papers about Deep Learning with Long Short-Term Memory (LSTM). To get started, I recommend Christopher Olah’s “Understanding LSTM Networks” and Andrej Karpathy’s “The Unreasonable Effectiveness of Recurrent Neural Networks”. This blog post is complemented by “Deep Learning with Recurrent/Recursive Neural Networks (RNN) – ICLR 2017 Discoveries”.
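For readers who want the mechanics before the papers, here is a minimal forward-pass sketch of a standard LSTM cell: input, forget and output gates plus a tanh candidate, without peephole connections. It is written with plain Python lists for readability and uses small random weights instead of trained ones; the class and method names (`LSTMCell`, `step`, `run`) are illustrative assumptions, not any library's API. Many of the papers below build on variants of this cell (bidirectional, convolutional, 2D, hierarchical).

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, vector):
    """Return weights @ vector + bias for list-based matrices."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


class LSTMCell:
    """Illustrative LSTM cell (no peepholes); weights are random, not trained."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        concat = input_size + hidden_size
        # One weight matrix and bias per gate, applied to [x_t, h_{t-1}].
        self.params = {
            gate: ([[rng.uniform(-0.1, 0.1) for _ in range(concat)]
                    for _ in range(hidden_size)],
                   [0.0] * hidden_size)
            for gate in ("input", "forget", "output", "candidate")
        }
        self.hidden_size = hidden_size

    def step(self, x, h_prev, c_prev):
        z = list(x) + list(h_prev)
        i = [sigmoid(a) for a in affine(*self.params["input"], z)]
        f = [sigmoid(a) for a in affine(*self.params["forget"], z)]
        o = [sigmoid(a) for a in affine(*self.params["output"], z)]
        g = [math.tanh(a) for a in affine(*self.params["candidate"], z)]
        # Cell state: keep part of the old memory (f), write part of the candidate (i).
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
        # Hidden state: output gate applied to the squashed cell state.
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def run(self, sequence):
        """Process a sequence of input vectors, starting from zero states."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in sequence:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs, c


cell = LSTMCell(input_size=3, hidden_size=4)
# Three made-up input vectors, e.g. consecutive acoustic feature frames.
feature_frames = [[0.5, -0.2, 0.1], [0.3, 0.0, -0.4], [-0.1, 0.2, 0.6]]
outputs, final_cell = cell.run(feature_frames)
print(len(outputs), len(outputs[0]))
```

The additive cell-state update `f * c_prev + i * g` is what lets gradients flow across many time steps, which is the usual motivation for choosing LSTMs over plain RNNs.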

Best regards,
Amund Tveit

Year   Title  Authors
2016   Look, Listen and Learn – A Multimodal LSTM for Speaker Identification  J Ren, Y Hu, YW Tai, C Wang, L Xu, W Sun, Q Yan
2016   Leveraging Sentence-level Information with Encoder LSTM for Natural Language Understanding  G Kurata, B Xiang, B Zhou, M Yu
2016   Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition  FJ Ordóñez, D Roggen
2016   Exploiting LSTM structure in deep neural networks for speech recognition  T He, J Droppo
2016   A Comprehensive Study of Deep Bidirectional LSTM RNNs for Acoustic Modeling in Speech Recognition  A Zeyer, P Doetsch, P Voigtlaender, R Schlüter, H Ney
2016   Geometric Scene Parsing with Hierarchical LSTM  Z Peng, R Zhang, X Liang, L Lin
2016   LSTM Networks for Mobile Human Activity Recognition  Y Chen, K Zhong, J Zhang, Q Sun, X Zhao
2016   Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention  Y Liu, C Sun, L Lin, X Wang
2016   Facing Realism in Spontaneous Emotion Recognition from Speech: Feature Enhancement by Autoencoder with LSTM Neural Networks  Z Zhang, F Ringeval, J Han, J Deng, E Marchi
2016   Contextual LSTM (CLSTM) models for Large scale NLP tasks  S Ghosh, O Vinyals, B Strope, S Roy, T Dean, L Heck
2016   Bidirectional LSTM Networks Employing Stacked Bottleneck Features for Expressive Speech-Driven Head Motion Synthesis  K Haag, H Shimodaira
2016   Beyond Frame-level CNN: Saliency-aware 3D CNN with LSTM for Video Action Recognition  J Song, H Shen
2015   Learning Statistical Scripts with LSTM Recurrent Neural Networks  K Pichotta, RJ Mooney
2015   A deep bidirectional LSTM approach for video-realistic talking head  B Fan, L Xie, S Yang, L Wang, FK Soong
2015   Maxout neurons for deep convolutional and LSTM neural networks in speech recognition  M Cai, J Liu
2015   Scene Analysis by Mid-level Attribute Learning using 2D LSTM networks and an Application to Web-image Tagging  W Byeon, M Liwicki, TM Breuel
2015   Learning to Diagnose with LSTM Recurrent Neural Networks  ZC Lipton, DC Kale, C Elkan, R Wetzel
2015   Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting  X Shi, Z Chen, H Wang, DY Yeung, W Wong