Technical Program Outline

Sunday, December 13
16:30 - 19:00  Registration
18:30 - 21:00  Reception

Monday, December 14
8:10 - 8:30    Opening
8:30 - 9:30    Keynote 1: Rich Caruana, Microsoft Research
9:30 - 10:00   Coffee Break
10:00 - 11:00  Invited Talk 1: Oriol Vinyals, Google Brain
11:00 - 12:30  Poster Session M1: Automatic Speech Recognition I (ASR I)
               Poster Session M2: Text-to-speech
12:30 - 14:00  Lunch
14:00 - 15:00  Invited Talk 2: Heiga Zen, Google
15:00 - 15:30  Coffee Break
15:30 - 17:00  Poster Session M3: Automatic Speech Recognition II (ASR II)
               Poster Session M4: Spoken Document Retrieval, Speech Summarization, and Applications
17:00          Networking & hosted wine tasting at FireSky

Tuesday, December 15
8:20 - 8:30    Announcements
8:30 - 9:30    Keynote 2: Hermann Ney, RWTH Aachen University
9:30 - 10:00   Coffee Break
10:00 - 11:00  ASRU Challenges Talks
11:00 - 12:30  Poster Session T1: Automatic Speech recognition In Reverberant Environments (ASpIRE)
               Poster Session T2: 3rd CHiME Speech Separation and Recognition Challenge
12:30 - 14:00  Lunch
14:00 - 15:00  Invited Talk 3: Tomohiro Nakatani, NTT Corporation
15:00 - 15:30  Coffee Break
15:30 - 17:00  Poster Session T3: Automatic Speech Recognition III (ASR III)
               Poster Session T4: The MGB Challenge - Recognition of Multi-Genre Broadcast Data
17:00          Networking & hosted wine tasting at FireSky
17:30 - 18:30  Panel Session P1: Lessons learned from organizing and running the challenges presented at ASRU

Wednesday, December 16
8:20 - 8:30    Announcements
8:30 - 9:30    Keynote 3: Jason Eisner, Johns Hopkins University
9:30 - 10:00   Coffee Break
10:00 - 11:00  Invited Talk 4: Dan Bohus, Microsoft Research
11:00 - 12:30  Demo Session W1: Demonstrations
               Poster Session W2: Spoken Dialog Systems (SDS)
12:30 - 14:00  Lunch
14:00 - 15:00  Invited Talk 5: Kai Yu, Shanghai Jiao Tong University
15:00 - 15:30  Coffee Break
15:30 - 16:00  Sponsorship Session S1
16:00          Early dismissal to travel to the banquet location
17:00          Banquet

Thursday, December 17
8:20 - 8:30    Announcements
8:30 - 9:30    Keynote 4: Jerry Chen, NVIDIA
9:30 - 10:00   Coffee Break
10:00 - 11:00  Invited Talk 6: Steve Renals, University of Edinburgh
11:00 - 12:30  Poster Session R1: Robustness in automatic speech recognition, speech-to-speech translation, and spontaneous speech processing
               Poster Session R2: Spoken Language Understanding (SLU)
12:30          Closing Remarks

Detailed Technical Program

Note: Paper R1.11: USING BIDIRECTIONAL LSTM RECURRENT NEURAL NETWORKS TO LEARN HIGH-LEVEL ABSTRACTIONS OF SEQUENTIAL FEATURES FOR AUTOMATED SCORING OF NON-NATIVE SPONTANEOUS SPEECH will be presented during the M4 poster session on Monday, rather than the R1 poster session on Thursday.

 
Monday, December 14
11:00 - 12:30

M1: Automatic Speech Recognition I

Earth + Air + Fire
M1.1: DIFFERENT WORD REPRESENTATIONS AND THEIR COMBINATION FOR PROPER NAME RETRIEVAL FROM DIACHRONIC DOCUMENTS
Irina Illina, Dominique Fohr, LORIA-INRIA, France
 
M1.2: SPARSE NON-NEGATIVE MATRIX LANGUAGE MODELING FOR GEO-ANNOTATED QUERY SESSION DATA
Ciprian Chelba, Noam Shazeer, Google Inc, United States
 
M1.3: TRAINING DATA PSEUDO-SHUFFLING AND DIRECT DECODING FRAMEWORK FOR RECURRENT NEURAL NETWORK BASED ACOUSTIC MODELING
Naoyuki Kanda, Mitsuyoshi Tachimori, Xugang Lu, Hisashi Kawai, National Institute of Information and Communications Technology, Japan
 
M1.4: ON CONSTRUCTING AND ANALYSING AN INTERPRETABLE BRAIN MODEL FOR THE DNN BASED ON HIDDEN ACTIVITY PATTERNS
Khe Chai Sim, National University of Singapore, Singapore
 
M1.5: SPEAKER LOCATION AND MICROPHONE SPACING INVARIANT ACOUSTIC MODELING FROM RAW MULTICHANNEL WAVEFORMS
Tara Sainath, Ron Weiss, Kevin Wilson, Arun Narayanan, Michiel Bacchiani, Andrew Senior, Google Inc, United States
 
M1.6: HYBRID DNN-LATENT STRUCTURED SVM ACOUSTIC MODELS FOR CONTINUOUS SPEECH RECOGNITION
Suman Ravuri, International Computer Science Institute; University of California - Berkeley, United States
 
M1.7: DISCRIMINATIVE TRAINING OF CONTEXT-DEPENDENT LANGUAGE MODEL SCALING FACTORS AND INTERPOLATION WEIGHTS
Shuangyu Chang, Abhik Lahiri, Issac Alphonso, Barlas Oguz, Michael Levit, Microsoft Corporation, United States; Benoit Dumoulin, Facebook Inc., United States
 
M1.8: ACOUSTIC MODEL TRAINING BASED ON NODE-WISE WEIGHT BOUNDARY MODEL INCREASING SPEED OF DISCRETE NEURAL NETWORKS
Ryu Takeda, Kazunori Komatani, Osaka University, Japan; Kazuhiro Nakadai, Honda Research Institute Japan Co., Ltd., Japan
 
M1.9: TWO-STAGE ASGD FRAMEWORK FOR PARALLEL TRAINING OF DNN ACOUSTIC MODELS USING ETHERNET
Zhichao Wang, Xingyu Na, Xin Li, Jielin Pan, Yonghong Yan, Institute of Acoustics, Chinese Academy of Sciences, China
 
M1.10: RNNDROP: A NOVEL DROPOUT FOR RNNS IN ASR
Taesup Moon, Heeyoul Choi, Hoshik Lee, Inchul Song, Samsung Advanced Institute of Technology, Republic of Korea
 
M1.11: SPECTRAL LEARNING WITH NON NEGATIVE PROBABILITIES FOR FINITE STATE AUTOMATON
Hadrien Glaude, Thales Airborne Systems / University Lille 1, France; Cyrille Enderli, Thales Airborne Systems, France; Olivier Pietquin, University Lille 1, France
 
M1.12: DEEP BI-DIRECTIONAL RECURRENT NETWORKS OVER SPECTRAL WINDOWS
Abdel-Rahman Mohamed, Frank Seide, Dong Yu, Jasha Droppo, Andreas Stolcke, Geoffrey Zweig, Microsoft, United States; Gerald Penn, University of Toronto, Canada
 
M1.13: PERSONALIZING UNIVERSAL RECURRENT NEURAL NETWORK LANGUAGE MODEL WITH USER CHARACTERISTIC FEATURES BY SOCIAL NETWORK CROWDSOURCING
Bo-Hsiang Tseng, Hung-yi Lee, Lin-Shan Lee, National Taiwan University, Taiwan
 
M1.14: TIME DELAY DEEP NEURAL NETWORK-BASED UNIVERSAL BACKGROUND MODELS FOR SPEAKER RECOGNITION
David Snyder, Daniel Garcia-Romero, Daniel Povey, The Johns Hopkins University, United States
 
 
Monday, December 14
11:00 - 12:30

M2: Text-to-speech systems

Earth + Air + Fire
M2.1: AUTOMATIC PROSODY PREDICTION FOR CHINESE SPEECH SYNTHESIS USING BLSTM-RNN AND EMBEDDING FEATURES
Chuang Ding, Lei Xie, Jie Yan, Weini Zhang, Yang Liu, Northwestern Polytechnical University, China
 
M2.2: NATURALNESS AND RAPPORT IN A PITCH ADAPTIVE LEARNING COMPANION
Nichola Lubold, Arizona State University, United States; Heather Pon-Barry, Mount Holyoke College, United States; Erin Walker, Arizona State University, United States
 
M2.3: LEARNING CONTINUOUS REPRESENTATION OF TEXT FOR PHONE DURATION MODELING IN STATISTICAL PARAMETRIC SPEECH SYNTHESIS
Sai Krishna Rallabandi, Sai Sirisha Rallabandi, Padmini Bandi, Suryakanth Gangashetty, International Institute of Information Technology- Hyderabad, India
 
M2.4: SPEAKER INTONATION ADAPTATION FOR TRANSFORMING TEXT-TO-SPEECH SYNTHESIS SPEAKER IDENTITY
Mahsa Sadat Elyasi Langarani, Jan van Santen, Oregon Health and Science University, United States
 
 
Monday, December 14
15:30 - 17:00

M3: Automatic Speech Recognition II

Earth + Air + Fire
M3.1: INVESTIGATING SPARSE DEEP NEURAL NETWORKS FOR SPEECH RECOGNITION
Gueorgui Pironkov, Stéphane Dupont, Thierry Dutoit, University of Mons, Belgium
 
M3.2: LATENT DIRICHLET ALLOCATION BASED ORGANISATION OF BROADCAST MEDIA ARCHIVES FOR DEEP NEURAL NETWORK ADAPTATION
Mortaza Doulaty, Oscar Saz, Raymond W. M. Ng, Thomas Hain, University of Sheffield, United Kingdom
 
M3.3: TOWARDS STRUCTURED DEEP NEURAL NETWORK FOR AUTOMATIC SPEECH RECOGNITION
Yi-Hsiu Liao, Graduate Institute of Electronic Engineering, National Taiwan University, Taiwan; Hung-Yi Lee, Graduate Institute of Electrical Engineering, National Taiwan University, Taiwan; Lin-shan Lee, Graduate Institute of Electronic Engineering, National Taiwan University, Taiwan
 
M3.4: LEARNING FACTORIZED FEATURE TRANSFORMS FOR SPEAKER NORMALIZATION
Lahiru Samarakoon, Khe Chai Sim, National University of Singapore, Singapore
 
M3.5: IMPROVING DATA SELECTION FOR LOW-RESOURCE STT AND KWS
Thiago Fraga da Silva, Antoine Laurent, Vocapia Research, France; Jean-Luc Gauvain, Lori Lamel, CNRS-LIMSI, France; Viet Bac Le, Abdel Messaoudi, Vocapia Research, France
 
M3.6: STRUCTURED DISCRIMINATIVE MODELS USING DEEP NEURAL-NETWORK FEATURES
Rogier van Dalen, Jingzhou Yang, Haipeng Wang, Anton Ragni, Chao Zhang, Mark J. F. Gales, University of Cambridge, United Kingdom
 
M3.7: EESEN: END-TO-END SPEECH RECOGNITION USING DEEP RNN MODELS AND WFST-BASED DECODING
Yajie Miao, Mohammad Gowayyed, Florian Metze, Carnegie Mellon University, United States
 
M3.8: STOCHASTIC GRADIENT VARIATIONAL BAYES FOR DEEP LEARNING-BASED ASR
Andros Tjandra, Universitas Indonesia, Indonesia; Sakriani Sakti, Satoshi Nakamura, Nara Institute of Science and Technology, Japan; Mirna Adriani, Universitas Indonesia, Indonesia
 
M3.9: INVESTIGATION OF BACK-OFF BASED INTERPOLATION BETWEEN RECURRENT NEURAL NETWORK AND N-GRAM LANGUAGE MODELS
Xie Chen, Xunying Liu, Mark J. F. Gales, Philip C. Woodland, Cambridge University, United Kingdom
 
M3.10: LSTM TIME AND FREQUENCY RECURRENCE FOR AUTOMATIC SPEECH RECOGNITION
Jinyu Li, Abdel-Rahman Mohamed, Geoffrey Zweig, Yifan Gong, Microsoft, United States
 
 
Monday, December 14
15:30 - 17:00

M4: Spoken Document Retrieval, Speech Summarization, and Applications

Earth + Air + Fire
M4.1: INCORPORATING USER FEEDBACK TO RE-RANK KEYWORD SEARCH RESULTS
Scott Novotney, Kevin Jett, Owen Kimball, Raytheon BBN Technologies, United States
 
M4.2: COMBINATION OF SYLLABLE BASED N-GRAM SEARCH AND WORD SEARCH FOR SPOKEN TERM DETECTION THROUGH SPOKEN QUERIES AND IV/OOV CLASSIFICATION
Nagisa Sakamoto, Kazumasa Yamamoto, Seiichi Nakagawa, Toyohashi University of Technology, Japan
 
M4.3: INCORPORATING PARAGRAPH EMBEDDINGS AND DENSITY PEAKS CLUSTERING FOR SPOKEN DOCUMENT SUMMARIZATION
Kuan-Yu Chen, Academia Sinica, Taiwan; Kai-Wun Shih, National Taiwan Normal University, Taiwan; Shih-Hung Liu, Academia Sinica, Taiwan; Berlin Chen, National Taiwan Normal University, Taiwan; Hsin-Min Wang, Academia Sinica, Taiwan
 
M4.4: HIGH-PERFORMANCE SWAHILI KEYWORD SEARCH WITH VERY LIMITED LANGUAGE PACK: THE THUEE SYSTEM FOR THE OPENKWS15 EVALUATION
Meng Cai, Zhiqiang Lv, Cheng Lu, Jian Kang, Like Hui, Zhuo Zhang, Jia Liu, Tsinghua University, China
 
M4.5: PHONETIC UNIT SELECTION FOR CROSS-LINGUAL QUERY-BY-EXAMPLE SPOKEN TERM DETECTION
Paula Lopez-Otero, Laura Docio-Fernandez, Carmen Garcia-Mateo, Universidade de Vigo, Spain
 
M4.6: IMPROVED SYSTEM FUSION FOR KEYWORD SEARCH
Zhiqiang Lv, Meng Cai, Cheng Lu, Jian Kang, Like Hui, Wei-Qiang Zhang, Jia Liu, Tsinghua University, China
 
M4.7: DEEP MULTIMODAL SEMANTIC EMBEDDINGS FOR SPEECH AND IMAGES
David Harwath, James Glass, Massachusetts Institute of Technology, United States
 
M4.8: AN ITERATIVE DEEP LEARNING FRAMEWORK FOR UNSUPERVISED DISCOVERY OF SPEECH FEATURES AND LINGUISTIC UNITS WITH APPLICATIONS ON SPOKEN TERM DETECTION
Cheng-Tao Chung, Cheng-Yu Tsai, Hsiang-Hung Lu, Chia-Hsiang Liu, Hung-yi Lee, Lin-shan Lee, National Taiwan University, Taiwan
 
M4.9: INCREMENTAL SENTENCE COMPRESSION USING LSTM RECURRENT NETWORKS
Sakriani Sakti, Nara Institute of Science and Technology, Japan; Faiz Ilham, Bandung Institute of Technology, Indonesia; Graham Neubig, Tomoki Toda, Nara Institute of Science and Technology, Japan; Ayu Purwarianti, Bandung Institute of Technology, Indonesia; Satoshi Nakamura, Nara Institute of Science and Technology, Japan
 
M4.10: MULTILINGUAL REPRESENTATIONS FOR LOW RESOURCE SPEECH RECOGNITION AND KEYWORD SEARCH
Jia Cui, Brian Kingsbury, Bhuvana Ramabhadran, Abhinav Sethy, Kartik Audhkhasi, Xiaodong Cui, Ellen Kislal, Lidia Mangu, Markus Nussbaum-Thom, Michael Picheny, IBM T.J. Watson, United States; Zoltán Tüske, Pavel Golik, Ralf Schlüter, Hermann Ney, RWTH Aachen University, Germany; Mark J. F. Gales, Kate M. Knill, Anton Ragni, Haipeng Wang, Philip C. Woodland, Cambridge University, United Kingdom
 
 
Tuesday, December 15
11:00 - 12:30

T1: Automatic Speech recognition In Reverberant Environments (ASpIRE)

Earth + Air + Fire
T1.1: ANALYSIS OF FACTORS AFFECTING SYSTEM PERFORMANCE IN THE ASPIRE CHALLENGE
Jennifer Melot, Nicolas Malyska, Jessica Ray, Wade Shen, MIT Lincoln Laboratory, United States
 
T1.2: SINGLE AND MULTI-CHANNEL APPROACHES FOR DISTANT SPEECH RECOGNITION UNDER NOISY REVERBERANT CONDITIONS: I2R'S SYSTEM DESCRIPTION FOR THE ASPIRE CHALLENGE
Jonathan Dennis, Huy Dat Tran, Institute For Infocomm Research, Singapore
 
T1.3: IMPROVING ROBUSTNESS AGAINST REVERBERATION FOR AUTOMATIC SPEECH RECOGNITION
Vikramjit Mitra, Julien Van Hout, Wen Wang, Martin Graciarena, Mitchell McLaren, Horacio Franco, Dimitra Vergyri, SRI International, United States
 
T1.4: ROBUST SPEECH RECOGNITION IN UNKNOWN REVERBERANT AND NOISY CONDITIONS
Roger Hsiao, Jeff Ma, William Hartmann, Raytheon BBN Technologies, United States; Martin Karafiat, Frantisek Grezl, Lukas Burget, Igor Szoke, Jan Honza Cernocky, Brno University of Technology, Czech Republic; Shinji Watanabe, Zhuo Chen, Mitsubishi Electric Research Laboratories, United States; Sri Harish Mallidi, Hynek Hermansky, Johns Hopkins University, United States; Stavros Tsakalidis, Richard Schwartz, Raytheon BBN Technologies, United States
 
T1.5: JHU ASPIRE SYSTEM: ROBUST LVCSR WITH TDNNS, IVECTOR ADAPTATION AND RNN-LMS
Vijayaditya Peddinti, Guoguo Chen, Vimal Manohar, Johns Hopkins University, United States; Tom Ko, Huawei, China; Daniel Povey, Sanjeev Khudanpur, Johns Hopkins University, United States
 
T1.6: THE AUTOMATIC SPEECH RECOGNITION IN REVERBERANT ENVIRONMENTS (ASPIRE) CHALLENGE
Mary Harper, IARPA, United States
 
 
Tuesday, December 15
11:00 - 12:30

T2: 3rd CHiME Speech Separation and Recognition Challenge

Earth + Air + Fire
T2.1: ADAPTIVE BEAMFORMING AND ADAPTIVE TRAINING OF DNN ACOUSTIC MODELS FOR ENHANCED MULTICHANNEL NOISY SPEECH RECOGNITION
Alexey Prudnikov, Speech Technology Center Inc., Russian Federation; Maxim Korenevsky, Sergei Aleinik, ITMO University, Russian Federation
 
T2.2: BOOSTED ACOUSTIC MODEL LEARNING AND HYPOTHESES RESCORING ON THE CHIME-3 TASK
Shahab Jalalvand, University of Trento, Italy; Daniele Falavigna, Marco Matassoni, Piergiorgio Svaizer, Maurizio Omologo, Fondazione Bruno Kessler, Italy
 
T2.3: UNIFIED ASR SYSTEM USING LGM-BASED SOURCE SEPARATION, NOISE-ROBUST FEATURE EXTRACTION, AND WORD HYPOTHESIS SELECTION
Yusuke Fujita, Ryoichi Takashima, Takeshi Homma, Rintaro Ikeshita, Yohei Kawaguchi, Takashi Sumiyoshi, Takashi Endo, Masahito Togami, Hitachi, Ltd, Japan
 
T2.4: SPEECH ENHANCEMENT USING BEAMFORMING AND NON NEGATIVE MATRIX FACTORIZATION FOR ROBUST SPEECH RECOGNITION IN THE CHIME-3 CHALLENGE
Thanh T. Vu, Benjamin Bigot, Eng Siong Chng, Nanyang Technological University, Singapore
 
T2.5: AN INFORMATION FUSION APPROACH TO RECOGNIZING MICROPHONE ARRAY SPEECH IN THE CHIME-3 CHALLENGE BASED ON A DEEP LEARNING FRAMEWORK
Jun Du, Qing Wang, Yan-Hui Tu, Xiao Bao, Li-Rong Dai, University of Science and Technology of China, China; Chin-Hui Lee, Georgia Institute of Technology, United States
 
T2.6: THE NTT CHIME-3 SYSTEM: ADVANCES IN SPEECH ENHANCEMENT AND RECOGNITION FOR MOBILE MULTI-MICROPHONE DEVICES
Takuya Yoshioka, Nobutaka Ito, Marc Delcroix, Atsunori Ogawa, Keisuke Kinoshita, Masakiyo Fujimoto, NTT Corporation, Japan; Chengzhu Yu, The University of Texas at Dallas, United States; Wojciech Fabian, Miquel Espi, Takuya Higuchi, Shoko Araki, Tomohiro Nakatani, NTT Corporation, Japan
 
T2.7: BLSTM SUPPORTED GEV BEAMFORMER FRONT-END FOR THE 3RD CHIME CHALLENGE
Jahn Heymann, Lukas Drude, Aleksej Chinaev, Reinhold Haeb-Umbach, University of Paderborn, Germany
 
T2.8: MULTI-CHANNEL SPEECH PROCESSING ARCHITECTURES FOR NOISE ROBUST SPEECH RECOGNITION: 3RD CHIME CHALLENGE RESULTS
Lukas Pfeifenberger, Tobias Schrank, Matthias Zöhrer, Martin Hagmüller, Franz Pernkopf, Graz University of Technology, Austria
 
T2.9: ROBUST SPEECH RECOGNITION USING BEAMFORMING WITH ADAPTIVE MICROPHONE GAINS AND MULTICHANNEL NOISE REDUCTION
Shengkui Zhao, Advanced Digital Sciences Center, Singapore; Xiong Xiao, Nanyang Technological University, Singapore; Zhaofeng Zhang, Nagaoka University of Technology, Japan; Thi Ngoc Tho Nguyen, Advanced Digital Sciences Center, Singapore; Xionghu Zhong, Nanyang Technological University, Singapore; Bo Ren, Longbiao Wang, Nagaoka University of Technology, Japan; Douglas L. Jones, Advanced Digital Sciences Center, Singapore; Eng Siong Chng, Nanyang Technological University, Singapore; Haizhou Li, Institute For Infocomm Research, Singapore
 
T2.10: A CHIME-3 CHALLENGE SYSTEM: LONG-TERM ACOUSTIC FEATURES FOR NOISE ROBUST AUTOMATIC SPEECH RECOGNITION
Niko Moritz, Stephan Gerlach, Fraunhofer IDMT, Project Group for Hearing, Speech, and Audio Technology, Germany; Kamil Adiloglu, Hörtech gGmbH, Germany; Jörn Anemüller, Birger Kollmeier, University of Oldenburg, Germany; Stefan Goetze, Fraunhofer IDMT, Project Group for Hearing, Speech, and Audio Technology, Germany
 
T2.11: THE MERL/SRI SYSTEM FOR THE 3RD CHIME CHALLENGE USING BEAMFORMING, ROBUST FEATURE EXTRACTION, AND ADVANCED SPEECH RECOGNITION
Takaaki Hori, Mitsubishi Electric Research Laboratories, United States; Zhuo Chen, Columbia University, United States; Hakan Erdogan, Sabanci University, Turkey; John Hershey, Jonathan Le Roux, Mitsubishi Electric Research Laboratories, United States; Vikramjit Mitra, SRI International, United States; Shinji Watanabe, Mitsubishi Electric Research Laboratories, United States
 
T2.12: ROBUST ASR USING NEURAL NETWORK BASED SPEECH ENHANCEMENT AND FEATURE SIMULATION
Sunit Sivasankaran, Aditya Arie Nugraha, Emmanuel Vincent, Juan A. Morales-Cordovilla, Siddharth Dalmia, Irina Illina, Antoine Liutkus, INRIA, France
 
T2.13: EXPLOITING SYNCHRONY SPECTRA AND DEEP NEURAL NETWORKS FOR NOISE-ROBUST AUTOMATIC SPEECH RECOGNITION
Ning Ma, Ricard Marxer, Jon Barker, Guy J. Brown, University of Sheffield, United Kingdom
 
T2.14: COMBINING SPECTRAL FEATURE MAPPING AND MULTI-CHANNEL MODEL-BASED SOURCE SEPARATION FOR NOISE-ROBUST AUTOMATIC SPEECH RECOGNITION
Deblin Bagchi, Michael Mandel, Zhongqiu Wang, Yanzhang He, Andrew Plummer, Eric Fosler-Lussier, The Ohio State University, United States
 
T2.15: THE THIRD `CHIME' SPEECH SEPARATION AND RECOGNITION CHALLENGE: DATASET, TASK AND BASELINES
Jon Barker, Ricard Marxer, University of Sheffield, United Kingdom; Emmanuel Vincent, INRIA, France; Shinji Watanabe, Mitsubishi Electric Research Laboratories, United States
 
 
Tuesday, December 15
15:30 - 17:00

T3: Automatic Speech Recognition III

Earth + Air + Fire
T3.1: DEEP BOTTLENECK FEATURES FOR I-VECTOR BASED TEXT-INDEPENDENT SPEAKER VERIFICATION
Sina Hamidi Ghalehjegh, Richard C. Rose, McGill University, Canada
 
T3.2: DISCRIMINATIVE SEGMENTAL CASCADES FOR FEATURE-RICH PHONE RECOGNITION
Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu, Toyota Technological Institute at Chicago, United States
 
T3.3: HILBERT SPECTRAL ANALYSIS OF VOWELS USING INTRINSIC MODE FUNCTIONS
Steven Sandoval, Arizona State University, United States; Phillip L. De Leon, New Mexico State University, United States; Julie Liss, Arizona State University, United States
 
T3.4: MULTI-REFERENCE WER FOR EVALUATING ASR FOR LANGUAGES WITH NO ORTHOGRAPHIC RULES
Ahmed Ali, Walid Magdy, Qatar Computing Research Institute, Qatar; Steve Renals, Peter Bell, University of Edinburgh, United Kingdom
 
T3.5: ACOUSTIC MODELING WITH NEURAL GRAPH EMBEDDINGS
Yuzong Liu, Katrin Kirchhoff, University of Washington, United States
 
T3.6: MULTITASK LEARNING AND SYSTEM COMBINATION FOR AUTOMATIC SPEECH RECOGNITION
Olivier Siohan, David Rybach, Google Inc, United States
 
T3.7: SPEAKER ADAPTIVE JOINT TRAINING OF GAUSSIAN MIXTURE MODELS AND BOTTLENECK FEATURES
Zoltán Tüske, Pavel Golik, Ralf Schlüter, Hermann Ney, RWTH Aachen University, Germany
 
T3.8: ACOUSTIC MODELLING WITH CD-CTC-SMBR LSTM RNNS
Andrew Senior, Hasim Sak, Felix de Chaumont Quitry, Tara Sainath, Kanishka Rao, Google Inc, United States
 
T3.9: AUTOMATION OF SYSTEM BUILDING FOR STATE-OF-THE-ART LARGE VOCABULARY SPEECH RECOGNITION USING EVOLUTION STRATEGY
Takafumi Moriya, Tomohiro Tanaka, Takahiro Shinozaki, Tokyo Institute of Technology, Japan; Shinji Watanabe, Mitsubishi Electric Research Laboratories, United States; Kevin Duh, Nara Institute of Science and Technology, Japan
 
T3.10: IMPROVING THE INTERPRETABILITY OF DEEP NEURAL NETWORKS WITH STIMULATED LEARNING
Shawn Tan, Khe Chai Sim, National University of Singapore, Singapore; Mark J. F. Gales, University of Cambridge, United Kingdom
 
 
Tuesday, December 15
15:30 - 17:00

T4: The MGB Challenge - Recognition of Multi-Genre Broadcast Data

Earth + Air + Fire
T4.1: THE 2015 SHEFFIELD SYSTEM FOR TRANSCRIPTION OF MULTI-GENRE BROADCAST MEDIA
Oscar Saz, Mortaza Doulaty, Salil Deena, Rosanna Milner, Raymond W. M. Ng, Madina Hasan, Yulan Liu, Thomas Hain, University of Sheffield, United Kingdom
 
T4.2: THE 2015 SHEFFIELD SYSTEM FOR LONGITUDINAL DIARISATION OF BROADCAST MEDIA
Rosanna Milner, Oscar Saz, Salil Deena, Mortaza Doulaty, Raymond W. M. Ng, Thomas Hain, University of Sheffield, United Kingdom
 
T4.3: CAMBRIDGE UNIVERSITY TRANSCRIPTION SYSTEMS FOR THE MULTI-GENRE BROADCAST CHALLENGE
Philip C. Woodland, Xunying Liu, Yanmin Qian, Chao Zhang, Mark J. F. Gales, Penny Karanasou, Pierre Lanchantin, Linlin Wang, University of Cambridge, United Kingdom
 
T4.4: THE DEVELOPMENT OF THE CAMBRIDGE UNIVERSITY ALIGNMENT SYSTEMS FOR THE MULTI-GENRE BROADCAST CHALLENGE
Pierre Lanchantin, Mark J. F. Gales, Penny Karanasou, Xunying Liu, Yanmin Qian, Linlin Wang, Philip C. Woodland, Chao Zhang, University of Cambridge, United Kingdom
 
T4.5: THE NAIST ASR SYSTEM FOR THE 2015 MULTI-GENRE BROADCAST CHALLENGE: ON COMBINATION OF DEEP LEARNING SYSTEMS USING A RANK-SCORE FUNCTION
Quoc Truong Do, Michael Heck, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura, Nara Institute of Science and Technology, Japan
 
T4.6: SPEAKER DIARISATION AND LONGITUDINAL LINKING IN MULTI-GENRE BROADCAST DATA
Penny Karanasou, Mark J. F. Gales, Pierre Lanchantin, Xunying Liu, Yanmin Qian, Linlin Wang, Philip C. Woodland, Chao Zhang, University of Cambridge, United Kingdom
 
T4.7: VARIATIONAL BAYESIAN PLDA FOR SPEAKER DIARIZATION IN THE MGB CHALLENGE
Jesus Villalba, Alfonso Ortega, Antonio Miguel, Eduardo Lleida, Universidad de Zaragoza, Spain
 
T4.8: A SYSTEM FOR AUTOMATIC ALIGNMENT OF BROADCAST MEDIA CAPTIONS USING WEIGHTED FINITE-STATE TRANSDUCERS
Peter Bell, Steve Renals, University of Edinburgh, United Kingdom
 
T4.9: CRIM AND LIUM APPROACHES FOR MULTI-GENRE BROADCAST MEDIA TRANSCRIPTION
Vishwa Gupta, Centre de Recherche Informatique de Montreal (CRIM), Canada; Paul Deleglise, LIUM - University of Le Mans, France; Gilles Boulianne, Centre de Recherche Informatique de Montreal (CRIM), Canada; Yannick Esteve, Sylvain Meignier, Anthony Rousseau, LIUM - University of Le Mans, France
 
T4.10: THE MGB CHALLENGE: EVALUATING MULTI-GENRE BROADCAST MEDIA RECOGNITION
Peter Bell, University of Edinburgh, United Kingdom; Mark J. F. Gales, University of Cambridge, United Kingdom; Thomas Hain, University of Sheffield, United Kingdom; Jonathan Kilgour, University of Edinburgh, United Kingdom; Pierre Lanchantin, Xunying Liu, University of Cambridge, United Kingdom; Andrew McParland, BBC, United Kingdom; Steve Renals, University of Edinburgh, United Kingdom; Oscar Saz, University of Sheffield, United Kingdom; Mirjam Wester, University of Edinburgh, United Kingdom; Philip C. Woodland, University of Cambridge, United Kingdom
 
 
Wednesday, December 16
11:00 - 12:30

W2: Spoken Dialog Systems

Earth + Air + Fire
W2.1: INCREMENTAL LSTM-BASED DIALOG STATE TRACKER
Lukas Zilka, Filip Jurcicek, Charles University in Prague, Czech Republic
 
W2.2: MULTI-DOMAIN DIALOGUE SUCCESS CLASSIFIERS FOR POLICY TRAINING
David Vandyke, Pei-Hao Su, Milica Gasic, Nikola Mrksic, Tsung-Hsien Wen, Steve Young, University of Cambridge, United Kingdom
 
W2.3: OPEN-DOMAIN PERSONALIZED DIALOG SYSTEM USING USER-INTERESTED TOPICS IN SYSTEM RESPONSES
Jeesoo Bang, Sangdo Han, Kyusong Lee, Gary Geunbae Lee, Pohang University of Science and Technology, Republic of Korea
 
W2.4: A STUDY OF SOCIAL-AFFECTIVE COMMUNICATION: AUTOMATIC PREDICTION OF EMOTION TRIGGERS AND RESPONSES IN TELEVISION TALK SHOWS
Nurul Lubis, Sakriani Sakti, Graham Neubig, Koichiro Yoshino, Tomoki Toda, Satoshi Nakamura, Nara Institute of Science and Technology, Japan
 
W2.5: ADAPTIVE SELECTION FROM MULTIPLE RESPONSE CANDIDATES IN EXAMPLE-BASED DIALOGUE
Masahiro Mizukami, Graduate School of Information Science, Nara Institute of Science and Technology, Japan; Hideaki Kizuki, Toshio Nomura, SHARP Corporation, Japan; Graham Neubig, Koichiro Yoshino, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura, Graduate School of Information Science, Nara Institute of Science and Technology, Japan
 
W2.6: OPTIMIZING HUMAN-INTERPRETABLE DIALOG MANAGEMENT POLICY USING GENETIC ALGORITHM
Hang Ren, Weiqun Xu, Yonghong Yan, Institute of Acoustics, Chinese Academy of Sciences, China
 
W2.7: IMPLEMENTATION OF GENERIC POSITIVE-NEGATIVE TRACKER IN EXTENSIBLE DIALOG SYSTEM
Sangjun Koo, Seonghan Ryu, Gary Geunbae Lee, Pohang University of Science and Technology, Republic of Korea
 
W2.8: POLICY COMMITTEE FOR ADAPTATION IN MULTI-DOMAIN SPOKEN DIALOGUE SYSTEMS
Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke, Tsung-Hsien Wen, Steve Young, University of Cambridge, United Kingdom
 
W2.9: APPLYING DEEP LEARNING TO ANSWER SELECTION: A STUDY AND AN OPEN TASK
Minwei Feng, Bing Xiang, Michael Glass, Lidan Wang, Bowen Zhou, IBM T.J. Watson, United States
 
 
Thursday, December 17
11:00 - 12:30

R1: Robustness in automatic speech recognition, speech-to-speech translation, and spontaneous speech processing

Earth + Air + Fire
R1.1: SPOKEN LANGUAGE TRANSLATION GRAPHS RE-DECODING USING AUTOMATIC QUALITY ASSESSMENT
Laurent Besacier, Benjamin Lecouteux, LIG - Univ. Grenoble Alpes, France; Ngoc-Quang Luong, IDIAP, Switzerland; Le Ngoc-Tien, LIG - Univ. Grenoble Alpes, France
 
R1.2: THE DIRHA-ENGLISH CORPUS AND RELATED TASKS FOR DISTANT-SPEECH RECOGNITION IN DOMESTIC ENVIRONMENTS
Mirco Ravanelli, Luca Cristoforetti, Roberto Gretter, Marco Pellin, Alessandro Sosi, Maurizio Omologo, Fondazione Bruno Kessler, Italy
 
R1.3: UNCERTAINTY ESTIMATION OF DNN CLASSIFIERS
Sri Harish Mallidi, The Johns Hopkins University, United States; Tetsuji Ogawa, Waseda University, Japan; Hynek Hermansky, The Johns Hopkins University, United States
 
R1.4: TOWARDS UTTERANCE-BASED NEURAL NETWORK ADAPTATION IN ACOUSTIC MODELING
Ivan Himawan, Petr Motlicek, Marc Ferras Font, Srikanth Madikeri, Idiap Research Institute, Switzerland
 
R1.5: PHONETICALLY-ORIENTED WORD ERROR ALIGNMENT FOR SPEECH RECOGNITION ERROR ANALYSIS IN SPEECH TRANSLATION
Nicholas Ruiz, Marcello Federico, Fondazione Bruno Kessler, Italy
 
R1.6: UTTERANCE CLASSIFICATION IN SPEECH-TO-SPEECH TRANSLATION FOR ZERO-RESOURCE LANGUAGES IN THE HOSPITAL ADMINISTRATION DOMAIN
Lara J. Martin, Andrew Wilkinson, Sai Sumanth Miryala, Vivian Robison, Alan W. Black, Carnegie Mellon University, United States
 
R1.7: MULTI-TASK JOINT-LEARNING OF DEEP NEURAL NETWORKS FOR ROBUST SPEECH RECOGNITION
Yanmin Qian, Maofan Yin, Yongbin You, Kai Yu, Shanghai Jiao Tong University, China
 
R1.8: TIME-FREQUENCY CONVOLUTIONAL NETWORKS FOR ROBUST SPEECH RECOGNITION
Vikramjit Mitra, Horacio Franco, SRI International, United States
 
R1.9: NAME-AWARE LANGUAGE MODEL ADAPTATION AND SPARSE FEATURES FOR STATISTICAL MACHINE TRANSLATION
Wen Wang, SRI International, United States; Haibo Li, Nuance, United States; Heng Ji, Rensselaer Polytechnic Institute, United States
 
R1.10: AN I-VECTOR PLDA BASED GENDER IDENTIFICATION APPROACH FOR SEVERELY DISTORTED AND MULTILINGUAL DARPA RATS DATA
Shivesh Ranjan, Gang Liu, John H. L. Hansen, The University of Texas at Dallas, United States
 
R1.11: USING BIDIRECTIONAL LSTM RECURRENT NEURAL NETWORKS TO LEARN HIGH-LEVEL ABSTRACTIONS OF SEQUENTIAL FEATURES FOR AUTOMATED SCORING OF NON-NATIVE SPONTANEOUS SPEECH
Zhou Yu, Carnegie Mellon University, United States; Vikram Ramanarayanan, David Suendermann-Oeft, Xinhao Wang, Klaus Zechner, Lei Chen, Jidong Tao, Alexei V. Ivanou, Yao Qian, Educational Testing Service, United States
 
 
Thursday, December 17
11:00 - 12:30

R2: Spoken Language Understanding

Earth + Air + Fire
R2.1: TOPIC-SPACE BASED SETUP OF A NEURAL NETWORK FOR THEME IDENTIFICATION OF HIGHLY IMPERFECT TRANSCRIPTIONS
Mohamed Morchid, Richard Dufour, Georges Linarès, LIA - University of Avignon, France
 
R2.2: SEMI-SUPERVISED SLOT TAGGING IN SPOKEN LANGUAGE UNDERSTANDING USING RECURRENT TRANSDUCTIVE SUPPORT VECTOR MACHINES
Yangyang Shi, Microsoft, China; Kaisheng Yao, Microsoft Research, United States; Hu Chen, Yi-Cheng Pan, Mei-Yuh Hwang, Microsoft, China
 
R2.3: A UNIVERSAL MODEL FOR FLEXIBLE ITEM SELECTION IN CONVERSATIONAL DIALOGS
Asli Celikyilmaz, Zhaleh Feizollahi, Dilek Hakkani-Tur, Ruhi Sarikaya, Microsoft, United States
 
R2.4: A COMPARATIVE STUDY OF NEURAL NETWORK MODELS FOR LEXICAL INTENT CLASSIFICATION
Suman Ravuri, International Computer Science Institute; University of California - Berkeley, United States; Andreas Stolcke, Microsoft Research; International Computer Science Institute, United States
 
R2.5: DETECTING ACTIONABLE ITEMS IN MEETINGS BY CONVOLUTIONAL DEEP STRUCTURED SEMANTIC MODELS
Yun-Nung Chen, Carnegie Mellon University, United States; Dilek Hakkani-Tur, Xiaodong He, Microsoft Research, United States
 
R2.6: MULTIMODAL EMBEDDING FUSION FOR ROBUST SPEAKER ROLE RECOGNITION IN VIDEO BROADCAST
Mickael Rouvier, Sebastien Delecraz, Benoit Favre, Meriem Bendris, Frederic Béchet, Aix-Marseille Université, France
 
R2.7: RECENT IMPROVEMENTS TO NEUROCRFS FOR NAMED ENTITY RECOGNITION
Marc-Antoine Rondeau, McGill University, Canada; Yi Su, Nuance Communications, Inc., Canada
 
R2.8: NATURAL LANGUAGE UNDERSTANDING FOR PARTIAL QUERIES
Xiaohu Liu, Asli Celikyilmaz, Ruhi Sarikaya, Microsoft, United States