A Multi-Speaker System for Arabic Speech Perception

Mohamed Hamed; Dalia Wafik

doi:7350160

This paper presents a new generalized system for Arabic speech recognition based on three-layer integrated neural networks, consisting of a control network and several sub-networks. A segmentation technique is used to minimize the number of selected features and, consequently, the computational time. The tested multi-speaker system is based on back-propagation training and linear predictive coding (LPC). The proposed approach is applied to words representing the digits. The system is trained and tested on multi-speaker signals classified into three groups: male, female, and children. The recognition rate is high, so the system is recommended for use in Arabic speech recognition software. The effect of the number of nodes in the hidden layer of the neural network on the recognition rate is also investigated.
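
The abstract names linear predictive coding and a segmentation step that keeps the feature count small. The sketch below shows one way these could fit together. It assumes the autocorrelation method with Levinson-Durbin recursion, a Hamming window, and a fixed number of equal-length segments per utterance. The paper's actual segment count, LPC order, and windowing are not stated here, and the function names and parameters are hypothetical. It uses only the Python standard library:

```python
import math


def hamming(frame):
    """Apply a Hamming window to reduce edge effects before autocorrelation."""
    n = len(frame)
    if n < 2:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]


def autocorrelation(frame, max_lag):
    """r[k] = sum_n x[n] * x[n - k] for k = 0..max_lag (zero when k >= len(frame))."""
    return [sum(frame[i] * frame[i - k] for i in range(k, len(frame))) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..p] with x[n] ~ sum_k a[k] * x[n - k].

    Returns (coefficients, final prediction error).
    """
    if r[0] == 0.0:  # silent segment: no spectral information
        return [0.0] * order, 0.0
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
        if err <= 0.0:  # perfectly predictable segment: stop before dividing by zero
            break
    return a[1:], err


def lpc(frame, order):
    """LPC coefficients of one windowed segment (hypothetical helper)."""
    coeffs, _ = levinson_durbin(autocorrelation(hamming(frame), order), order)
    return coeffs


def segment_features(signal, num_segments, order):
    """Split an utterance into num_segments equal parts and concatenate their LPC vectors.

    The feature length is num_segments * order regardless of utterance duration.
    """
    n = len(signal)
    if n < num_segments:
        raise ValueError("signal shorter than the number of segments")
    features = []
    for s in range(num_segments):
        start, end = s * n // num_segments, (s + 1) * n // num_segments
        features.extend(lpc(signal[start:end], order))
    return features
```

For example, `segment_features(samples, num_segments=5, order=8)` yields a 40-value vector for any utterance of at least five samples. The network input size therefore stays fixed whatever the word duration. The values 5 and 8 are purely illustrative.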

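The recognizer itself is described as three-layer networks trained by back-propagation, with a control network and several sub-networks. The number of hidden nodes is one of the studied parameters. The sketch below assumes sigmoid units, squared-error loss, and online gradient descent. It also assumes, hypothetically, that the control network selects one of the three speaker groups and that group's sub-network then selects the digit. Neither the routing scheme nor the class and function names come from the paper:

```python
import math
import random


def sigmoid(z):
    # Clamp the argument so math.exp cannot overflow for large |z|.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class ThreeLayerNet:
    """Input layer -> one sigmoid hidden layer -> sigmoid output layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each weight row carries a trailing bias weight.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]  # input plus bias
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]  # hidden activations plus bias
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train_step(self, x, target, lr=0.5):
        """One back-propagation update on a single pattern; returns its squared error."""
        xb, hb, y = self.forward(x)
        # Output deltas for squared error with sigmoid units.
        d_out = [(y_k - t_k) * y_k * (1.0 - y_k) for y_k, t_k in zip(y, target)]
        # Hidden deltas: propagate output deltas back through w2 (bias column excluded).
        d_hid = [
            hb[j] * (1.0 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
            for j in range(len(hb) - 1)
        ]
        # Gradient-descent weight updates, applied after both deltas are computed.
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * xb[i]
        return 0.5 * sum((y_k - t_k) ** 2 for y_k, t_k in zip(y, target))

    def predict(self, x):
        """Index of the most active output unit."""
        _, _, y = self.forward(x)
        return max(range(len(y)), key=y.__getitem__)


def recognize(features, control_net, sub_nets):
    # Hypothetical routing: the control network picks a speaker group
    # (e.g. 0 = male, 1 = female, 2 = children), then that group's
    # sub-network picks the digit class.
    group = control_net.predict(features)
    return group, sub_nets[group].predict(features)
```

In this sketch `n_hidden` is simply a constructor argument. For instance, `ThreeLayerNet(40, 20, 10)` accepts a 40-value feature vector and has ten digit outputs. Varying `n_hidden` is the kind of experiment the abstract reports on the recognition rate.
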