The Processing Analysis of Effective Recognition Quality for Arabic Speech Based on Three-Layer Neural Networks

Mohamed Hamed

doi:7420208

This paper studies the recognition rate (RR) for Arabic speech under several techniques based on supervised and unsupervised learning, and it also examines the accuracy of that rate. All networks used in the research are built in neural network simulation software that supports both supervised and unsupervised training. Some of the most common words are selected for the experiments. The recognition concept rests on one key point: every utterance (word) is divided into a fixed number of segments, although each segment may contain a different number of frame intervals. Some words uttered with various sounds were tested, and the recognition rate was computed. The computational time is also evaluated; processing can take some hours on slow hardware, but this time can be reduced greatly in practical applications with recent fast processing systems. The error observed during the training phase is presented and illustrated. The results show a good recognition rate for some words, and the best number of units in the hidden layer of the neural network for Arabic speech recognition is derived from both the number of sweeps in the training phase and the actual percentage recognition results.
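
The segmentation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say how segment boundaries are chosen, so the near-equal split used here is an assumption, and the names `split_into_segments` and `recognition_rate` are hypothetical. The sketch only shows that every utterance yields the same number of segments while the number of frames per segment varies, and how a percentage recognition rate could be computed from predicted and expected word labels.

```python
from typing import List, Sequence


def split_into_segments(frames: Sequence, n_segments: int) -> List[list]:
    """Split frames into exactly n_segments contiguous parts.

    Assumption: boundaries are spread as evenly as possible; the paper
    may place them differently.
    """
    if n_segments <= 0:
        raise ValueError("n_segments must be positive")
    if len(frames) < n_segments:
        raise ValueError("need at least one frame per segment")
    base, extra = divmod(len(frames), n_segments)
    segments, start = [], 0
    for i in range(n_segments):
        # The first `extra` segments receive one additional frame.
        size = base + (1 if i < extra else 0)
        segments.append(list(frames[start:start + size]))
        start += size
    return segments


def recognition_rate(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Return the percentage of utterances whose predicted label is correct."""
    if len(predicted) != len(expected) or len(expected) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return 100.0 * correct / len(expected)


if __name__ == "__main__":
    # Two utterances of different length, both reduced to 4 segments.
    short_utt = list(range(10))
    long_utt = list(range(23))
    print([len(s) for s in split_into_segments(short_utt, 4)])
    print([len(s) for s in split_into_segments(long_utt, 4)])
    print(recognition_rate(["a", "b", "c", "d"], ["a", "b", "x", "d"]))
```

In this sketch a 10-frame utterance split into 4 segments gives segment lengths 3, 3, 2, 2, so the network input always has the same number of segments regardless of how long the word was spoken.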
