Accurate Speech Recognition AI using Model of Deep Learning for Security Access

Haris Isyanto, Wahyu Ibrahim, Riza Samsinar, Wiwik Sudarwati

Abstract


Identity theft is a form of cybercrime that poses a significant threat to data privacy and online transactions. To counter it, a voice recognition approach was developed for security access, relying on the fact that every person has distinct voice characteristics. Speech recognition is a device's capacity to identify spoken words. This research applies artificial intelligence to speech recognition through a deep learning model built on the Convolutional Neural Network (CNN) algorithm, which can process large quantities of data accurately. The model achieved a training accuracy of 99.8304% and a validation accuracy of 99.4001%. Tests of the keywords "Welcome" and "Hello" both yielded 100% accuracy, and "Hello" gave the fastest response time, 0.64 seconds. This work aims to improve the accuracy and speed of speech recognition, with potential applications in banking security.

Keywords: Speech recognition, deep learning, security access, accuracy, response time
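
The per-keyword accuracy and response-time figures reported in the abstract could be measured along the lines of the sketch below. This is only an illustration under stated assumptions, not the authors' procedure: `evaluate_keywords`, `dummy_recognize`, and the trial data are hypothetical names introduced here, and the stand-in recognizer simply echoes a stored label instead of running a trained CNN.

```python
import time
from collections import defaultdict


def evaluate_keywords(recognize, trials):
    """Return {keyword: (accuracy_percent, fastest_response_seconds)}.

    recognize: callable that maps one audio sample to a predicted keyword.
    trials: iterable of (audio_sample, expected_keyword) pairs.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    fastest = {}
    for audio, keyword in trials:
        start = time.perf_counter()
        predicted = recognize(audio)
        elapsed = time.perf_counter() - start
        counts[keyword] += 1
        hits[keyword] += int(predicted == keyword)
        # Keep the shortest response time seen so far for this keyword.
        fastest[keyword] = min(elapsed, fastest.get(keyword, elapsed))
    return {k: (100.0 * hits[k] / counts[k], fastest[k]) for k in counts}


def dummy_recognize(audio):
    # Hypothetical stand-in for the trained CNN: echoes the stored label.
    return audio["label"]


# Hypothetical trials; real ones would hold recorded audio features.
trials = [
    ({"label": "Hello"}, "Hello"),
    ({"label": "Welcome"}, "Welcome"),
    ({"label": "Hello"}, "Hello"),
]
report = evaluate_keywords(dummy_recognize, trials)
for keyword, (accuracy, seconds) in report.items():
    print(f"{keyword}: {accuracy:.1f}% accuracy, fastest response {seconds:.6f} s")
```

Replacing `dummy_recognize` with a function that runs the trained model on a recorded sample would give timings that include feature extraction and inference, which is presumably what a reported response time covers.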



