CRNN Enhancement Architecture Integrating Linear Deformable Convolution and Multi-head Attention Mechanism

Authors

  • Xing Chen

DOI:

https://doi.org/10.6919/ICJE.202506_11(6).0042

Keywords:

Handwritten Text; Text Recognition; Convolutional Recurrent Neural Network; Linear Deformable Convolution; Multi-Head Attention Mechanism.

Abstract

This paper addresses the challenges of recognizing handwritten text on work orders in manufacturing factories, where dense handwriting and connected strokes make it difficult for conventional recognizers to achieve the desired effect. Given the particularity of this application scenario, we propose an enhanced Convolutional Recurrent Neural Network (MCRNN). The MCRNN architecture integrates linear deformable convolution (LD-Conv) and a multi-head attention mechanism (MHA), strengthening the model's ability to capture local variations in character shape and time-dependent features. Training on handwritten data under this enhanced architecture yields a more adaptable handwritten text recognition model. Compared with other advanced methods, our approach achieves better text recognition performance and improves the recognition accuracy of handwritten text.
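The multi-head attention component referred to in the abstract is the standard scaled dot-product attention applied over the sequence of visual features that the CNN stage feeds to the recurrent stage. As a rough, self-contained illustration of that mechanism only (random weights and hypothetical dimensions for demonstration; this is not the authors' implementation):

```python
import numpy as np

def multi_head_attention(x, n_heads=4):
    """Scaled dot-product multi-head self-attention over a feature
    sequence x of shape (seq_len, d_model). Projection weights are
    random here purely for illustration; a trained model learns them."""
    seq_len, d_model = x.shape
    assert d_model % n_heads == 0
    d_k = d_model // n_heads
    rng = np.random.default_rng(0)
    # One projection per role (query/key/value) plus an output projection.
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) * d_model ** -0.5
                      for _ in range(4))
    # Split each projection into n_heads parallel heads: (heads, seq, d_k).
    q = (x @ Wq).reshape(seq_len, n_heads, d_k).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq_len, n_heads, d_k).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq_len, n_heads, d_k).transpose(1, 0, 2)
    # Attention scores between all pairs of time steps: (heads, seq, seq).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)  # row-wise softmax
    # Weighted sum of values, heads re-concatenated: (seq, d_model).
    out = (weights @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

# e.g. 25 time steps of 64-dimensional CNN feature columns (made-up sizes)
x = np.random.default_rng(1).standard_normal((25, 64))
y = multi_head_attention(x, n_heads=4)
print(y.shape)  # (25, 64)
```

Each head attends over all time steps simultaneously, which is what lets the model relate distant strokes of connected handwriting, something a plain recurrent layer has to carry through its hidden state.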



Published

2025-05-28

Section

Articles

How to Cite

Chen, X. (2025). CRNN Enhancement Architecture Integrating Linear Deformable Convolution and Multi-head Attention Mechanism. International Core Journal of Engineering, 11(6), 389-397. https://doi.org/10.6919/ICJE.202506_11(6).0042