RS-JointNet: A Lightweight Framework for Procedural Knowledge Extraction from Remote Sensing Documents
DOI: https://doi.org/10.6919/ICJE.202604_12(4).0005
Keywords: Natural Language Processing; Knowledge Extraction; Large Language Models; Lightweight Network; Knowledge Graph.
Abstract
Valuable procedural knowledge in the remote sensing (RS) domain, such as algorithm selection and data processing workflows, is frequently trapped in unstructured academic PDFs and text documents. Extracting this information with traditional Natural Language Processing (NLP) techniques is difficult because of dense terminology, nested entities, and overlapping semantic relations. Using Large Language Models (LLMs) directly for online extraction, in turn, incurs high computational overhead and risks hallucinated output. To address these issues, this paper presents an end-to-end NLP framework designed to mine and structure process-oriented knowledge from raw PDF and TXT files. The pipeline begins with a dynamic sliding-window chunking strategy that splits lengthy texts into manageable segments. Next, an LLM-guided distillation module applies multi-level consistency verification to automatically generate a high-fidelity training corpus. This corpus supervises RS-JointNet, a lightweight extraction network that combines a domain-adapted SciBERT encoder with a two-dimensional grid tagging decoder. By recasting sequence labeling as matrix classification over token pairs, the decoder can represent nested entities and overlapping relations. Finally, the extracted triples are loaded into a Neo4j graph database. Experimental results show that the framework achieves an F1-score of 88.7%. Compared with baseline LLMs, it runs inference 37 times faster while substantially reducing memory consumption, providing an accurate and cost-effective NLP pipeline for constructing computable RS knowledge graphs.
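The abstract gives no implementation details, so the following is only a minimal Python sketch, under our own assumptions, of two ideas it names. The first is overlapping sliding-window chunking, shown here with a fixed window and stride (the paper's strategy is described as dynamic, which this sketch does not model). The second is decoding a two-dimensional grid of tag predictions into entities and triples. The label scheme is hypothetical and is not RS-JointNet's actual design: an entity grid whose cell (i, j) types the span of tokens i..j, plus head-to-head and tail-to-tail relation grids in the style of TPLinker. All function names and labels (`sliding_window_chunks`, `decode_triples`, `Index`, `Sensor`, `Data`, `derived_from`) are likewise illustrative.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start token, end token inclusive, entity type)


def sliding_window_chunks(tokens: List[str], window: int,
                          stride: int) -> List[Tuple[int, List[str]]]:
    """Split tokens into overlapping fixed-size windows (a simplification).

    Returns (start_offset, chunk) pairs. Consecutive windows overlap by
    window - stride tokens, so any entity no longer than that overlap
    appears whole in at least one window; offsets map predictions back.
    """
    if not 0 < stride <= window:
        raise ValueError("require 0 < stride <= window")
    chunks = []
    start = 0
    while True:
        chunks.append((start, tokens[start:start + window]))
        if start + window >= len(tokens):
            return chunks
        start += stride


def empty_grid(n: int) -> List[List[str]]:
    """An n x n grid filled with the null label "O"."""
    return [["O"] * n for _ in range(n)]


def decode_entities(ent_grid: List[List[str]]) -> List[Span]:
    """Read typed spans from the upper triangle (i <= j) of the entity grid.

    Cell (i, j) holding a type means tokens i..j form an entity, so nested
    spans occupy different cells instead of competing for one token tag.
    """
    n = len(ent_grid)
    return [(i, j, ent_grid[i][j])
            for i in range(n) for j in range(i, n)
            if ent_grid[i][j] != "O"]


def decode_triples(tokens: List[str], ent_grid: List[List[str]],
                   head_grid: List[List[str]],
                   tail_grid: List[List[str]]) -> List[Tuple[str, str, str]]:
    """Emit (subject, relation, object) when the subject->object link carries
    the same relation in both the head-to-head and tail-to-tail grids."""
    def text(span: Span) -> str:
        return " ".join(tokens[span[0]:span[1] + 1])

    spans = decode_entities(ent_grid)
    triples = []
    for subj in spans:
        for obj in spans:
            if subj == obj:
                continue
            rel = head_grid[subj[0]][obj[0]]
            if rel != "O" and tail_grid[subj[1]][obj[1]] == rel:
                triples.append((text(subj), rel, text(obj)))
    return triples


# Toy predictions for a 6-token sentence: "Landsat 8" is nested inside
# "Landsat 8 imagery", and both spans start at token 3.
tokens = ["NDVI", "computed", "from", "Landsat", "8", "imagery"]
ent, head, tail = empty_grid(6), empty_grid(6), empty_grid(6)
ent[0][0] = "Index"
ent[3][4] = "Sensor"
ent[3][5] = "Data"
head[0][3] = "derived_from"  # subject start -> object start
tail[0][5] = "derived_from"  # subject end -> object end (picks the longer span)

print(decode_triples(tokens, ent, head, tail))
# [('NDVI', 'derived_from', 'Landsat 8 imagery')]
```

Because every candidate span has its own cell, the nested "Landsat 8" and "Landsat 8 imagery" coexist, and the tail-to-tail link decides which one the relation attaches to. A per-token BIO tagger, by contrast, would have to assign a single tag to token 3.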
License
Copyright (c) 2026 International Core Journal of Engineering

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.




