Analysis of Huffman Tree-Based Lossless Compression on Different Text Types

Authors

  • Lanzhi Luo
  • Qinyan Yu
  • Heng Zhang
  • Shengqi Pan

DOI:

https://doi.org/10.6919/ICJE.202512_11(12).0017

Keywords:

Huffman Tree; Lossless Compression; MATLAB Simulation.

Abstract

This paper investigates Huffman tree-based lossless data compression across various text types, including technical reports, news articles, and fiction narratives. A theoretical model for calculating compression ratios is first introduced and then validated through MATLAB simulations. The results indicate that the character frequency distribution of a text strongly affects compression efficiency. Fiction narratives, with their higher redundancy and repetitive linguistic patterns, achieve the highest compression ratio. Texts with more diverse and concise wording, such as technical documents, show smaller gains. These findings show that Huffman coding is most effective on high-redundancy data, consistent with its foundational role in classical compression theory.
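The abstract does not reproduce the authors' compression-ratio model or their MATLAB code. The Python sketch below is only a rough illustration of how character frequencies drive Huffman compression. It builds Huffman codeword lengths from a text's character counts and then computes a compression ratio.

The ratio definition is an assumption, not necessarily the paper's: the size of the text stored as fixed 8-bit characters, divided by the size of its Huffman-coded bit stream. The cost of storing the code table is ignored. The function names huffman_code_lengths, average_code_length, entropy_bits and compression_ratio are hypothetical.

The sketch also computes the Shannon entropy H. For an alphabet of at least two symbols, the average Huffman code length L satisfies H ≤ L < H + 1.

```python
import heapq
import math
from collections import Counter


def huffman_code_lengths(text: str) -> dict[str, int]:
    """Return the Huffman codeword length (in bits) of each distinct character."""
    freqs = Counter(text)
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A one-symbol alphabet still needs 1 bit per symbol to be decodable.
        return {ch: 1 for ch in freqs}
    # Heap entries: (subtree weight, unique tie-breaker, {char: depth in subtree}).
    # The tie-breaker keeps heapq from ever comparing the dicts.
    heap = [(f, i, {ch: 0}) for i, (ch, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Greedily merge the two lightest subtrees; every leaf moves one level deeper.
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {ch: depth + 1 for ch, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def average_code_length(text: str) -> float:
    """Average Huffman codeword length L in bits per character (text must be non-empty)."""
    freqs = Counter(text)
    lengths = huffman_code_lengths(text)
    return sum(freqs[ch] * lengths[ch] for ch in freqs) / len(text)


def entropy_bits(text: str) -> float:
    """Shannon entropy H of the character distribution, in bits per character."""
    n = len(text)
    return -sum((f / n) * math.log2(f / n) for f in Counter(text).values())


def compression_ratio(text: str, bits_per_char: int = 8) -> float:
    """Assumed definition: fixed-length size / Huffman payload size (code table ignored)."""
    freqs = Counter(text)
    lengths = huffman_code_lengths(text)
    coded_bits = sum(freqs[ch] * lengths[ch] for ch in freqs)
    return bits_per_char * len(text) / coded_bits


if __name__ == "__main__":
    for sample in ("abracadabra", "abcdefgh"):
        h = entropy_bits(sample)
        avg = average_code_length(sample)
        print(f"{sample}: H={h:.3f}, L={avg:.3f}, ratio={compression_ratio(sample):.3f}")
```

The two samples give ratios of 3.826 and 2.667. The skewed distribution of "abracadabra" compresses better than the uniform "abcdefgh", where every character needs 3 bits. This is the same frequency effect the abstract credits for the higher ratio of fiction narratives. A measurement on real documents would also have to count the stored code table.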

Published

2025-12-21

Section

Articles

How to Cite

Luo, L., Yu, Q., Zhang, H., & Pan, S. (2025). Analysis of Huffman Tree-Based Lossless Compression on Different Text Types. International Core Journal of Engineering, 11(12), 162-166. https://doi.org/10.6919/ICJE.202512_11(12).0017