LLMs: The Internet, But Make It a JPEG

Recently, I was reflecting on Large Language Models (LLMs) and how they contain all or much of the knowledge of the internet in a much smaller storage space. For example, the GPT-4 model is reportedly about 570 GB (OpenAI has not published an official size), small enough for almost anyone to keep on their own computer. The internet, by contrast, is enormous, and by some estimates about 400 million TB of new data is created each day.

It got me thinking: could LLMs be thought of as sophisticated, lossy compression algorithms, like JPEG? After all, they distill enormous amounts of data, like the entire internet, into a much smaller form while preserving what matters most about it.

The Fundamentals of Data Compression

To appreciate the analogy, it helps to review the basics of data compression, which I’ve outlined below:

Lossless Compression

Lossless algorithms (e.g., ZIP, PNG) reduce file size by eliminating redundancy without sacrificing any original data. They ensure that every bit of the source information can be recovered exactly during decompression.
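
As a minimal sketch of that round trip, the snippet below uses Python's standard zlib module, which implements DEFLATE, the same family of compression used inside ZIP and PNG. The sample text is made up for illustration and is deliberately repetitive so the redundancy is easy to see.

```python
import zlib

# Made-up, highly repetitive sample text: lots of redundancy to remove.
original = ("E = mc^2, F = ma, V = IR. " * 200).encode("utf-8")

compressed = zlib.compress(original, level=9)
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print("exact round trip:", restored == original)
```

The compressed bytes are far smaller, yet decompressing them gives back every byte of the input.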

Lossy Compression

Lossy methods (e.g., JPEG, MP3) achieve higher compression ratios by approximating and discarding data deemed less critical. While some details are lost, the result is a significantly smaller file that retains the essential content.

In both cases, the goal is to represent the original information in less space. The two approaches differ in the trade-off they make between accuracy and space savings: lossless keeps everything, while lossy keeps only the “essence.”
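
To make the lossy side of that trade-off concrete, here is a small sketch of quantization, one of the simplest lossy techniques: each value is snapped to the nearest multiple of a step size, so it can be stored as a small integer. The sample values and the step size are made up for illustration.

```python
def quantize(samples, step):
    """Snap each sample to the nearest multiple of `step` and store the index."""
    return [round(s / step) for s in samples]

def dequantize(codes, step):
    """Rebuild approximate samples from the stored indices."""
    return [c * step for c in codes]

# Hypothetical measurements; the exact values do not matter.
samples = [0.12, 0.49, 0.51, 0.98, 1.03, 1.47]
step = 0.5

codes = quantize(samples, step)
approx = dequantize(codes, step)
max_error = max(abs(a - b) for a, b in zip(samples, approx))

print("stored codes:", codes)
print("reconstructed:", approx)
print("largest error:", max_error)
```

The original values cannot be recovered exactly, but the error is never more than half a step. A larger step saves more space and loses more detail.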

When Compression Is “Just Right”

It’s a common concern that lossy compression might omit important details. However, in practice, efficient lossy techniques are designed to keep critical information intact.

For example, the best-known lossy compression algorithm, JPEG, maintains the appearance of a photograph by exploiting the limits of human visual perception. It discards information that the human eye is less sensitive to, such as subtle color variations and fine details, while preserving more critical elements like brightness and contrast. By transforming the image into the frequency domain with the Discrete Cosine Transform (DCT), JPEG can isolate the high-frequency components that contribute least to the overall visual experience and store them coarsely or drop them. This selective reduction allows for significant file size savings without noticeably compromising image quality, demonstrating how lossy compression can retain essential information.
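
The frequency-domain idea can be sketched in a few lines. This is a simplified, one-dimensional illustration and not the JPEG codec itself: real JPEG applies a two-dimensional DCT to 8×8 pixel blocks and uses quantization tables rather than simply zeroing coefficients. The pixel values below are made up.

```python
import math

def dct(samples):
    """Orthonormal DCT-II: express samples as a sum of cosine waves."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(s * math.cos(math.pi / n * (i + 0.5) * k)
                    for i, s in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs

def idct(coeffs):
    """Inverse of dct(): rebuild samples from cosine coefficients."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi / n * (i + 0.5) * k)
        samples.append(total)
    return samples

# Made-up brightness values for one row of eight pixels.
row = [52, 55, 61, 66, 70, 61, 64, 73]

coeffs = dct(row)
# Keep the four lowest-frequency coefficients and zero the rest.
kept = coeffs[:4] + [0.0] * 4
approx = idct(kept)

print("original:", row)
print("approx:  ", [round(v) for v in approx])
```

Only half of the coefficients are kept, yet the reconstructed row follows the same overall shape and has the same average brightness as the original. The fine pixel-to-pixel variation is what gets lost.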

LLMs do something similar with the information on the internet. Consider the example of those immutable physics equations E = mc², F = ma, and V = IR. Though the LLM is much, much smaller than the portion of the internet it was trained on, these key pieces of information are preserved and can be retrieved accurately because they represent the fundamental structure of physical laws.

These formulas are analogous to the “core content” in language data that LLMs compress. Even though the process might discard myriad less essential details, the model retains these vital building blocks, which suggests that the compression is both efficient and sufficiently precise for practical applications.

How LLMs Mirror Lossy Compression

While an LLM won’t be able to tell you everyone who has ever written about the physics equations above, or every nuance of what has been said about them, the formulas themselves are important enough that they are stored in the model and “understood” by it. The same thing applies to billions of other facts as well, all stored in a file about the size of four 8K feature films. Incredible when you think about it that way!

Large Language Models work by learning the underlying patterns in language. In doing so, they essentially perform a form of lossy compression with the following characteristics:

  • Selective Retention:
    LLMs prioritize retaining patterns that define the structure and meaning of language, much like JPEG image compression discards minor color details while preserving image clarity.
  • Generalization Over Memorization:
    Instead of storing every word or sentence, LLMs internalize the “rules” of language. The fact that they can reproduce core factual elements, such as our physics equations, shows that the process focuses on what’s important.
  • Balancing Loss and Utility:
    Just as well-tuned lossy compression ensures that an image still looks good even after data reduction, LLMs strike a balance between compressing data and preserving the factual and semantic integrity of the content.
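
The "generalization over memorization" idea can be illustrated with a toy example. This is a loose analogy, not a description of how LLMs are actually trained: instead of storing every measurement, we fit the single parameter R in V = IR and keep only that. The measurements below are hypothetical.

```python
# Hypothetical noisy voltage readings (volts) at several currents (amps).
currents = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
voltages = [5.1, 9.8, 15.2, 19.9, 25.1, 29.8]

# Least-squares fit of V = I * R through the origin:
# R = sum(I * V) / sum(I * I)
resistance = (sum(i * v for i, v in zip(currents, voltages))
              / sum(i * i for i in currents))

# "Decompress": regenerate the readings from the one stored number.
predicted = [i * resistance for i in currents]
worst_error = max(abs(p - v) for p, v in zip(predicted, voltages))

print("stored parameter R:", resistance)
print("worst reconstruction error:", worst_error)
```

One number stands in for six readings. The noise in each individual reading is lost, but the rule that produced them survives, and it also predicts readings that were never recorded.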

Evaluating the Effectiveness of Compression

In traditional compression algorithms, engineers evaluate performance using key metrics such as compression ratio, fidelity and reconstruction quality, and speed and efficiency.

  • Compression Ratio:
    Measures the reduction in size from raw data to its compressed form. LLMs knock this one out of the park, compressing much of the internet into a very manageable file size.
  • Fidelity and Reconstruction Quality:
    In traditional lossy systems, metrics such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) gauge quality. For LLMs, we consider how accurately the model reproduces core facts, like the scientific formulas I referenced, as evidence that critical information is not lost.
  • Speed and Efficiency:
    Both LLMs and conventional compression must meet performance standards in real-time applications. Currently the training of an LLM is VERY, VERY time-consuming and expensive in both energy and processing power. In this one area, LLMs aren’t as efficient as other compression algorithms, but since LLMs are trained (written) infrequently and queried (read) frequently, this is not a huge problem at the moment.
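
The first two metrics above are simple to compute. The sketch below shows the standard formulas for compression ratio and PSNR, plus a hypothetical `fact_recall` helper as a crude stand-in for "fidelity" in the LLM case. That helper and all of the sample data are assumptions made for illustration, not an established benchmark.

```python
import math

def compression_ratio(original_size, compressed_size):
    """How many times smaller the compressed form is."""
    return original_size / compressed_size

def psnr(original, reconstructed, max_value=255):
    """Peak Signal-to-Noise Ratio in decibels; higher means closer."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)

def fact_recall(expected, answers):
    """Hypothetical metric: share of expected facts found verbatim in answers."""
    hits = sum(1 for key, fact in expected.items() if fact in answers.get(key, ""))
    return hits / len(expected)

# Made-up pixel rows: the reconstruction is off by one in most positions.
original = [52, 55, 61, 66, 70, 61, 64, 73]
reconstructed = [53, 54, 61, 67, 69, 62, 64, 72]

# Made-up model answers to two prompts.
expected = {"ohm": "V = IR", "newton": "F = ma"}
answers = {"ohm": "Ohm's law is V = IR.", "newton": "Force equals mass."}

print("ratio:", compression_ratio(1000, 250))
print("PSNR (dB):", psnr(original, reconstructed))
print("fact recall:", fact_recall(expected, answers))
```

Exact substring matching is far too strict for real model output, so treat `fact_recall` only as a placeholder for whatever factual-accuracy check you would actually use.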

The fact that a model can output accurate equations like E = mc², F = ma, and V = IR shows that, despite operating with a “lossy” approach, the compression preserves the essence of factual knowledge.

Conclusion

As I’ve shown, LLMs demonstrate a compelling parallel to lossy compression algorithms. While they streamline vast amounts of data, they ingeniously preserve essential details. This observation may not change much except to give you a different perspective on how these models work and what they can accomplish.

J. Tower

Jonathan, or J. as he's known to friends, is a husband, father, and founding partner of Trailhead Technology Partners, a custom software consulting company with employees across the U.S., Europe, and South America. He is a 12-time recipient of the Microsoft MVP award for his work with .NET, a frequent speaker at software conferences around the world, and was recently elected to the .NET Foundation Board for the 2026–2027 term. He doesn’t mind the travel, though, as it allows him to share what he's been learning and also gives him the chance to visit beautiful places like national parks—one of his passions. So far, he's visited 58 of the 63 U.S. national parks. J. is also passionate about building the software community. Over the years, he has served on several non-profit boards, including more than a decade as president of the board for Beer City Code, Western Michigan's largest professional software conference. Outside of work, J. enjoys hiking, reading, photography, and watching all the Best Picture nominees before the Oscars ceremony each year.
