Tags:
  • Generative AI
  • Artificial Intelligence
  • API Integration
  • Prompt Engineering

Navigating the LLM Landscape: A Journey of Growth and Exploration

Posted On: 17 June, 2025
By Aneesh Nathani


Introduction

Following the GenAI breakthrough of ChatGPT, a plethora of foundational LLMs, both closed and open, have emerged; Hugging Face alone hosts an extensive array of open models. Amid this abundance, a variety of techniques have also surfaced, making it challenging to determine the right path: should you use closed LLMs as-is (via APIs), self-host open LLMs, set up quantized models, or fine-tune a model? The ultimate choice depends on the accuracy required for a given use case. This blog post provides a structured framework for a progressive journey towards developing solutions or building internal organizational capabilities, starting from simple approaches and gradually moving toward more complex ones.

We have divided this blog into a two-part series: LLM experiments without fine-tuning and with fine-tuning. In this blog, we will concentrate on the progressive journey of LLM experiments without involving any fine-tuning. Please note that this blog focuses on embarking on a progressive journey through various approaches and options for using and hosting LLMs rather than selecting the right LLM.

 

Prompting Progression Strategy

Prompting techniques are widely known as an effective way to elicit creative and concise responses from LLMs. Prompt engineering has also evolved, and here, we present a commonly discussed progression strategy from an experimentation perspective to achieve better and more accurate results.

[Image: Prompting progression strategy, illustrating the iterative approach to refining prompts for improving the accuracy and relevance of Gen AI responses]

You can find more prompting techniques here.
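The progression above can be sketched in code. The sketch below builds the same (hypothetical) task as a zero-shot, a few-shot, and a chain-of-thought prompt; the wording and task are illustrative only, and real prompts should be iterated on per use case.

```python
# A minimal sketch of a prompting progression: the same question expressed
# as progressively richer prompts. Prompt wording here is illustrative.

def zero_shot(question: str) -> str:
    # Simplest form: just ask the question directly.
    return f"Answer the question.\nQ: {question}\nA:"

def few_shot(question: str, examples: list[tuple[str, str]]) -> str:
    # Prepend worked examples to steer the answer's format and style.
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"Answer the question.\n{shots}\nQ: {question}\nA:"

def chain_of_thought(question: str) -> str:
    # Ask the model to reason step by step before answering.
    return f"Answer the question. Think step by step.\nQ: {question}\nA:"
```

Each step typically trades prompt length (and therefore token cost) for accuracy, which is why iterating from the simplest form upward is the usual approach.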

 

Progressive Journey for LLM Experiments

Just as there is a progression strategy for prompting to improve results, there are various techniques for using, experimenting with, and hosting LLMs. The following diagram outlines a structured approach for embarking on a progressive journey to experiment with different LLM usage methods. All the approaches mentioned below start from simplicity and gradually move towards complexity, considering ease-of-use, cost, and resource considerations.

[Image: Progression journey for LLM experiments, showcasing the process of refining and optimizing large language models to enhance their performance and reliability]

1. Accessing Closed LLMs via External APIs

This is the most straightforward way to harness the power of LLMs. Closed LLMs can easily integrate into applications with minimal effort and optimized costs. It is the simplest method to create quick proofs of concept or even for production to expedite application development.

There are various good-quality closed LLMs available from different providers. Azure OpenAI and OpenAI offer powerful LLMs such as the GPT-series models (GPT-3.5 Turbo and GPT-4). Anthropic's Claude, Google's PaLM, Cohere, AI21's Jurassic, BloombergGPT, and others represent a handful of closed LLMs that are hosted by third parties and accessed via APIs. For most of these models, pricing is determined by the number of tokens consumed in the input and generated in the output.

These APIs typically have different context window variants, so it’s crucial to consider these when designing a solution for a specific use case. For example, GPT-4 offers two context window variants: 8K and 32K tokens.
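Token-based pricing lends itself to back-of-the-envelope budgeting before committing to a provider. The sketch below estimates per-call cost from token counts; the per-1K-token rates in the example are placeholders, so always check the provider's current price list.

```python
# Back-of-the-envelope cost estimator for token-priced closed-LLM APIs.
# Rates below are hypothetical placeholders, not real provider pricing.

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_1k: float, output_rate_per_1k: float) -> float:
    """Return the estimated cost in dollars for a single API call."""
    return (input_tokens / 1000) * input_rate_per_1k \
         + (output_tokens / 1000) * output_rate_per_1k

# e.g. 2,000 input and 500 output tokens at hypothetical
# $0.01 / $0.03 per 1K tokens:
cost = estimate_cost(2000, 500, 0.01, 0.03)   # 0.02 + 0.015 = 0.035
```

Multiplying the per-call estimate by expected request volume gives a quick monthly budget figure, which is often the deciding factor between this option and self-hosting.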

This option provides a cost-effective and less risky approach compared to other options. It also offers the flexibility to change the underlying model by simply updating the API implementation.

2. Self-Hosted Quantized Open LLMs

Quantization, in simple terms, compresses model weights to reduce the model's size and memory footprint: a 32- or 16-bit model is converted to lower precision, such as 8-bit or 4-bit. An 8-bit model requires four times less memory than its 32-bit counterpart, making it far more affordable to run. By reducing the model's size, memory requirements, and computational demands without significantly sacrificing accuracy, quantized models become accessible to users who may not have high-end GPUs like the Nvidia A100.

Quantized models can be loaded and executed on consumer-grade CPUs or GPUs; many are available out-of-the-box and can be deployed behind an inference endpoint on regular-grade hardware.

A few examples include GPT4All, GPTQ, Ollama, Hugging Face, and others, which offer quantized models for direct download and use in inference or for setting up inference endpoints.

This option is cost- and resource-efficient, as pre-quantized models have minimal CPU/GPU and RAM requirements for self-hosting an inference endpoint. However, achieving optimal inference performance still requires careful design and resource planning.

Note: You must evaluate quantized models for accuracy on your use cases before any production usage. Also, only certain types of hardware support some of these models and quantization techniques, so verify the hardware requirements before using them.
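The accuracy evaluation mentioned above can start as a very small harness. The sketch below scores any callable that maps a prompt to a completion against a labelled test set; the stub model and test cases are hypothetical stand-ins for a real quantized-model endpoint and a real evaluation set.

```python
# Minimal accuracy check to run before promoting a quantized model to
# production. `model_fn` is any callable mapping a prompt to a completion;
# a trivial stub stands in for a real inference endpoint here.

def evaluate(model_fn, test_cases):
    """Return the fraction of test cases answered exactly."""
    correct = sum(1 for prompt, expected in test_cases
                  if model_fn(prompt).strip() == expected)
    return correct / len(test_cases)

# Stub model for illustration only; replace with a call to your endpoint.
def stub_model(prompt: str) -> str:
    return {"capital of France?": "Paris"}.get(prompt, "unknown")

accuracy = evaluate(stub_model, [("capital of France?", "Paris"),
                                 ("capital of Spain?", "Madrid")])
# accuracy == 0.5 for this stub
```

Running the same harness against the full-precision and the quantized model makes the accuracy cost of quantization directly measurable for your use case.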

3. Cloud-Hosted Open LLMs

Open LLMs are large and have demanding hardware requirements (CPU/GPU/high RAM). The RAM needed to set up an inference endpoint varies with the model's parameter count (e.g., 7 billion, 13 billion, 65 billion): the bigger the model, the higher the GPU/RAM requirement.
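The relationship between parameter count, precision, and memory can be approximated with simple arithmetic: parameters times bytes per parameter, plus some headroom for activations and framework overhead. The overhead factor below is an assumption; real requirements vary by runtime and batch size.

```python
# Rough inference-memory estimate: parameter count x bytes per parameter,
# scaled by an assumed overhead factor for activations and the runtime.

def approx_memory_gb(params_billion: float, bits_per_param: int,
                     overhead: float = 1.2) -> float:
    bytes_total = params_billion * 1e9 * (bits_per_param / 8)
    return bytes_total * overhead / 1e9   # decimal GB

# A 7B model at 16-bit needs roughly 14 GB for weights alone
# (~17 GB with overhead); a 65B model at 16-bit needs roughly 130 GB.
```

This is why a 65B model forces multi-GPU serving while a quantized 7B model can fit on a single consumer-grade card.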

You can set up these models either on-premises or on-cloud. Most cloud providers, such as Azure, AWS, and GCP, offer infrastructure and libraries for open LLMs, enabling quick setup of an inference endpoint.

Models can be set up on-premises, too, but would require a higher initial investment to procure hardware. We recommend opting for a cloud-based approach for quickly deploying open LLMs without a significant upfront cost, especially considering the dynamic nature of this field and ongoing research to make these models accessible on consumer-grade hardware.

Examples of open LLMs that can be cloud-hosted include Llama 2, Falcon, Dolly, Flan-T5, Vicuna, etc.

This option is the most expensive, as setting up an inference endpoint for foundational LLM models with many parameters demands substantial GPU-based hardware and increased RAM. Meeting low-latency service level agreements and high availability requirements would necessitate additional infrastructure, making the solution costly. Therefore, it is crucial to evaluate the costs against the business value.

4. Self-Hosted and Self-Quantized Open LLMs

This option involves using various quantization techniques and libraries to quantize LLM models independently. It focuses on post-training quantization approaches where quantization techniques are applied to pre-trained foundational models to reduce their size, memory, and computing power requirements. During the quantization process, it’s important to note that these models don’t undergo any further training.

You can quantize a model to 8-bit, 5-bit, 4-bit, or other precisions.
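To make the core idea concrete, the toy sketch below applies symmetric "absmax" 8-bit quantization to a small weight vector and then dequantizes it. Real libraries operate per-tensor or per-block across full models with far more sophistication; this only illustrates the basic arithmetic.

```python
# Toy post-training quantization: symmetric absmax 8-bit quantization of a
# weight vector, followed by dequantization. Illustrative only.

def quantize_int8(weights):
    # Scale so the largest-magnitude weight maps to +/-127.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]   # int8 range [-127, 127]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 1.0]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)   # close to w, at a quarter of 32-bit storage
```

The round-trip error per weight is bounded by the scale, which is the accuracy-for-memory trade-off that every quantization scheme navigates.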

Several quantization techniques exist for compressing foundational LLMs, such as GPTQ, and several libraries implement them, including Auto-GPTQ and bitsandbytes, both of which are integrated into Hugging Face's Transformers library.

Note that this option is complex and requires in-depth research skills to apply the various quantization techniques available. Unless self-quantization is truly necessary and you have a deep understanding of the techniques involved, you should use pre-quantized models.

Implementing this approach demands a substantial initial investment in skilled engineers, resources, and time. However, if experiments yield promising outcomes and mature into well-established internal libraries, infrastructure costs can be reduced significantly later.

 

Conclusion

In this post, we have explored diverse options for a progressive journey through LLM-based experiments for organizations and projects. The options are arranged from simple to complex and provide a framework for making informed decisions for any use case. This post has focused explicitly on using out-of-the-box models and strategies without fine-tuning.

 

Originally published on Medium - https://medium.com/@genai_cybage_software/integrating-gen-ai-in-your-product-ecosystem-c6de8d136b03
