Fast-Tracking Machine Learning: Leveraging pre-trained models for faster development

Oct 3, 2024

Federizo Zaiter

Co-Founder & AI / MLOps Engineer

ML has become more accessible than ever, thanks to rapid advancements in research and the availability of open-source tools and models. Businesses can now quickly develop AI-powered solutions by leveraging pre-trained models and task-driven approaches, drastically reducing development time and cost. This shift enables faster prototyping and easier deployment, making AI a viable option for companies of all sizes.


In this article, we'll explore how businesses can quickly develop AI solutions by leveraging pre-trained models and task-driven approaches. By the end, you'll discover practical strategies to deploy AI more efficiently, regardless of your data limitations or technical expertise.


A Task-Driven Approach


ML models usually come from training an algorithm with data, and while data can be a bottleneck in a data-centric approach, we'll focus here on a task-driven approach to building ML systems. The question becomes: what task am I trying to solve? Instead of asking what data we need or what data we have, the task itself guides us. In fact, we may not even need training data at all. Whether it's classifying text, detecting objects in images, forecasting time series, or any other AI task, this task-driven perspective changes how we go about prototyping.


Starting Points: Pre-trained vs. Untrained Models


Let's consider two starting points for our model: a pre-trained model or an untrained model. Which one do you think is closest to being production-ready? The pre-trained one.

To understand what "pre-trained" means, let's use an analogy with students:

  • An untrained model is like a student starting university to major in the target task.

  • A pre-trained model has already gone through "school" and "graduated." We can now use it to tackle tasks similar or related to its training.


What kind of tasks can pre-trained models tackle?


Leveraging Pre-trained Models


The types of tasks pre-trained models can handle depend entirely on their training. For example, language models can classify text topics or sentiment, while computer vision models can detect or segment objects in pictures or videos.


We can adjust pre-trained models to serve our own purposes through three main approaches: fine-tuning, prompting, and zero-shot.


Fine-tuning: Specializing pre-trained models on our own task


Continuing with the student analogy, fine-tuning is like a student who has completed general education and now pursues postgraduate studies to specialize in a specific field. Just as a postgraduate program allows the student to deepen their knowledge in a particular subject, fine-tuning allows a pre-trained AI model to specialize in a specific task. Instead of training a model from scratch, which would be like going through all levels of schooling again, fine-tuning takes an existing model that already has general knowledge and refines it for a specific application. For example, a language model pre-trained on general text can be fine-tuned on legal documents to assist with contract analysis, or on medical records to help with diagnostic predictions. This process allows businesses and researchers to get more precise results without having to start from scratch.


Fine-tuning also saves a significant amount of time and computing resources. Training a machine learning model from scratch requires vast amounts of data and computational power, often inaccessible to smaller organizations. By using a pre-trained model and focusing the additional training on a specific task, you can achieve high accuracy with far less data and in a shorter amount of time. A great example of fine-tuning in action is chatbots. While models like GPT-4 have been trained on general text data, businesses fine-tune them using customer interactions from their specific industry. This customization results in chatbots that handle customer queries more accurately, whether for customer support or for translating natural language into queries in domain-specific languages such as SQL.


There are several platforms and frameworks that make fine-tuning pre-trained models accessible to a wide range of users. PyTorch and TensorFlow are two of the most popular open-source libraries that provide pre-trained models across various domains, from natural language processing to image recognition. Hugging Face, in particular, stands out in the community for offering a vast hub of open-source pre-trained models that can be easily fine-tuned for specific tasks, such as text classification, translation, or summarization. This ecosystem makes it simple for users to take advantage of state-of-the-art models without needing enormous datasets or computing resources, democratizing access to powerful AI.
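
To make this concrete, here is a minimal fine-tuning sketch using the Hugging Face ecosystem. The model checkpoint, dataset, and training settings below are illustrative choices, not a prescribed recipe:

# Minimal fine-tuning sketch with Hugging Face Transformers.
# Assumes: pip install transformers datasets. The checkpoint and
# dataset names are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")  # example labeled text dataset
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=16)

trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
                  eval_dataset=dataset["test"].select(range(500)))
trainer.train()

Because the pre-trained weights already encode general language understanding, even a few thousand labeled examples, as in this sketch, are often enough to see meaningful gains.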


In the realm of large language models (LLMs), fine-tuning can be especially important. While GPT-4 is one of the most well-known and powerful models, Meta's LLaMA (Large Language Model Meta AI) is an open-source alternative that has gained attention for its performance and versatility. LLaMA models can be fine-tuned for a wide range of language tasks, and their open-source nature makes them highly accessible for developers and researchers looking to customize a model for their own purposes. By fine-tuning these pre-trained models, users can adapt them to highly specific tasks, achieving impressive results without the need for the massive resources required for training models from scratch.
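
For LLMs specifically, fully fine-tuning every weight is often impractical, so parameter-efficient methods are commonly used instead. Below is a rough sketch using LoRA via the peft library, a technique not covered above but widely paired with LLaMA-style models; the checkpoint name is a placeholder, and gated models require accepting Meta's license:

# Parameter-efficient fine-tuning sketch with LoRA.
# Assumes: pip install transformers peft. The checkpoint name
# is an illustrative placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only a small fraction of weights will train
# ...then train with the Trainer API or a custom loop on your domain data.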


While fine-tuning improves performance on specific tasks, it can also present challenges. Models can become too specialized and may "overfit" the data, meaning they perform very well on the training data but struggle with new, unseen examples. Finding the right balance is essential. In practice, this method is widely used across industries—from natural language processing applications like automated email responses to image recognition tasks like identifying specific product defects on a production line.


Prompting: Instructing pre-trained models on our target task


Prompting is like giving clear instructions to a well-trained assistant. Imagine having a general-purpose AI that knows a little bit about everything, but you need it to perform a specific task. Instead of retraining the AI, you can simply provide a prompt—an instruction or question that guides it toward solving the problem at hand. In essence, prompting involves telling a pre-trained model what you need it to do without changing the model itself. For example, when using an LLM like GPT-4 to summarize an article, you might provide a prompt such as, "Summarize this article in three sentences." The model then uses its knowledge to generate a response. In this way, prompting can unlock the model's potential to handle various tasks like translation, text generation, or answering questions—all with just the right input.
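
As a concrete example, here is a minimal prompting sketch using the OpenAI Python client; the model name, prompt, and article text are placeholders, and an OPENAI_API_KEY environment variable is assumed:

# Prompting sketch with the OpenAI Python client (pip install openai).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
article = "..."    # the text you want summarized

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": f"Summarize this article in three sentences:\n\n{article}",
    }],
)
print(response.choices[0].message.content)

Changing only the content string is enough to repurpose the same model for translation, question answering, or text generation.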


Beyond text, visual prompting has also become a key development, particularly with models designed for images. Visual prompts work similarly, guiding models in image-related tasks. For instance, Meta's "Segment Anything Model" (SAM) allows users to interact with the model through visual inputs. In this case, a user might click on an object within an image or draw a box around it, and the model instantly understands the task to segment or identify that specific object. This type of interaction empowers users to get AI to focus on specific parts of an image without needing any additional training. Just like in text-based prompting, visual prompts allow for adaptability in different image analysis scenarios—from medical imaging to object recognition in self-driving cars.
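
Here is a minimal sketch of this click-to-segment interaction with Meta's open-source segment-anything package; the checkpoint file, image path, and click coordinates are placeholders:

# Visual prompting sketch with the Segment Anything Model.
# Assumes: pip install segment-anything opencv-python, plus a
# downloaded SAM checkpoint; paths and coordinates are placeholders.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single foreground "click" on the object acts as the visual prompt.
point = np.array([[500, 375]])  # (x, y) pixel coordinates
label = np.array([1])           # 1 = foreground, 0 = background
masks, scores, _ = predictor.predict(point_coords=point, point_labels=label)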


Prompting, whether in text or visuals, is incredibly flexible. The same pre-trained model can be applied to a wide variety of tasks just by changing the prompt. For example, a model might be prompted to generate a creative story one moment and answer factual questions the next. Similarly, with visual models, the prompt could switch from identifying animals in a photo to isolating regions in medical scans. Image generation models such as Stable Diffusion are likewise driven by prompts. However, crafting effective prompts can require some experimentation. The quality of the instruction, whether it's a word or an image cue, directly impacts how well the model performs. When used effectively, prompting allows models to be applied and deployed quickly and efficiently across diverse applications.


LLMs in particular have also enabled prompting to evolve into agentic AI systems that autonomously solve complex problems with little to no supervision. However, this topic goes beyond the scope of this article and the kind of problems we are focusing on.


Zero-shot: Directly using pre-trained models without additional training


Zero-shot learning is like asking someone with broad knowledge to perform a task they've never done before—they rely on their understanding of related topics to give it a try. In AI, zero-shot learning allows pre-trained models to handle tasks they weren't explicitly trained for. For example, some AI models are trained using a technique called Natural Language Inference (NLI), which teaches them to determine relationships between two pieces of text—whether one sentence logically follows from another or contradicts it. This foundational skill can be leveraged for zero-shot learning, where the model uses its understanding of text logic to complete entirely new tasks, like sorting customer reviews into positive and negative categories, even if it hasn't seen those specific reviews before.


Imagine you want to build a system that can classify social media posts by emotion—happy, sad, or angry—but you don't have labeled data for it. A pre-trained model that learned how to infer relationships between sentences (like determining if "I'm thrilled!" implies positive sentiment) could apply this understanding to your task, even though it hasn't been trained directly on emotion categories. This is the essence of zero-shot learning: the model applies its broad, general knowledge to solve new problems on the fly.
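
This scenario maps directly onto Hugging Face's zero-shot classification pipeline, which is backed by an NLI-trained model. A minimal sketch, with the model choice and example post as illustrative placeholders:

# Zero-shot classification sketch (pip install transformers).
# The NLI-trained model scores arbitrary candidate labels without
# any task-specific training or labeled data.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

post = "Just got the concert tickets, best day ever!"
result = classifier(post, candidate_labels=["happy", "sad", "angry"])
print(result["labels"][0], round(result["scores"][0], 3))  # top label + score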


While zero-shot learning is powerful and highly flexible, it may not always deliver perfect results. Since the model relies on general understanding rather than specific training, it might miss nuances in highly specialized tasks. Still, for many everyday uses—like filtering emails, analyzing customer feedback, or even generating summaries of news articles—zero-shot learning provides a fast, efficient way to leverage AI without the need for additional training or data.


When Pre-trained Models Don't Fit


Although pre-trained models are often effective, they may not always fit the specific requirements of a task. In such cases, we might need to start from scratch with an untrained model, following a data-centric approach. However, this doesn't mean we can't move quickly—task-driven methods can still help accelerate the process. It's important to note that a data-centric approach can be efficient when supported by certain techniques.


When faced with this challenge, we have two main options: either train models using a strong baseline or leverage AutoML to streamline and optimize the training process.


Training models from a good baseline: Start with a well-designed architecture that has proven effective on similar tasks. Fortunately, various open-source libraries include implementations of battle-tested algorithms: scikit-learn for classical machine learning, and PyTorch or TensorFlow for deep learning, among others. However, this still requires you to know which of the many available algorithms may suit your task. This is where AutoML comes in handy.
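
As an illustration of such a baseline, a competitive text classifier in scikit-learn can be just a few lines; the texts and labels below stand in for your own data:

# Baseline sketch: TF-IDF features + logistic regression.
# Assumes: pip install scikit-learn. The data is a placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = ["great product", "terrible service", "fast delivery", "never again"]
labels = [1, 0, 1, 0]  # placeholder data; use your own labeled examples

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, stratify=labels)
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
baseline.fit(X_train, y_train)
print(baseline.score(X_test, y_test))  # baseline accuracy to beat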


Using AutoML to optimize model training: You can leverage AutoML to quickly find the best model architecture and hyperparameters for your specific task. This works the other way around: configure the AutoML tool for your task, and it will try the appropriate algorithms, optimize them, and even combine them to get you the best results for your data. AutoML allows us to focus on the task at hand while efficiently exploring the model space.


While the major cloud providers each offer their own AutoML solution, a great open-source tool to try is AutoGluon.
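
A minimal sketch of AutoGluon on tabular data; the file names and label column are placeholders:

# AutoML sketch with AutoGluon (pip install autogluon).
# File and label column names are illustrative placeholders.
from autogluon.tabular import TabularDataset, TabularPredictor

train = TabularDataset("train.csv")
predictor = TabularPredictor(label="target").fit(train)

test = TabularDataset("test.csv")
predictions = predictor.predict(test)
print(predictor.leaderboard(test))  # compares every model AutoGluon tried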


The Data-Centric Approach


While our focus in this article has been on the task-driven approach, it's worth noting that the data-centric perspective remains crucial in most ML projects. When working with limited or specialized datasets, several techniques and practices can accelerate the data-centric approach. It's also important to consider the MLOps requirements that come with developing and deploying machine learning models, as they ensure reproducibility, scalability, and smooth integration into production environments. We'll share more insights on these data-centric techniques, along with MLOps considerations, in future blog posts.


Adopting a Task-Driven Approach


By adopting task-driven methods and leveraging pre-trained models, businesses can accelerate the development of AI solutions while minimizing costs and data requirements. This approach opens up opportunities for innovation, customization, and scalability, empowering companies to stay competitive and embrace AI with confidence, regardless of their industry or expertise level.


Ready to take the next step? Start exploring pre-trained models and task-driven AI to transform your business today. Feel free to contact us for expert guidance on implementing the right solutions tailored to your needs.


