Fast-Tracking Machine Learning: Leveraging pre-trained models for faster development
Oct 3, 2024
Federizo Zaiter
Co-Founder & AI / MLOps Engineer
ML has become more accessible than ever, thanks to rapid advancements in research and the availability of open-source tools and models. Businesses can now quickly develop AI-powered solutions by leveraging pre-trained models and task-driven approaches, drastically reducing development time and cost. This shift enables faster prototyping and easier deployment, making AI a viable option for companies of all sizes.
In this article, we'll explore how businesses can quickly develop AI solutions by leveraging pre-trained models and task-driven approaches. By the end, you'll discover practical strategies to deploy AI more efficiently, regardless of your data limitations or technical expertise.
A Task-Driven Approach
ML models usually come from training an algorithm on data, and in a data-centric approach that data can become the bottleneck. Here we'll focus instead on a task-driven approach to building ML systems. The question becomes: what task am I trying to solve? Instead of asking what data we need or what data we have, the task itself guides us. In fact, we may not even need training data at all. Whether it's classifying text, detecting objects in images, forecasting time series, or any other AI task, this task-driven perspective changes how we go about prototyping.
Starting Points: Pre-trained vs. Untrained Models
Let's consider two starting points for our model: a pre-trained model or an untrained model. Which one do you think is closest to being production-ready?
The pre-trained one.
To understand what "pre-trained" means, let's use an analogy with students:
An untrained model is like a student just starting university to major in the target task.
A pre-trained model has already gone through "school" and "graduated." We can now use it to tackle tasks similar or related to its training.
What kind of tasks can pre-trained models tackle?
Leveraging Pre-trained Models
The types of tasks pre-trained models can handle depend entirely on their training. For example, language models can classify text by topic or sentiment, while computer vision models can detect or segment objects in pictures or videos.
We can adjust pre-trained models to serve our own purposes through the following main approaches: fine-tuning, embeddings, prompting, and zero-shot.
Fine-tuning: Specializing pre-trained models on our own task
Continuing with the student analogy, fine-tuning is like a student who has completed general education and now pursues postgraduate studies to specialize in a specific field. Just as a postgraduate program allows the student to deepen their knowledge in a particular subject, fine-tuning allows a pre-trained AI model to specialize in a specific task. Instead of training a model from scratch, which would be like going through all levels of schooling again, fine-tuning takes an existing model that already holds general knowledge and refines it for a specific application. For example, a language model pre-trained on general text can be fine-tuned on legal documents to assist with contract analysis, or on medical records to help with diagnostic predictions. This lets businesses and researchers get more precise results without starting from scratch.
Fine-tuning also saves a significant amount of time and computing resources. Training a machine learning model from scratch requires vast amounts of data and computational power, often inaccessible to many smaller organizations. By using a pre-trained model and focusing the additional training on a specific task, you can achieve high accuracy with far less data and in a shorter amount of time. A great example of fine-tuning in action is in chatbots. While models like GPT-4 have been trained on general text data, businesses fine-tune them using customer interactions from their specific industry. This customization results in chatbots that can handle customer queries more accurately, whether it's for customer support or translating natural language into queries of domain-specific languages such as SQL.
There are several platforms and frameworks that make fine-tuning pre-trained models accessible to a wide range of users. PyTorch and TensorFlow are two of the most popular open-source libraries that provide pre-trained models across various domains, from natural language processing to image recognition. Hugging Face, in particular, stands out in the community for offering a vast hub of open-source pre-trained models that can be easily fine-tuned for specific tasks, such as text classification, translation, or summarization. This ecosystem makes it simple for users to take advantage of state-of-the-art models without needing enormous datasets or computing resources, democratizing access to powerful AI.
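To make this concrete, here is a minimal fine-tuning sketch using the Hugging Face transformers and datasets libraries. The model and dataset names are common examples chosen purely for illustration:

```python
# Minimal sketch: fine-tune a small pre-trained encoder for binary text classification.
# Model and dataset names are illustrative; swap in your own labeled data in practice.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"              # small general-purpose language model
dataset = load_dataset("imdb")                      # example task: movie-review sentiment

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetune-out", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),  # small subset for speed
    eval_dataset=tokenized["test"].select(range(500)),
)
trainer.train()
```

In a real project you would replace the example dataset with your own labeled examples and evaluate on a held-out set before deploying.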
In the realm of large language models (LLMs), fine-tuning can be especially important. While GPT-4 is one of the most well-known and powerful models, Meta's LLaMA (Large Language Model Meta AI) is an open-source alternative that has gained attention for its performance and versatility. LLaMA models can be fine-tuned for a wide range of language tasks, and their open-source nature makes them highly accessible for developers and researchers looking to customize a model for their own purposes. By fine-tuning these pre-trained models, users can adapt them to highly specific tasks, achieving impressive results without the need for the massive resources required for training models from scratch.
While fine-tuning improves performance on specific tasks, it can also present challenges. Models can become too specialized and may "overfit" the data, meaning they perform very well on the training data but struggle with new, unseen examples. Finding the right balance is essential. In practice, this method is widely used across industries—from natural language processing applications like automated email responses to image recognition tasks like identifying specific product defects on a production line.
Embeddings: Extracting semantic representations
Embeddings are like condensed summaries of data that capture its essential meaning or context in a format that machines can understand. These are vector representations that map words, sentences, or even images into a high-dimensional space where similar items are placed closer together. When extracted from pre-trained models, embeddings serve as powerful tools for building machine learning systems that understand the relationships and similarities between different data points. For example, an AI system trained to recommend books can use embeddings to compare books based on their themes and writing styles, even if the titles or genres are different. Pre-trained models from frameworks like Hugging Face, PyTorch, or TensorFlow offer rich, ready-made embeddings that can be used directly or further fine-tuned.
One common approach that leverages embeddings is K-nearest neighbors (KNN), a simple yet effective method for classification or recommendation tasks. Once data points are represented as embeddings, KNN can be used to find the closest match or similar items based on their position in the embedding space. For instance, in a text search application, a query can be converted into an embedding, and KNN can then find the most relevant documents by looking for the closest embeddings. This approach is widely used in search engines and recommendation systems, where finding similar items efficiently is key to the user experience.
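As a concrete illustration, here is a minimal sketch that pairs sentence embeddings with K-nearest neighbors for semantic search, assuming the sentence-transformers and scikit-learn libraries; the model name and documents are illustrative:

```python
# Sketch: semantic search by embedding documents and querying with K-nearest neighbors.
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import NearestNeighbors

model = SentenceTransformer("all-MiniLM-L6-v2")     # compact general-purpose embedder

documents = [
    "How do I reset my password?",
    "Shipping usually takes three to five business days.",
    "You can return any item within 30 days of purchase.",
]
doc_embeddings = model.encode(documents)            # one vector per document

index = NearestNeighbors(n_neighbors=2, metric="cosine").fit(doc_embeddings)

query_embedding = model.encode(["When will my order arrive?"])
distances, indices = index.kneighbors(query_embedding)
for dist, idx in zip(distances[0], indices[0]):
    print(f"{documents[idx]}  (cosine distance: {dist:.3f})")
```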
Embeddings also play a crucial role in Retrieval-Augmented Generation (RAG), a technique that combines large language models (LLMs) with external knowledge sources. In RAG, embeddings are used to retrieve relevant information from a database or document set, which is then fed into an LLM to generate more accurate and contextually aware responses. For example, when a user asks a question, embeddings can help retrieve relevant documents, which are then used by models like GPT-4 or Meta's LLaMA to generate a coherent and informative answer. This method improves the quality of responses, especially for tasks that require detailed knowledge retrieval from large, specialized datasets.
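The retrieval half of a RAG pipeline can be sketched the same way: embed a small knowledge base, pull out the passages closest to the user's question, and assemble them into a prompt for an LLM. The knowledge base and question below are illustrative, and the final LLM call is left as a placeholder:

```python
# Sketch: embedding-based retrieval feeding a prompt for retrieval-augmented generation.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

knowledge_base = [
    "Our premium plan includes 24/7 phone support.",
    "Refunds are processed within 5 business days.",
    "The mobile app has supported offline mode since version 2.3.",
]
kb_embeddings = embedder.encode(knowledge_base, normalize_embeddings=True)

def retrieve(question, k=2):
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = kb_embeddings @ q                      # cosine similarity via dot product
    top = np.argsort(scores)[::-1][:k]
    return [knowledge_base[i] for i in top]

question = "How long do refunds take?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
print(prompt)                                       # this prompt would then be sent to an LLM
```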
Prompting: Instructing pre-trained models on our target task
Prompting is like giving clear instructions to a well-trained assistant. Instead of retraining a general-purpose AI, you provide a prompt—a simple instruction that guides it to solve a specific task. For example, when using GPT-4 to summarize an article, a prompt like, "Summarize this article in three sentences," enables the model to leverage its knowledge without altering its underlying structure. This approach unlocks the model's ability to handle tasks such as translation, text generation, and answering questions, all through well-crafted prompts.
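As a minimal sketch, assuming the openai Python client (v1 style) and an API key set in the environment, the summarization example above boils down to a single call; the article text is a placeholder:

```python
# Sketch: solving a task with a prompt alone, no retraining involved.
from openai import OpenAI

client = OpenAI()                                   # reads OPENAI_API_KEY from the environment
article = "..."                                     # placeholder for the article text

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": f"Summarize this article in three sentences:\n\n{article}"}],
)
print(response.choices[0].message.content)
```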
Another powerful method is in-context learning, where examples of the task are provided directly within the prompt itself. By including input-output pairs, the model can adapt to the desired task dynamically. For instance, providing a few translation examples in the prompt helps the model "learn" how to translate a new sentence without needing additional training. This makes prompting incredibly flexible, allowing pre-trained models to be applied to a variety of tasks quickly.
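A few-shot prompt for in-context learning simply packs worked examples into the same kind of call. The translation pairs below are illustrative:

```python
# Sketch: in-context learning via a few-shot prompt, assuming the openai client as above.
from openai import OpenAI

client = OpenAI()
few_shot_prompt = """Translate English to French.

English: Good morning.
French: Bonjour.

English: Where is the train station?
French: Où est la gare ?

English: The meeting starts at noon.
French:"""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": few_shot_prompt}],
)
print(response.choices[0].message.content)          # the model continues the pattern in French
```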
Moreover, retrieval-augmented generation (RAG) enhances prompting by combining pre-trained models with external data. The model retrieves relevant information from external sources, using it to generate more accurate or contextually relevant responses. This technique is particularly useful when the model's training data is insufficient or outdated, allowing it to stay relevant without retraining.
In addition to text, visual prompting plays a crucial role in image-based models. For example, Meta's "Segment Anything Model" (SAM) lets users guide the model by clicking on or drawing around objects within images, allowing it to instantly segment or identify those parts. This interaction mirrors how text-based prompts work but adapts it to visual tasks, from medical imaging to self-driving car systems.
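For reference, here is a rough sketch of a point-click prompt with SAM, assuming the segment-anything package, a downloaded ViT-B checkpoint, and an arbitrary image; the file names and click coordinates are placeholders:

```python
# Sketch: visual prompting with the Segment Anything Model using a single foreground click.
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

masks, scores, _ = predictor.predict(
    point_coords=np.array([[350, 220]]),            # the clicked pixel acts as the prompt
    point_labels=np.array([1]),                     # 1 = foreground point
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]                # boolean mask of the clicked object
```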
LLMs in particular have also allowed prompting to evolve into agentic AI systems that solve complex problems autonomously with little to no supervision. However, that topic goes beyond the scope of this article and the kinds of problems we focus on here.
Zero-shot: Directly using pre-trained models without additional training
Zero-shot learning is like asking someone with broad knowledge to perform a task they've never done before—they rely on their understanding of related topics to give it a try. In AI, zero-shot learning allows pre-trained models to handle tasks they weren't explicitly trained for. For example, some AI models are trained using a technique called Natural Language Inference (NLI), which teaches them to determine relationships between two pieces of text—whether one sentence logically follows from another or contradicts it. This foundational skill can be leveraged for zero-shot learning, where the model uses its understanding of text logic to complete entirely new tasks, like sorting customer reviews into positive and negative categories, even if it hasn't seen those specific reviews before.
Imagine you want to build a system that can classify social media posts by emotion—happy, sad, or angry—but you don't have labeled data for it. A pre-trained model that learned how to infer relationships between sentences (like determining if "I'm thrilled!" implies positive sentiment) could apply this understanding to your task, even though it hasn't been trained directly on emotion categories. This is the essence of zero-shot learning: the model applies its broad, general knowledge to solve new problems on the fly.
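This emotion-classification scenario maps directly onto the Hugging Face zero-shot pipeline, which wraps an NLI model under the hood. A minimal sketch, with an illustrative model name and example post, and no labeled data required:

```python
# Sketch: zero-shot classification with an NLI-based model; the labels are defined at call time.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

post = "Finally got the promotion I've been working towards all year!"
result = classifier(post, candidate_labels=["happy", "sad", "angry"])

print(result["labels"][0], result["scores"][0])     # highest-scoring emotion and its score
```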
While zero-shot learning is powerful and highly flexible, it may not always deliver perfect results. Since the model relies on general understanding rather than specific training, it might miss nuances in highly specialized tasks. Still, for many everyday uses—like filtering emails, analyzing customer feedback, or even generating summaries of news articles—zero-shot learning provides a fast, efficient way to leverage AI without the need for additional training or data.
When Pre-trained Models Don't Fit
Although pre-trained models are often effective, they may not always fit the specific requirements of a task. In such cases, we might need to start from scratch with an untrained model, following a data-centric approach. However, this doesn't mean we can't move quickly: task-driven methods can still help accelerate the process, and a data-centric approach can be efficient when supported by the right techniques.
When faced with this challenge, we have two main options: either train models using a strong baseline or leverage AutoML to streamline and optimize the training process.
Training models from a good baseline: Start with a well-designed architecture that has proven effective for similar tasks. Fortunately, various open-source libraries provide implementations of battle-tested algorithms: scikit-learn for classical machine learning and PyTorch or TensorFlow for deep learning, among others (see the baseline sketch below). However, this still requires you to be familiar with the many algorithms out there in order to find the one that suits your task. This is where AutoML comes in handy.
Using AutoML to optimize model training: You can leverage AutoML to quickly find the best model architecture and hyperparameters for your specific task. This works the other way round: you configure the AutoML tool for your task, and it tries the appropriate algorithms, optimizes them, and even combines them to get the best results for your data (see the AutoGluon sketch below). AutoML lets us focus on the task at hand while efficiently exploring the model space.
While the major cloud providers each offer their own AutoML solution, a great open-source tool you should try is AutoGluon.
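To illustrate the first option, a quick baseline is often just a few lines. Here is a minimal scikit-learn sketch for text classification; the texts and labels are illustrative placeholders:

```python
# Sketch: a simple TF-IDF + logistic regression baseline for text classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

texts = ["great product, works perfectly", "arrived broken and late",
         "excellent support team", "terrible experience, want a refund"]
labels = [1, 0, 1, 0]                               # 1 = positive, 0 = negative

baseline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
scores = cross_val_score(baseline, texts, labels, cv=2)
print(f"baseline accuracy: {scores.mean():.2f}")
```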
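And for the second option, a minimal AutoGluon sketch, assuming a tabular dataset stored in CSV files with a "target" column (the file names and column are placeholders):

```python
# Sketch: AutoGluon searches, tunes, and ensembles models for a tabular prediction task.
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")
test_data = TabularDataset("test.csv")

predictor = TabularPredictor(label="target").fit(train_data, time_limit=600)  # 10-minute budget

predictions = predictor.predict(test_data)
print(predictor.leaderboard(test_data))             # compares every model AutoGluon tried
```

The leaderboard is a convenient way to see which of the candidate models performed best before choosing one to deploy.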
The Data-Centric Approach
While our focus in this article has been on the task-driven approach, it's worth noting that the data-centric perspective remains crucial in most ML projects. When working with limited or specialized datasets, several techniques and practices can accelerate the data-centric approach. It's also important to consider the MLOps requirements that come with developing and deploying machine learning models, as they ensure reproducibility, scalability, and smooth integration into production environments. We'll share more insights on these data-centric techniques, along with MLOps considerations, in future blog posts.
Adopting a Task-Driven Approach
By adopting task-driven methods and leveraging pre-trained models, businesses can accelerate the development of AI solutions while minimizing costs and data requirements. This approach opens up opportunities for innovation, customization, and scalability, empowering companies to stay competitive and embrace AI with confidence, regardless of their industry or expertise level.
Ready to take the next step? Start exploring pre-trained models and task-driven AI to transform your business today. Feel free to contact us for expert guidance on implementing the right solutions tailored to your needs.