Deep Dive into Domain 1: Fundamental Concepts and Practical Applications of Artificial Intelligence
Domain 1 of this comprehensive material lays a robust foundation for understanding Artificial Intelligence (AI) and its practical implementation. It systematically breaks down fundamental AI concepts, terminologies, the Machine Learning (ML) development lifecycle, and real-world use cases. This detailed article will explore each task statement and lesson within Domain 1, providing an in-depth overview of the key takeaways.
Task Statement 1.1: Explaining Basic AI Concepts and Terminologies
This foundational task statement is meticulously divided into five lessons, each building upon the previous one to create a comprehensive understanding of core AI principles.
Lesson 1: Introduction to Artificial Intelligence
The journey begins by defining AI as a field within computer science focused on replicating cognitive abilities commonly associated with human intelligence. These include learning, creativity, and image recognition. The ultimate goal of AI is to develop self-learning systems capable of extracting meaning from data.
The lesson highlights the tangible presence of AI in everyday life through examples like Alexa and ChatGPT, which can respond meaningfully to questions and even generate original content. The ability of AI systems to rapidly process vast datasets is emphasized, showcasing their utility in complex problem-solving such as real-time fraud detection.
Furthermore, AI’s capacity to automate repetitive and monotonous tasks is presented as a significant driver of business efficiency, freeing human employees for more creative endeavors. The power of AI in identifying data patterns and forecasting trends is underscored, enabling businesses to make informed decisions and respond swiftly to challenges.
The lesson then introduces two critical subfields of AI:
- Machine Learning (ML): Described as a branch of AI and computer science, ML centers on using data and algorithms to mimic human learning. ML systems progressively enhance their accuracy by learning from data. These models are trained on large datasets to recognize patterns and make predictions, exemplified by online product recommendations.
- Deep Learning: A specialized type of ML that draws inspiration from the structure of the human brain, employing layers of neural networks to process information. Deep learning excels in tasks such as speech and object/image recognition.
The lesson concludes by illustrating the broad impact of AI across various industries and on customers. Examples span from medical diagnostics using AI to analyze X-rays, to public health agencies like the CDC leveraging AI for pandemic prediction and resource allocation. Industries like manufacturing (e.g., Koch Industries) utilize AI with computer vision for quality control and predictive maintenance.
Customers benefit from AI through enhanced access to product information via chatbots, personalized product recommendations based on shopping history, and tailored content suggestions from streaming services like Discovery. Businesses gain efficiency through more accurate demand forecasting, enabling better resource allocation (e.g., taxi companies). Financial institutions (e.g., MasterCard) employ AI for fraud detection. HR departments use AI for resume processing and candidate matching, boosting hiring manager productivity.
The strategic use of AI for targeted promotions based on customer understanding, as seen with Ticketek’s event recommendations, is also highlighted. Regression analysis, a technique that lets AI models predict future values from historical (time series) data, is introduced with the example of a store forecasting staffing needs. These predictions are termed “inferences”: probabilistic results that represent educated guesses.
Finally, the lesson touches upon anomaly detection, where AI identifies deviations from expected patterns (e.g., a sudden drop in call center volume). Computer vision applications are presented, showcasing AI’s ability to process images and videos for object identification, facial recognition, classification, recommendation, monitoring, and detection. Advanced applications like identifying missing components on a circuit board are mentioned.
The power of AI in language translation, which goes beyond simple word-for-word conversion to capture context and meaning, is demonstrated with a customer support chat translated in real time. Natural Language Processing (NLP) is identified as the underlying technology that enables machines to understand, interpret, and generate human language, powering devices like Alexa and chatbots for booking services.
Generative AI is introduced as the next evolution, capable of engaging in seemingly intelligent conversations and creating original content (text, images, videos, music). This is exemplified by a demonstration in which Amazon Bedrock generates a song from a user prompt.
Lesson 2: Delving into Machine Learning
This lesson pivots to a more focused exploration of machine learning, defining it as the science of creating algorithms and statistical models that empower computer systems to perform complex tasks without explicit programming.
The core process of ML is described: algorithms process large historical datasets to identify patterns. This starts with a mathematical algorithm taking data (features) as input to produce an output. To train the algorithm for desired outputs, it’s fed known data consisting of these features (columns in a table, pixels in an image). The algorithm continuously learns by analyzing more labeled data, seeking correlations between input features and known outputs.
The model’s internal parameters are adjusted iteratively until it reliably generates the expected output. Once trained, the model can make accurate predictions (inferences) on new, unseen data.
The lesson categorizes the types of data used for ML training:
- Structured Data: The easiest to understand and process, organized into rows and columns (features) in tables. Examples include CSV files and relational databases like Amazon RDS and Amazon Redshift, queried using SQL. For training, this data is typically exported to Amazon S3.
- Semi-structured Data: Doesn’t strictly adhere to tabular formats; data elements can have varying or missing attributes. JSON files are a prime example, with features represented as key-value pairs. Databases like Amazon DynamoDB and Amazon DocumentDB (with MongoDB compatibility) are designed for this type of data, which is also exported to S3 for ML training.
- Unstructured Data: Lacks a specific data model and cannot be stored in tables. Images, videos, text files, and social media posts fall into this category, typically stored as objects in object storage systems like Amazon S3. Features for ML are derived through processing techniques like tokenization (breaking text into units of words or phrases).
- Time Series Data: Crucial for predicting future trends, with each data record timestamped and stored sequentially. Examples include microservice performance metrics (memory usage, CPU percentage, transactions per second). ML models can identify patterns in this data for proactive scaling. Large volumes of time series data are often stored in S3 for model training.
The lesson then uses the example of linear regression (predicting height from weight) to illustrate the concept of an algorithm defining the mathematical relationship between inputs and outputs (h = mw + b). The slope (m) and intercept (b) are identified as the model parameters adjusted during training to find the best-fitting model, which minimizes the errors (distances between data points and the line). Upon completion of training, the model can perform inference (predict a person’s height based on their weight).
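The height-from-weight example can be sketched in a few lines. This is a minimal illustration that assumes ordinary least squares as the error measure; the data points are invented for the example and the function names `fit_line` and `predict_height` are hypothetical.

```python
import numpy as np

# Hypothetical (weight in kg, height in cm) pairs, invented for illustration only.
weights = np.array([55.0, 62.0, 70.0, 78.0, 85.0, 93.0])
heights = np.array([160.0, 165.0, 171.0, 176.0, 181.0, 187.0])


def fit_line(w, h):
    """Training: return slope m and intercept b that minimize the squared errors."""
    w_mean, h_mean = w.mean(), h.mean()
    m = np.sum((w - w_mean) * (h - h_mean)) / np.sum((w - w_mean) ** 2)
    b = h_mean - m * w_mean
    return m, b


def predict_height(m, b, weight):
    """Inference: apply the trained parameters to a new input (h = m*w + b)."""
    return m * weight + b


m, b = fit_line(weights, heights)
training_mse = np.mean((predict_height(m, b, weights) - heights) ** 2)
print(f"m={m:.3f}, b={b:.3f}, training MSE={training_mse:.3f}")
print(f"predicted height at 74 kg: {predict_height(m, b, 74.0):.1f} cm")
```

Here the two numbers `m` and `b` are the entire trained model: once they are found, inference is a single multiplication and addition.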
Lesson 3: Model Training, Deployment, and Machine Learning Styles
This lesson builds on the previous one by detailing the outcomes of the training process and the options for deploying trained models.
The training process culminates in the creation of model artifacts, which typically include trained parameters, a model definition, and other metadata. These artifacts are usually stored in Amazon S3 and packaged with inference code (the software that implements the model by reading the artifacts) to create a deployable model.
Two primary deployment options are presented:
- Real-time Inference: An endpoint is constantly available to process inference requests with low latency and high throughput. Ideal for online applications where immediate responses are crucial.
- Batch Inference: Suitable for offline processing of large datasets when a persistent endpoint is not required. More cost-effective for high-volume inference tasks where results can be delayed. The key difference is that compute resources for batch processing are only active during the processing, whereas real-time inference requires continuously running resources.
The lesson then explores different styles of machine learning:
- Supervised Learning: Models are trained on pre-labeled data, where the input and the desired output are specified. An example is training a model to identify pictures of fish (labeled “fish”) versus other animals (labeled “not fish”). The model learns by analyzing image pixels and adjusting internal parameters until it accurately identifies fish images. The output of a supervised learning model is typically a probability score. A significant challenge is the need for large amounts of accurately labeled data, which Amazon SageMaker Ground Truth addresses by offering a labeling service leveraging crowdsourcing platforms like Amazon Mechanical Turk.
- Unsupervised Learning: Algorithms are trained on unlabeled data with features, enabling them to discover patterns, group data into clusters, and identify anomalies. Useful for pattern recognition, anomaly detection, and automated data categorization. Setup is simpler due to the lack of labeling requirements. Examples include identifying network traffic patterns for security incident prediction (clustering) and detecting unusual sensor readings in oil wells (anomaly detection).
- Reinforcement Learning: Focuses on autonomous decision-making by an agent interacting with an environment to achieve specific goals. Learning occurs through trial and error based on rewards for actions that move the agent closer to the goal. Labeled input is not required. An example is AWS DeepRacer, where a model car (agent) learns to drive on a track (environment) by being rewarded for staying on the track and completing the course efficiently.
The key distinction between unsupervised and reinforcement learning is that while both operate without labeled data, reinforcement learning has a predefined end goal guiding the exploratory learning process.
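To make the reward-driven loop concrete, the sketch below uses tabular Q-learning, one common reinforcement learning algorithm, on a toy one-dimensional "track". It is not how AWS DeepRacer is implemented; the track size, reward values, and hyperparameters are all assumptions chosen for illustration.

```python
import random

N_STATES = 6            # positions 0..5 on a straight track; position 5 is the finish
ACTIONS = (-1, +1)      # index 0 moves left, index 1 moves right
GOAL = N_STATES - 1


def step(state, action):
    """Environment: return (next_state, reward, done) for one move."""
    next_state = min(max(state + action, 0), GOAL)
    if next_state == GOAL:
        return next_state, 1.0, True      # reward for completing the course
    return next_state, -0.01, False       # small penalty for every extra step


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Agent: learn action values by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:                 # explore
                a = rng.randrange(2)
            else:                                      # exploit current estimates
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
policy = ["left" if q[0] > q[1] else "right" for q in q_table[:GOAL]]
print(policy)
```

No labeled examples are supplied anywhere: the only feedback is the reward signal, yet the learned policy ends up heading toward the finish from every position.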
Lesson 4: Model Evaluation: Overfitting, Underfitting, and Bias
This lesson shifts focus to the critical aspects of evaluating model performance and potential pitfalls.
Overfitting occurs when a model performs exceptionally well on training data but poorly on new, unseen data. This happens when the model learns the training data too well, including noise (unimportant features), and fails to generalize. The fish example is revisited, where a model trained only on images of fish swimming in water might fail to recognize a fish out of water. The primary solution is to train with more diverse data. Training for excessively long periods can also lead to overfitting by emphasizing noise.
Underfitting is the opposite problem, where the model cannot establish a meaningful relationship between input and output data, resulting in inaccurate predictions on both training and new data. This can be due to insufficient training time or a lack of adequate data. Data scientists aim for the optimal training duration to avoid both underfitting and overfitting.
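One way to see both failure modes is to fit models of increasing flexibility to the same small dataset. The sketch below is an illustration under assumed conditions: synthetic data from a noisy sine curve, with polynomial degree standing in for model complexity. Comparing training error with error on held-out data is the diagnostic.

```python
import numpy as np

rng = np.random.default_rng(0)


def noisy_samples(n):
    """Synthetic data: a sine curve plus random noise (assumed for illustration)."""
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, n)
    return x, y


x_train, y_train = noisy_samples(15)     # small training set
x_test, y_test = noisy_samples(200)      # new, unseen data


def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    train_err, test_err = results[degree]
    print(f"degree {degree}: train MSE={train_err:.4f}, test MSE={test_err:.4f}")
```

A straight line (degree 1) cannot follow the curve, so its error stays high on both sets, which is the signature of underfitting. The most flexible model always achieves the lowest training error; when its error on the unseen data is noticeably higher than its training error, it has started to fit noise, which is the signature of overfitting.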
Bias refers to disparities in a model’s performance across different groups, leading to skewed outcomes for particular classes. An example is a loan application model trained on data lacking diversity, potentially leading to bias against certain demographic groups (e.g., young women in a specific location with otherwise qualifying features). The quality and quantity of underlying data are crucial in mitigating bias. Data scientists can adjust the weight of noise-inducing features or even remove them entirely (e.g., gender consideration). Fairness constraints should be defined upfront, training data should be inspected for potential bias, and models should be continuously evaluated for fairness in their results.
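Evaluating a model for fairness usually starts with computing the same metric separately for each group. The sketch below is one simple check among many: it compares how often qualified applicants are approved in each group. The records, group names, and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical loan decisions: (group, model_approved, actually_qualified).
records = [
    ("A", True, True), ("A", True, True), ("A", True, True),
    ("A", False, True), ("A", False, False),
    ("B", True, True), ("B", False, True), ("B", False, True),
    ("B", False, True), ("B", False, False),
]


def approval_rate_for_qualified(rows):
    """Per group: share of qualified applicants that the model approved."""
    approved = defaultdict(int)
    qualified = defaultdict(int)
    for group, model_approved, is_qualified in rows:
        if is_qualified:
            qualified[group] += 1
            approved[group] += int(model_approved)
    return {group: approved[group] / qualified[group] for group in qualified}


rates = approval_rate_for_qualified(records)
gap = max(rates.values()) - min(rates.values())
print(rates, f"gap={gap:.2f}")
```

A large gap between groups with otherwise qualifying features is a signal to inspect the training data and the features the model relies on.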
Lesson 5: Deep Learning and Generative AI in Detail
The final lesson in this task statement provides a deeper dive into deep learning and introduces generative AI.
Deep Learning is described as a subset of ML that utilizes neural networks, inspired by the structure of the human brain. These networks consist of layers of software modules called nodes (simulating neurons) including an input layer, hidden layers, and an output layer. Each node autonomously assigns weights to input features. Information flows forward, and during training, the difference between predicted and actual output is used to repeatedly adjust the weights to minimize error.
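The forward flow and the repeated weight adjustment can be sketched with a very small network. This is an illustration only: the layer sizes, activation functions, learning rate, and the XOR dataset are assumptions, and real deep learning frameworks compute the gradients automatically.

```python
import numpy as np

rng = np.random.default_rng(42)

# XOR: four inputs whose labels cannot be separated by a single straight line.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# Input layer (2 features) -> hidden layer (4 nodes) -> output layer (1 node).
W1 = rng.normal(0.0, 1.0, (2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(0.0, 1.0, (4, 1))
b2 = np.zeros((1, 1))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(inputs):
    """Information flows forward: input -> hidden -> output."""
    hidden = np.tanh(inputs @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    return hidden, output


def loss(pred):
    """Mean squared difference between predicted and actual output."""
    return float(np.mean((pred - y) ** 2))


initial_loss = loss(forward(X)[1])
learning_rate = 0.5
for _ in range(5000):
    hidden, output = forward(X)
    # Backward pass: how much each weight contributed to the error.
    d_out = (2.0 / len(X)) * (output - y) * output * (1.0 - output)
    grad_W2 = hidden.T @ d_out
    grad_b2 = d_out.sum(axis=0, keepdims=True)
    d_hidden = (d_out @ W2.T) * (1.0 - hidden ** 2)
    grad_W1 = X.T @ d_hidden
    grad_b1 = d_hidden.sum(axis=0, keepdims=True)
    # Adjust every weight a little in the direction that reduces the error.
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1

final_loss = loss(forward(X)[1])
print(f"loss before training: {initial_loss:.4f}, after: {final_loss:.4f}")
```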
Deep learning excels in tasks requiring the identification of complex relationships in data, such as image classification and NLP. While the concept of deep learning has existed for some time, the availability of low-cost cloud computing has made the necessary processing power accessible, establishing neural networks as the standard for computer vision. A key advantage is their ability to automatically identify and extract relevant features from images, reducing the need for manual feature engineering. However, training deep learning models often requires vast amounts of data (e.g., millions of images) and significant computational resources, leading to higher infrastructure costs compared to traditional ML.
The choice between traditional ML and deep learning depends on the type of data being processed. Traditional ML is generally efficient for structured and labeled data (e.g., classification, recommendation systems), while deep learning is more suitable for unstructured data (images, videos, text) and complex tasks like sentiment analysis. Both use statistical algorithms, but only deep learning employs neural networks.
Generative AI is presented as the next frontier, powered by deep learning models pre-trained on massive datasets of text (sequences). These models utilize transformer neural networks, which process input sequences (prompts) in parallel to generate output sequences (responses). This parallel processing speeds up training and allows for the use of much larger datasets. Large Language Models (LLMs) contain billions of parameters, capturing a broad spectrum of human knowledge. Their extensive training makes them highly versatile and superior to other ML approaches in NLP. They excel at understanding human language (summarization), generating human-like text (translation, creative writing), and even understanding and writing computer code. Amazon Bedrock is mentioned as a platform for building generative AI applications. The lesson concludes with a demonstration of using Amazon Bedrock to generate song lyrics from a simple prompt.
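The parallel processing comes from self-attention, the core operation of the transformer architecture: every position in the sequence is compared with every other position in a single matrix multiplication instead of step by step. The sketch below shows single-head scaled dot-product self-attention with random, untrained weights. It omits masking, positional encoding, and multiple heads, and the dimensions are arbitrary assumptions.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax, shifted for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a whole sequence at once."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # every token scored against every token
    weights = softmax(scores)                 # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                       # 5 tokens, 8-dimensional embeddings
x = rng.normal(size=(seq_len, d_model))       # stand-in for embedded prompt tokens
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

output, weights = self_attention(x, w_q, w_k, w_v)
print(output.shape, weights.shape)
```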
Task Statement 1.2: Identifying Practical Use Cases for AI
This task statement, also divided into five lessons, transitions from foundational concepts to exploring the practical application of AI across various scenarios.
Lesson 1: Scenarios Favoring AI and Situations Where It Might Not Be the Best Choice
The lesson begins by highlighting the inherent advantages of AI, such as its ability to work continuously without performance degradation. AI is positioned as a powerful tool for automating repetitive and tedious tasks, thereby reducing employee workloads and streamlining business operations. Its ability to analyze vast amounts of high-velocity data and recognize patterns makes it ideal for complex problems like fraud detection and demand forecasting, leading to improved decision-making and efficiency.
However, the lesson also cautions that AI is not a universal solution. Several scenarios where AI might not be the optimal choice are discussed:
- High Resource Consumption: Training ML models requires significant computational resources, which can be costly, and models may need frequent retraining. A thorough cost-benefit analysis is crucial before embarking on an AI project, ensuring that the anticipated business benefits (e.g., fraud and waste reduction) outweigh the development and maintenance costs.
- Model Interpretability: Complex neural networks, while powerful, often lack transparency in how they arrive at predictions, a challenge known as interpretability. If business or regulatory requirements mandate complete transparency in decision-making, less complex models or even rule-based systems might be preferable, often at the cost of lower performance.
- Deterministic Output Requirement: Machine learning models are probabilistic, meaning they predict the likelihood of an event and can produce varying results for identical inputs due to their learning process and incorporated randomness. If a deterministic system is required (where the same input always yields the same output), a rule-based system is a better fit. An example provided is an automatic loan approval rule based on a fixed credit score threshold.
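A rule-based system of the kind described in the last point can be as small as one comparison. The threshold value and function name below are hypothetical; the point is that the same input always yields the same output and the decision is fully explainable.

```python
APPROVAL_THRESHOLD = 700  # hypothetical fixed credit score threshold


def approve_loan(credit_score: int, threshold: int = APPROVAL_THRESHOLD) -> bool:
    """Deterministic rule: approve when the score meets the threshold."""
    return credit_score >= threshold


for score in (650, 700, 720):
    print(score, approve_loan(score))
```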
Lesson 2: Identifying Different ML Problem Types
This lesson focuses on classifying different types of machine learning problems based on the nature of the data and the desired outcome.
- Supervised Learning Problems: Characterized by datasets with input features and labeled target values (outputs). The goal is to train a model to predict the target value for new, unseen inputs.
  - Classification Problems: The target values are categorical (discrete values).
    - Binary Classification: Assigns an input to one of two mutually exclusive classes (e.g., disease/no disease, fish/not fish).
    - Multiclass Classification: Assigns an input to one of several predefined classes (e.g., topic of a tax document, different types of sea creatures).
  - Regression Problems: The target values are mathematically continuous. The model estimates the value of a dependent variable based on one or more independent variables.
    - Linear Regression: A direct linear relationship between inputs and output (simple with one independent variable, multiple with several). Example: predicting house prices based on features.
    - Logistic Regression: Used to predict the probability of an event occurring (output between 0 and 1) using the logistic (sigmoid) function; the probability is often thresholded to yield a class label. Examples: predicting heart disease risk based on BMI, smoking status, etc., or identifying fraudulent financial transactions.
- Unsupervised Learning Problems: Involve datasets with input features but no labels or target values. The goal is to discover inherent patterns in the data.
  - Clustering Problems: Grouping data into discrete clusters based on similarity of features. Example: segmenting customers by purchase history.
  - Anomaly Detection Problems: Identifying rare or suspicious items, events, or observations that deviate significantly from the rest of the data. Example: detecting failing sensors or medical errors.
The lesson emphasizes the need for significant amounts of labeled data for both linear and logistic regression models to achieve accurate predictions. Cluster analysis relies on defining features, selecting a distance function for similarity measurement, and specifying the desired number of clusters. Anomaly detection aims to identify outliers that raise suspicion due to their difference from the norm.
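The three ingredients of cluster analysis (features, a distance function, and a chosen number of clusters) map directly onto the inputs of k-means, one widely used clustering algorithm. The sketch below assumes Euclidean distance and uses invented customer data; it is a bare-bones version without the refinements found in library implementations.

```python
import numpy as np

# Hypothetical customers: (orders per year, average order value).
customers = np.array([
    [2.0, 20.0], [3.0, 25.0], [1.0, 22.0], [2.0, 18.0],
    [30.0, 200.0], [28.0, 210.0], [33.0, 190.0], [31.0, 205.0],
])


def euclidean(points, centers):
    """Distance from every point to every center, shape (n_points, n_clusters)."""
    return np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)


def k_means(points, n_clusters, n_iters=20, seed=0):
    rng = np.random.default_rng(seed)
    # Start from randomly chosen data points as the initial cluster centers.
    centers = points[rng.choice(len(points), n_clusters, replace=False)]
    for _ in range(n_iters):
        labels = euclidean(points, centers).argmin(axis=1)   # assign to nearest center
        for k in range(n_clusters):
            members = points[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)            # move center to the mean
    return labels, centers


labels, centers = k_means(customers, n_clusters=2)
print(labels)
print(centers)
```

No labels are supplied: the two customer segments emerge purely from the similarity of the feature values.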
Lesson 3: Leveraging Pre-trained AI Services on AWS
This lesson highlights the availability of pre-trained AI services on AWS accessible through APIs, advocating for their consideration before embarking on building custom models.
Amazon Rekognition is presented as a pre-trained deep learning service for computer vision, catering to common use cases like face recognition (verifying identity, finding individuals in images/videos), object and label detection (making image/video libraries searchable, security alerts), custom object recognition (training on proprietary objects), text detection in images, and content moderation (detecting explicit, inappropriate, or violent content).
Amazon Textract is introduced as a service for extracting text, handwriting, forms, and tabular data from scanned documents, going beyond simple OCR.
Amazon Comprehend is described as an NLP service for discovering insights and relationships in text, with common use cases including sentiment analysis of customer feedback and PII (Personally Identifiable Information) detection. The synergy between Amazon Textract and Comprehend is noted, where Textract extracts text for Comprehend to analyze.
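The Textract-to-Comprehend hand-off can be sketched as two API calls. The functions below expect client objects shaped like the boto3 `textract` and `comprehend` clients, using the `detect_document_text` and `detect_sentiment` operations. So that the example runs without an AWS account, it is exercised here with stub clients that return canned responses; the stub classes, bucket, and key are hypothetical, and error handling and text size limits are left out.

```python
def extract_lines(textract_response):
    """Collect the text of every LINE block in a DetectDocumentText response."""
    return [
        block["Text"]
        for block in textract_response["Blocks"]
        if block["BlockType"] == "LINE"
    ]


def document_sentiment(textract_client, comprehend_client, bucket, key):
    """Extract text from a scanned document in S3, then analyze its sentiment."""
    response = textract_client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    text = "\n".join(extract_lines(response))
    result = comprehend_client.detect_sentiment(Text=text, LanguageCode="en")
    return text, result["Sentiment"]


# With real credentials the clients would come from boto3, for example:
#   import boto3
#   textract = boto3.client("textract")
#   comprehend = boto3.client("comprehend")

class StubTextract:
    """Hypothetical stand-in returning a canned Textract-style response."""
    def detect_document_text(self, Document):
        return {"Blocks": [
            {"BlockType": "PAGE"},
            {"BlockType": "LINE", "Text": "Great service,"},
            {"BlockType": "LINE", "Text": "will order again."},
        ]}


class StubComprehend:
    """Hypothetical stand-in returning a canned Comprehend-style response."""
    def detect_sentiment(self, Text, LanguageCode):
        return {"Sentiment": "POSITIVE"}


text, sentiment = document_sentiment(
    StubTextract(), StubComprehend(), "example-bucket", "feedback/scan-001.png"
)
print(sentiment)
```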
Amazon Lex helps build voice and text interfaces (chatbots, IVR systems) using the same technology as Amazon Alexa.
Amazon Transcribe is an automatic speech recognition service supporting over 100 languages, designed for high-quality transcriptions of live and recorded audio/video for search and analysis (e.g., real-time captioning).
Lesson 4: More AWS Pre-trained AI Services and Generative AI on Bedrock
This lesson continues the overview of AWS pre-trained AI services.
Amazon Polly converts text to natural-sounding speech in multiple languages using deep learning, enabling applications like audio versions of articles and prompts in IVR systems, improving product engagement and accessibility.
Amazon Kendra uses ML for intelligent search across enterprise systems, understanding natural language questions to quickly find relevant content.
Amazon Personalize automatically generates personalized recommendations for customers in retail, media, and entertainment (e.g., “You might also like” sections in e-commerce apps), enabling more effective marketing campaigns through customer segmentation.
Amazon Translate provides fluent text translation between numerous languages using neural networks that consider the entire context for more accurate and natural-sounding results (e.g., real-time chat translation).
Amazon Fraud Detector identifies potentially fraudulent online activities (payment fraud, fake accounts) using pre-trained models for various scenarios.
Amazon Bedrock is introduced as a fully managed service for building generative AI applications, offering a choice of high-performing foundation models from Amazon, Meta, and AI startups. It supports customization through fine-tuning with proprietary data or creating knowledge bases for the model to query, a process called Retrieval Augmented Generation (RAG). An example of using the Titan Image Generator on Bedrock is provided.
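The idea behind RAG can be illustrated without any model call: retrieve the most relevant passage from a knowledge base, then place it in the prompt. The toy sketch below ranks passages by word overlap (cosine similarity over word counts). Production systems, including managed knowledge bases, typically use vector embeddings instead; the knowledge base contents and function names here are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base of company policy snippets.
KNOWLEDGE_BASE = [
    "Refunds are issued within 14 days of a return request.",
    "Standard shipping takes 3 to 5 business days.",
    "Support is available by chat from 9am to 5pm.",
]


def bag_of_words(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, top_k=1):
    """Retrieval step: rank documents by similarity to the question."""
    query = bag_of_words(question)
    ranked = sorted(documents, key=lambda d: cosine(query, bag_of_words(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, documents):
    """Augmentation step: put the retrieved context in front of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


prompt = build_prompt("How long does shipping take?", KNOWLEDGE_BASE)
print(prompt)  # this prompt would then be sent to a foundation model
```

Because the context is fetched at request time, the model can answer from current data without being retrained.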
Amazon SageMaker is positioned for scenarios requiring more customized ML models or workflows beyond the capabilities of the core AI services, offering tools for data preparation, building, training, and deploying high-quality ML models efficiently, including pre-trained models in SageMaker JumpStart for accelerating development through transfer learning.
Lesson 5: Real-World AI Applications
The final lesson in this task statement presents concrete examples of AI in action across different industries.
- MasterCard: Utilizes AI with Amazon SageMaker for real-time fraud detection, significantly increasing detection rates and reducing false positives. Generative AI has been integrated (using LLMs and transaction history as prompts) to further enhance fraud detection by predicting the likelihood of a customer visiting a particular business.
- DoorDash: Replaced an outdated IVR system with one powered by Amazon Lex for natural language processing, allowing customers to speak instead of using touch tones, improving customer experience, reducing hold times, and increasing self-service adoption.
- Laredo Petroleum: Implemented a data streaming solution on AWS and built ML models with Amazon SageMaker to monitor real-time sensor data from oil and gas wells. This enables proactive maintenance and reduces environmental impact by detecting and remediating potential issues such as gas flaring and leaks.
- Booking.com: Leverages Amazon SageMaker to build ML models for booking recommendations, managing vast amounts of data. They’ve created the AI Trip Planner using generative AI (and RAG by calling the booking recommendation API and retrieving customer reviews) to engage with customers in natural language and provide more accurate and current recommendations.
- Pinterest: Uses AI for its Pinterest Lens feature, which allows users to take a picture of an object and instantly find similar items for sale. They maintain a large collection of labeled product images in Amazon S3 and frequently retrain their ML models, using Amazon Mechanical Turk and SageMaker Ground Truth for image labeling.
These examples underscore the transformative power of AI in solving real-world business problems, enhancing customer experiences, and driving operational efficiencies.
Task Statement 1.3: Describing the ML Development Lifecycle
This task statement delves into the systematic process of developing and deploying machine learning models, spread across seven detailed lessons.
Lesson 1: Introduction to the ML Development Lifecycle and Initial Steps
The lesson introduces the Machine Learning Pipeline as a series of interconnected steps starting with a business goal and culminating in the operation of a deployed ML model. These steps include defining the problem, collecting and preparing data, training the model, deploying it, and continuously monitoring its performance. The iterative nature of this process is emphasized, leading many to consider it an ML Lifecycle with repeated phases even after deployment.
The first critical step is identifying the business goal. Organizations must have a clear understanding of the problem to be solved and the measurable business value to be gained, aligned with specific objectives and success criteria. Without these, evaluating the model or determining the suitability of ML becomes impossible. Stakeholder alignment is crucial. The target should be achievable and have a clear path to production.
The lesson stresses the importance of determining if ML is the appropriate approach by evaluating all available options, considering accuracy, cost, and scalability. Ensuring the availability of sufficient, relevant, and high-quality training data is also paramount. The ML question should be formulated in terms of input, desired outputs, and the performance metric to be optimized. The simplest solution should always be explored first.
A cost-benefit analysis is essential before proceeding to the next phase. The lesson highlights AWS’s AI services as democratizing ML, offering pre-trained, fully hosted models for common use cases, which should be evaluated first. Many of these services allow for customization (e.g., custom classifiers in Amazon Comprehend). If a hosted service doesn’t meet the objectives, building upon an existing model (e.g., fine-tuning foundation models on Amazon Bedrock or using pre-trained models in SageMaker JumpStart) should be considered before the most complex and costly option of training a model from scratch. SageMaker JumpStart is further described as providing pre-trained AI foundation models and task-specific models that can be fine-tuned with custom datasets using transfer learning, offering significant cost and time savings.
Lesson 2: Data Collection and Preparation
This lesson details the crucial stages of collecting and preparing data for ML model training.
The process begins with identifying the necessary data and determining data collection options (streaming or batch). An Extract, Transform, Load (ETL) process needs to be configured to gather data from potentially multiple sources and store it in a centralized repository. Given the need for frequent model retraining, this process must be repeatable. Determining if the data is already labeled or how it will be labeled is a significant part of this stage.
Data preparation encompasses data pre-processing (cleaning, handling missing/anomalous values, masking/removing PII) and feature engineering (selecting and potentially combining relevant characteristics as features for training). Exploratory Data Analysis (EDA) with visualization tools aids in understanding the data. The data is typically split into three datasets: training (e.g., 80%), validation/evaluation (e.g., 10%), and test (e.g., 10%). Feature reduction to only necessary features for inference helps minimize memory and computing power requirements during training.
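The three-way split can be sketched as a shuffle followed by slicing. This minimal version assumes the data fits in memory as NumPy arrays and that a purely random split is appropriate (time series data, for instance, is usually split chronologically instead). The function name and the synthetic data are hypothetical.

```python
import numpy as np


def split_dataset(features, labels, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle once, then slice into training, validation, and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_train = round(len(features) * train_frac)
    n_val = round(len(features) * val_frac)
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    test_idx = order[n_train + n_val:]          # whatever remains is the test set
    return tuple((features[idx], labels[idx]) for idx in (train_idx, val_idx, test_idx))


features = np.arange(200, dtype=float).reshape(100, 2)   # 100 rows, 2 features
labels = np.arange(100)
(train_x, train_y), (val_x, val_y), (test_x, test_y) = split_dataset(features, labels)
print(len(train_x), len(val_x), len(test_x))
```

Fixing the seed makes the split reproducible, which matters when the same pipeline is rerun for retraining.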
The lesson then introduces several AWS services for data ingestion and preparation:
- AWS Glue: A fully managed ETL service for creating and running ETL jobs. It discovers data stored on AWS, stores metadata (schema) in the AWS Glue Data Catalog, and generates code for data transformations and loading. AWS Glue has built-in transformations and can handle various data stores (relational databases, data warehouses, cloud/streaming services like Amazon MSK and Kinesis). Crawlers can automatically determine data schema using classifiers and populate the Data Catalog. ETL jobs use this information to transform and store data in target data stores (typically S3).
- AWS Glue DataBrew: A visual data preparation tool for cleaning and normalizing data without code. It allows for interactive discovery, visualization, cleaning, and transformation of raw data with smart suggestions for data quality issues. Transformation steps can be saved as reusable recipes. DataBrew offers over 250 built-in transformations and can evaluate data quality through rule sets and profiling jobs.
- Amazon SageMaker Ground Truth: Helps build high-quality labeled datasets for supervised learning using active learning (ML model-assisted labeling) and human workforces (Amazon Mechanical Turk or internal private workforces).
- Amazon SageMaker Canvas: A visual tool for preparing, featurizing, and analyzing data. It simplifies feature engineering with a single visual interface, allowing users to select raw data from various sources and import it with a single click. It includes over 300 built-in transformations for normalization, transformation, and combination of features without coding.
- Amazon SageMaker Feature Store: A centralized repository for features and associated metadata, facilitating discovery and reuse. It streamlines ML development by reducing repetitive data processing and curation required to convert raw data into features for training. Workflow pipelines can be created to convert raw data into features and add them to feature groups.
Lesson 3: Model Training, Tuning, and Evaluation
This lesson focuses on the iterative process of training, tuning, and evaluating ML models.
Training involves the ML algorithm iteratively updating parameters (weights) to minimize the difference between the model’s inference and the expected output. This continues until a defined number of iterations or a target error reduction is achieved. Running parallel training jobs with different algorithms and settings (experiments) is best practice for finding the optimal solution.
Hyperparameters, external parameters that influence an algorithm’s performance (e.g., number of neural layers), are set by data scientists before training. Their optimal values are typically determined through multiple experiments with varying settings.
To train a model using Amazon SageMaker, a training job is created, which runs training code on SageMaker-managed ML compute instances. This requires specifying the S3 URL of the training data, the desired compute resources, the output S3 bucket for model artifacts, and the algorithm (via a Docker container image path in Amazon ECR, which can be a SageMaker-provided algorithm, a deep learning container, or a custom container). Hyperparameters also need to be configured. SageMaker then launches the instances, trains the model, and saves the artifacts in the specified S3 bucket.
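As a rough sketch, the inputs listed above map onto the request structure of the SageMaker `CreateTrainingJob` API. The helper below only assembles that request as a dictionary; it does not call AWS. The helper name, job name, image URI, role ARN, bucket paths, instance type, and hyperparameter names are placeholders chosen for illustration.

```python
def build_training_job_request(job_name, image_uri, role_arn, train_s3_uri,
                               output_s3_uri, hyperparameters,
                               instance_type="ml.m5.xlarge", instance_count=1):
    """Assemble keyword arguments for the CreateTrainingJob API (no AWS call is made)."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,       # Docker container image path in Amazon ECR
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,                  # IAM role SageMaker assumes for the job
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,        # S3 URL of the training data
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},  # where artifacts are saved
        "ResourceConfig": {                   # desired compute resources
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        # The API expects hyperparameter values as strings.
        "HyperParameters": {k: str(v) for k, v in hyperparameters.items()},
    }


# All values below are placeholders, not real resources.
request = build_training_job_request(
    job_name="example-training-job",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example-algorithm:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
    hyperparameters={"max_depth": 5, "eta": 0.2},
)
# With AWS credentials configured, the request could be submitted with:
# boto3.client("sagemaker").create_training_job(**request)
```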
Amazon SageMaker Experiments is introduced as a tool for managing, analyzing, and comparing ML experiments (groups of training runs with different inputs, parameters, and configurations) through a visual interface to identify the best-performing models.
Amazon SageMaker Automatic Model Tuning (AMT), also known as hyperparameter tuning, automates the process of finding the best model version by running multiple training jobs with different hyperparameter combinations within specified ranges, optimizing for a chosen performance metric (e.g., AUC for a binary classification model). AMT continues until predefined completion criteria are met (e.g., a certain number of jobs without significant metric improvement).
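To make the idea concrete, the sketch below runs a plain random search over hyperparameter ranges and stops after a set number of jobs without meaningful improvement. It illustrates the general pattern only and is not how AMT is implemented; `train_and_score` is a hypothetical stand-in for a real training job, and the ranges and thresholds are assumptions.

```python
import random


def train_and_score(learning_rate, num_layers):
    """Hypothetical stand-in for a training job; returns a validation score."""
    # Toy objective that peaks at learning_rate=0.1 and num_layers=3.
    return 1.0 - (learning_rate - 0.1) ** 2 - 0.01 * (num_layers - 3) ** 2


def tune(ranges, max_jobs=50, patience=10, min_improvement=1e-3, seed=0):
    """Random search with a 'no significant improvement' completion criterion."""
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    jobs_without_improvement = 0
    jobs_run = 0
    for _ in range(max_jobs):
        # Sample one hyperparameter combination from the specified ranges.
        params = {
            "learning_rate": rng.uniform(*ranges["learning_rate"]),
            "num_layers": rng.randint(*ranges["num_layers"]),
        }
        score = train_and_score(**params)
        jobs_run += 1
        if score > best_score + min_improvement:
            best_score, best_params = score, params
            jobs_without_improvement = 0
        else:
            jobs_without_improvement += 1
        # Completion criterion: too many jobs in a row without improvement.
        if jobs_without_improvement >= patience:
            break
    return best_params, best_score, jobs_run


ranges = {"learning_rate": (0.001, 0.5), "num_layers": (1, 6)}
best_params, best_score, jobs_run = tune(ranges)
print(best_params, round(best_score, 4), jobs_run)
```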
Lesson 4: Model Deployment Options
This lesson details the various ways a trained and evaluated model can be deployed for making inferences.
The first decision is whether to use batch inference (offline processing for large datasets when results can be delayed) or real-time inference (immediate responses to requests, e.g., in generative AI applications via REST APIs). In both cases, inference code and model artifacts are typically deployed as Docker containers, which can run on various AWS compute resources (AWS Batch, Amazon ECS, Amazon EKS, AWS Lambda, Amazon EC2, etc.). Managing the inference endpoint (updates, scalability, security) is a consideration with these options.
For reduced operational overhead, Amazon SageMaker offers fully managed hosted endpoints. To use SageMaker inference, users point to model artifacts in S3 and a Docker container image in ECR, select the inference option (batch, asynchronous, serverless, or real-time), and SageMaker creates the endpoint and deploys the model code. Real-time, asynchronous, and batch inference on SageMaker utilize EC2 ML instances (potentially in Auto Scaling Groups), while serverless inference uses AWS Lambda functions. SageMaker also provides an inference recommender tool to help select the optimal configuration.
Amazon SageMaker supports four inference option types:
- Batch Transform: Offline inference for large datasets (gigabytes in size) when a persistent endpoint is not needed and results can be delayed.
- Asynchronous Inference: Ideal for queuing requests with large payloads and longer processing times, with the ability to scale the endpoint down to zero during idle periods.
- Serverless Inference: Real-time inference without direct provisioning of compute instances or scaling policies, using Lambda functions and billed only when functions are running or pre-provisioned, suitable for models with intermittent traffic.
- Real-time Inference: For workloads requiring immediate interactive responses via a persistent, fully managed REST API backed by the chosen EC2 ML instance type, always available to process requests.
Lesson 5: Model Monitoring and MLOps
This lesson focuses on the crucial final stage of the ML lifecycle: monitoring deployed models and the principles of MLOps.
Model monitoring is essential because model performance can degrade over time due to factors like data quality, model quality, and bias. A monitoring system should capture data, compare it to the training set, define rules to detect issues (data drift, concept drift), and send alerts, repeating on a schedule or triggered by events or human intervention. For most models, a simple scheduled retraining (daily, weekly, monthly) is sufficient. The monitoring system should trigger alerts to an alarm manager system, potentially initiating an automatic retraining cycle upon detecting drift.
Data drift refers to significant changes in the data distribution compared to the training data. Concept drift occurs when the properties of the target variables change. Both lead to performance degradation.
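One simple way to picture a data drift rule is to compare a feature's mean in recent production data against its mean in the training baseline. The sketch below assumes a single numeric feature and an arbitrary threshold of 0.5 baseline standard deviations; the function names, threshold, and sample values are hypothetical, and production monitoring tools apply richer statistics than this.

```python
from statistics import mean, stdev


def mean_shift(baseline, live):
    """Shift of the live mean from the baseline mean, in baseline standard deviations."""
    return abs(mean(live) - mean(baseline)) / stdev(baseline)


def has_drifted(baseline, live, threshold=0.5):
    """Flag drift when the mean has moved more than `threshold` standard deviations."""
    return mean_shift(baseline, live) > threshold


# Hypothetical feature values: training baseline vs. two production samples.
baseline = [10.0, 12.0, 11.0, 9.0, 10.0, 11.0, 12.0, 10.0]
similar = [11.0, 10.0, 12.0, 9.0, 11.0, 10.0]
shifted = [15.0, 16.0, 14.0, 17.0, 15.0, 16.0]

print(has_drifted(baseline, similar))  # distribution close to the baseline
print(has_drifted(baseline, shifted))  # distribution has moved noticeably
```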
Amazon SageMaker Model Monitor automatically monitors models in production, detects errors by comparing endpoint data against a baseline using built-in or user-defined rules, and sends alerts. Results are viewable in Amazon SageMaker Studio and also sent to Amazon CloudWatch for configuring alarms and triggering remedial actions like retraining.
The importance of automation in ML pipelines is then discussed, leading to the concept of MLOps, which applies established software engineering best practices to machine learning development. MLOps aims to automate manual tasks, testing, code evaluation before release, and incident response, streamlining model delivery across the ML lifecycle. The cloud’s API-based services treat everything as software, enabling infrastructure to be defined in code and deployed/redeployed repeatably.
Key principles of MLOps include version control for tracking lineage (including training data), monitoring deployments for issues, and automating retraining for issues or data/code changes. Benefits include increased productivity (self-service environments), repeatability (automating all lifecycle steps), improved reliability (quality and consistency in deployment), enhanced compliance (versioning all inputs and outputs for auditability), and improvements in data and model quality (enforcing policies against bias, tracking data and model quality).
Amazon SageMaker Pipelines is presented as a service for orchestrating SageMaker jobs and authoring reproducible ML pipelines. Pipelines can deploy custom models for real-time or batch inference and track artifact lineage. They allow for implementing sound operational practices in deployment and monitoring. Pipelines can be created using the SageMaker SDK for Python or defined in JSON, encompassing all steps to build and deploy a model, including conditional branches. They can be visualized in SageMaker Studio. An example pipeline for a model inferring abalone age is provided.
Lesson 6: Additional MLOps Services and Classification Metrics
This lesson continues the discussion of MLOps with additional services and introduces metrics for evaluating classification models.
Several other services relevant to MLOps are highlighted:
- Repositories: Store versions of code and models. SageMaker Feature Store holds feature definitions, and SageMaker Model Registry serves as a central repository for trained models and their history.
- Orchestration Tools: Beyond SageMaker Pipelines, options include AWS Step Functions (visual workflow definition with drag-and-drop) and Amazon Managed Workflows for Apache Airflow (MWAA), which uses Apache Airflow and Python for programmatic workflow creation without infrastructure management.
The lesson then shifts to metrics for evaluating classification models, starting with the confusion matrix. For a binary classification model (yes/no, positive/negative), a confusion matrix is a table showing actual vs. predicted outcomes:
- True Positive (TP): Model correctly predicts positive.
- True Negative (TN): Model correctly predicts negative.
- False Negative (FN): Model incorrectly predicts negative (actual is positive).
- False Positive (FP): Model incorrectly predicts positive (actual is negative).
An example with 100 test images for a fish classification model is used to illustrate.
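The four cells can be counted directly from paired actual and predicted labels. The sketch below uses a handful of made-up labels rather than the course's 100-image example; the function name and label strings are hypothetical.

```python
def confusion_counts(actual, predicted, positive="fish"):
    """Count TP, TN, FP, and FN for a binary classifier."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct if the actual label is also positive.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: correct if the actual label is also negative.
            counts["FN" if a == positive else "TN"] += 1
    return counts


actual = ["fish", "fish", "fish", "other", "other", "fish"]
predicted = ["fish", "other", "fish", "fish", "other", "fish"]
counts = confusion_counts(actual, predicted)
print(counts)
```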
Accuracy, the percentage of correct predictions, is introduced as (TP + TN) / Total Predictions. While understandable, it’s not a good metric for imbalanced datasets (where one class significantly outweighs the other). An example is given where a model that always predicts “fish” in a dataset with 90% fish images achieves 90% accuracy despite being unhelpful.
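The imbalanced-dataset pitfall can be reproduced in a few lines. The sketch below assumes a test set of 100 images of which 90 are fish, matching the proportions described above; the label strings are hypothetical.

```python
# A "model" that ignores its input and always predicts the majority class.
actual = ["fish"] * 90 + ["not fish"] * 10
predicted = ["fish"] * len(actual)

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# How many of the non-fish images did the model actually identify?
true_negatives = sum(a == "not fish" and p == "not fish"
                     for a, p in zip(actual, predicted))

print(f"accuracy={accuracy:.2f}, true negatives={true_negatives}")
```

The accuracy looks strong, yet the count of true negatives shows that the model never recognizes a single non-fish image.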
Precision measures the proportion of true positives among all predicted positives: TP / (TP + FP). It’s useful when minimizing false positives is the goal (e.g., not labeling a legitimate email as spam). The precision for the fish model in the example is calculated.
Recall (also known as sensitivity or true positive rate) measures the proportion of true positives among all actual positives: TP / (TP + FN). It’s important when minimizing false negatives is the priority (e.g., not missing a disease diagnosis). The recall for the fish model is also calculated. There’s a trade-off between precision and recall.
The F1 Score balances precision and recall, combining them into a single metric: 2 * (Precision * Recall) / (Precision + Recall). It’s a good compromise when both false positives and false negatives are important. The fish model in the example has better recall than precision, suggesting it’s better at detecting actual fish but might have more false positives. In such scenarios, optimizing the F1 score is often the best approach.
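The three formulas can be checked with a small calculation. The counts below are hypothetical (they are not the figures from the course's fish example) and were chosen so that recall is higher than precision, as in the scenario described above.

```python
def precision(tp, fp):
    """TP / (TP + FP); returns 0.0 when there are no predicted positives."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """TP / (TP + FN); returns 0.0 when there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical confusion-matrix counts for 100 test images.
tp, fp, fn, tn = 60, 20, 5, 15

p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)
print(f"precision={p:.3f}, recall={r:.3f}, f1={f1:.3f}")
```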
Lesson 7: Regression Metrics and Business Metrics
The final lesson of Domain 1 continues the discussion of model evaluation. It first completes the classification metrics (FPR, TNR, and AUC), then covers metrics for regression models, and closes by emphasizing the importance of business metrics.
The False Positive Rate (FPR) is introduced as FP / (FP + TN), showing how the model handles negative instances (e.g., how many non-fish images were incorrectly classified as fish).
The True Negative Rate (TNR) is TN / (FP + TN), measuring how many negative instances were correctly predicted as negative (e.g., how many non-fish images were correctly classified).
The Area Under the Curve (AUC) metric is used for comparing and evaluating binary classification algorithms that output probabilities (like logistic regression). Probabilities are mapped to binary predictions using a threshold. The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate against the False Positive Rate for varying threshold values. AUC represents the area under this curve, providing an aggregated measure of model performance across all possible thresholds. AUC scores range from 0 to 1, with 1 indicating a model that separates the two classes perfectly and 0.5 indicating performance no better than random. Increasing the threshold generally reduces false positives but increases false negatives.
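The construction of the ROC curve and its area can be sketched as follows. The labels and predicted probabilities are made up for illustration, and the function names are hypothetical. The area is approximated with the trapezoidal rule over the points obtained by sweeping the threshold from high to low.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points for each distinct threshold, from highest to lowest."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels (1 = positive) and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
model_auc = auc(points)
print(f"AUC={model_auc:.3f}")
```

In this sample one negative example scores higher than one positive example, so the area falls short of 1.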
For linear regression models, where a line is fitted to data points, the error is the distance between the line and actual values. Common evaluation metrics include:
- Mean Squared Error (MSE): The average of the squared differences between predictions and actual values. MSE is never negative, and lower values indicate better predictions.
- Root Mean Squared Error (RMSE): The square root of the MSE. Its units match the dependent variable, making it more interpretable than MSE. Both MSE and RMSE emphasize the impact of outliers due to the squaring of errors.
- Mean Absolute Error (MAE): The average of the absolute values of the errors, less sensitive to outliers than MSE and RMSE.
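The three regression metrics can be compared on a small set of made-up values. In the sketch below, the data is hypothetical and the last prediction is deliberately far off to show how squaring the errors amplifies an outlier.

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences between predictions and actual values."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the MSE, in the same units as the dependent variable."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean of the absolute differences between predictions and actual values."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values; the last prediction misses by 4 and acts as an outlier.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 13.0]

print(f"MSE={mse(actual, predicted):.2f}")
print(f"RMSE={rmse(actual, predicted):.2f}")
print(f"MAE={mae(actual, predicted):.2f}")
```

RMSE comes out higher than MAE here because the single large error dominates once it is squared.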
Finally, the lesson underscores the critical role of business metrics. The first step in the ML pipeline is defining the business goal, which then dictates how success will be measured. Business metrics quantify the value of an ML model to the business, such as cost reduction, increased users/sales, improved customer feedback, or any other relevant KPI. It’s also crucial to estimate the risks and potential costs of errors. After deployment, data must be collected to track these metrics and compare actual results against initial goals, including the cost of building and operating the model to calculate the return on investment.
More resources
- A Framework to Mitigate Bias and Improve Outcomes in the New Age of AI
https://aws.amazon.com/blogs/publicsector/framework-mitigate-bias-improve-outcomes-new-age-ai/
- What Are Transformers in Artificial Intelligence?
https://aws.amazon.com/what-is/transformers-in-artificial-intelligence/
- What Is Overfitting?
https://aws.amazon.com/what-is/overfitting/
- What Are Large Language Models (LLMs)?
https://aws.amazon.com/what-is/large-language-model/
- Responsible Use of Machine Learning
https://d1.awsstatic.com/responsible-machine-learning/responsible-use-of-machine-learning-guide.pdf
- Easily Add Intelligence to Your Applications
https://aws.amazon.com/ai/services/
- What Is MLOps?
https://aws.amazon.com/what-is/mlops/
- Amazon SageMaker MLOps: From Idea to Production in Six Steps
https://catalog.us-east-1.prod.workshops.aws/workshops/741835d3-a2bf-4cb6-81f0-d0bb4a62edca/en-US
- Machine Learning Lens
https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/machine-learning-lens.html