The Azure Data Scientist Associate Training (DP-100) prepares participants to design and implement data science and machine learning solutions on Azure. The topics covered in this course are listed below.
Topics
Skill: Design and Prepare a Machine Learning Solution
Part 1: Designing a Machine Learning Solution
Session 1: Dataset Preparation
- Identify the structure and format for datasets
- Determine compute specifications for machine learning workloads
Session 2: Development Approach
- Select the development approach to train a model
Part 2: Create and Manage Resources in Azure Machine Learning Workspace
Session 3: Workspace Management
- Create and manage a workspace
- Create and manage data stores
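For reference, Session 3's workspace and datastore tasks look roughly like the following with the Azure Machine Learning Python SDK v2 (azure-ai-ml); the subscription ID, resource group, names, and account key are placeholders rather than values from the course. Later sketches in this outline reuse the workspace-scoped ml_client created here.
    from azure.identity import DefaultAzureCredential
    from azure.ai.ml import MLClient
    from azure.ai.ml.entities import Workspace, AzureBlobDatastore, AccountKeyConfiguration

    # Connect at the subscription / resource group level (placeholder IDs)
    ml_client = MLClient(
        credential=DefaultAzureCredential(),
        subscription_id="<subscription-id>",
        resource_group_name="<resource-group>",
    )

    # Create a workspace
    ml_client.workspaces.begin_create(Workspace(name="mlw-dp100", location="eastus")).result()

    # Reconnect scoped to the new workspace, then register a blob container as a datastore
    ml_client = MLClient(
        DefaultAzureCredential(), "<subscription-id>", "<resource-group>", "mlw-dp100"
    )
    ml_client.create_or_update(
        AzureBlobDatastore(
            name="training_data",
            account_name="<storage-account>",
            container_name="training-data",
            credentials=AccountKeyConfiguration(account_key="<account-key>"),
        )
    )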
Session 4: Compute Targets and Source Control
- Create and manage compute targets
- Set up Git integration for source control
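A minimal sketch for Session 4's compute targets, reusing the ml_client from the Session 3 sketch (sizes and names are illustrative); Git integration is configured with standard git commands from the compute instance terminal and is not shown here.
    from azure.ai.ml.entities import AmlCompute, ComputeInstance

    # Autoscaling training cluster that scales to zero when idle
    ml_client.compute.begin_create_or_update(
        AmlCompute(
            name="cpu-cluster",
            size="Standard_DS3_v2",
            min_instances=0,
            max_instances=4,
            idle_time_before_scale_down=120,
        )
    ).result()

    # Single-node compute instance for notebook development
    ml_client.compute.begin_create_or_update(
        ComputeInstance(name="ci-dp100-dev", size="Standard_DS3_v2")
    ).result()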
Part 3: Create and Manage Assets in Azure Machine Learning Workspace
Session 5: Data and Environment Management
- Create and manage data assets
- Create and manage environments
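Session 5's assets can be sketched as follows, again with the workspace-scoped ml_client; the data path, conda file, and names are placeholders.
    from azure.ai.ml.entities import Data, Environment
    from azure.ai.ml.constants import AssetTypes

    # Versioned data asset pointing at a file in a registered datastore
    ml_client.data.create_or_update(
        Data(
            name="diabetes-data",
            version="1",
            type=AssetTypes.URI_FILE,
            path="azureml://datastores/training_data/paths/diabetes.csv",
        )
    )

    # Custom environment built from a base image plus a conda specification
    env = Environment(
        name="sklearn-env",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
        conda_file="./environments/sklearn-conda.yaml",
    )
    ml_client.environments.create_or_update(env)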
Session 6: Asset Sharing
- Share assets across workspaces using registries
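One way to share assets (Session 6) is to publish them through a registry-scoped client; the registry name below is a placeholder, and env is the environment created in the Session 5 sketch.
    from azure.identity import DefaultAzureCredential
    from azure.ai.ml import MLClient

    # Client scoped to an Azure ML registry rather than a single workspace
    registry_client = MLClient(
        credential=DefaultAzureCredential(), registry_name="ml-shared-registry"
    )

    # Publish the environment so workspaces with access to the registry can consume it,
    # e.g. as azureml://registries/ml-shared-registry/environments/sklearn-env/versions/1
    registry_client.environments.create_or_update(env)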
Skill: Explore Data and Run Experiments
Part 1: Automated Machine Learning
Session 7: Model Exploration
- Use automated machine learning for tabular data, computer vision, and natural language processing
- Select and understand training options, including preprocessing and algorithms
- Evaluate automated machine learning runs using responsible AI guidelines
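As an illustration of Session 7, a tabular AutoML classification job configured through the SDK might look like this; the MLTable data asset, target column, and limits are placeholders.
    from azure.ai.ml import automl, Input
    from azure.ai.ml.constants import AssetTypes

    # AutoML classification over an MLTable training dataset
    classification_job = automl.classification(
        compute="cpu-cluster",
        experiment_name="automl-diabetes",
        training_data=Input(type=AssetTypes.MLTABLE, path="azureml:diabetes-training:1"),
        target_column_name="Diabetic",
        primary_metric="AUC_weighted",
        n_cross_validations=5,
        enable_model_explainability=True,  # surfaces explanations for responsible AI review
    )
    classification_job.set_limits(timeout_minutes=60, max_trials=20)

    returned_job = ml_client.jobs.create_or_update(classification_job)
    print(returned_job.studio_url)  # inspect trials and preprocessing choices in the studio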
Part 2: Custom Model Training with Notebooks
Session 8: Notebook Training
- Use the terminal to configure a compute instance
- Access and wrangle data in notebooks
- Retrieve features from a feature store to train a model
- Track model training with MLflow
- Evaluate a model with responsible AI guidelines
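A condensed notebook-style sketch for Session 8: read data through a datastore URI, train, and track with MLflow. The URI components are placeholders, and the feature store retrieval and responsible AI evaluation are omitted; on an Azure Machine Learning compute instance the MLflow tracking URI already points at the workspace.
    import mlflow
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Read a CSV exposed through a datastore URI (placeholder subscription/workspace values)
    df = pd.read_csv(
        "azureml://subscriptions/<sub-id>/resourcegroups/<rg>/workspaces/mlw-dp100"
        "/datastores/training_data/paths/diabetes.csv"
    )
    X_train, X_test, y_train, y_test = train_test_split(
        df.drop(columns="Diabetic"), df["Diabetic"], test_size=0.3, random_state=0
    )

    mlflow.set_experiment("diabetes-notebook")
    with mlflow.start_run():
        model = LogisticRegression(C=0.1).fit(X_train, y_train)
        mlflow.log_param("regularization", 0.1)
        mlflow.log_metric("accuracy", model.score(X_test, y_test))
        mlflow.sklearn.log_model(model, artifact_path="model")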
Session 9: Interactive Data Wrangling
- Wrangle data interactively with attached Synapse Spark pools and serverless Spark compute
Part 3: Hyperparameter Tuning
Session 10: Automating Tuning
- Select a sampling method
- Define the search space and primary metric
- Define early termination options
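The Session 10 topics map onto a sweep job roughly as follows; the script arguments, metric name, and limits are placeholders, and the primary metric must match a metric that train.py actually logs.
    from azure.ai.ml import command
    from azure.ai.ml.sweep import Choice, Uniform, BanditPolicy

    # Base training job whose inputs become the search space
    base_job = command(
        code="./src",
        command="python train.py --learning_rate ${{inputs.learning_rate}} "
                "--n_estimators ${{inputs.n_estimators}}",
        inputs={"learning_rate": 0.01, "n_estimators": 100},
        environment="azureml:sklearn-env:1",
        compute="cpu-cluster",
    )

    # Replace the fixed inputs with search expressions (the search space)
    job_for_sweep = base_job(
        learning_rate=Uniform(min_value=0.001, max_value=0.1),
        n_estimators=Choice(values=[50, 100, 200]),
    )

    # Sampling method, primary metric, early termination, and limits
    sweep_job = job_for_sweep.sweep(
        sampling_algorithm="random",
        primary_metric="AUC",
        goal="Maximize",
    )
    sweep_job.early_termination = BanditPolicy(evaluation_interval=2, slack_factor=0.1)
    sweep_job.set_limits(max_total_trials=20, max_concurrent_trials=4, timeout=7200)

    ml_client.jobs.create_or_update(sweep_job)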
Skill: Train and Deploy Models
Part 1: Model Training
Session 11: Running Training Scripts
- Consume data in a job
- Configure compute and environment for a job run
- Track model training with MLflow in a job run
- Define parameters for a job and run a script as a job
- Use logs to troubleshoot job run errors
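Session 11 corresponds to submitting a script as a command job; the data asset, environment, compute, and argument names below are placeholders.
    from azure.ai.ml import command, Input
    from azure.ai.ml.constants import AssetTypes

    job = command(
        code="./src",
        command="python train.py --training_data ${{inputs.training_data}} --reg_rate ${{inputs.reg_rate}}",
        inputs={
            "training_data": Input(type=AssetTypes.URI_FILE, path="azureml:diabetes-data:1"),
            "reg_rate": 0.1,
        },
        environment="azureml:sklearn-env:1",
        compute="cpu-cluster",
        experiment_name="train-diabetes",
        display_name="diabetes-logistic-regression",
    )

    returned_job = ml_client.jobs.create_or_update(job)

    # Stream the job's logs for troubleshooting; full logs appear under Outputs + logs in the studio
    ml_client.jobs.stream(returned_job.name)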
Part 2: Implement Training Pipelines
Session 12: Pipeline Development
- Create custom components and pipelines
- Pass data between steps in a pipeline
- Run, schedule, and troubleshoot pipeline runs
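A Session 12 pipeline built from two YAML components might look like the sketch below; the component files and their port names (input_data, output_data, training_data, model_output) are placeholders defined in those YAML files.
    from azure.ai.ml import load_component, Input
    from azure.ai.ml.dsl import pipeline
    from azure.ai.ml.constants import AssetTypes

    prep_data = load_component(source="./components/prep_data.yml")
    train_model = load_component(source="./components/train_model.yml")

    @pipeline(default_compute="cpu-cluster")
    def diabetes_pipeline(pipeline_input_data):
        # Data flows between steps by wiring one step's outputs into the next step's inputs
        prep_step = prep_data(input_data=pipeline_input_data)
        train_step = train_model(training_data=prep_step.outputs.output_data)
        return {"trained_model": train_step.outputs.model_output}

    pipeline_job = diabetes_pipeline(
        pipeline_input_data=Input(type=AssetTypes.URI_FILE, path="azureml:diabetes-data:1")
    )
    returned_job = ml_client.jobs.create_or_update(
        pipeline_job, experiment_name="diabetes-pipeline"
    )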
Part 3: Model Management
Session 13: Managing Model Artifacts
- Define the signature in the MLmodel file
- Package a feature retrieval specification with the model artifact
- Register an MLflow model
- Assess a model using responsible AI principles
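For Session 13, logging a signature and registering the resulting MLflow model can be sketched as follows, reusing model and X_train from the Session 8 sketch; the job name in the artifact path is a placeholder, and the feature retrieval specification and responsible AI assessment are not shown.
    import mlflow
    from mlflow.models import infer_signature
    from azure.ai.ml.entities import Model
    from azure.ai.ml.constants import AssetTypes

    # Log the model with an explicit signature so the MLmodel file records input/output schema
    signature = infer_signature(X_train, model.predict(X_train))
    mlflow.sklearn.log_model(model, artifact_path="model", signature=signature)

    # Register the MLflow model produced by a completed job
    ml_client.models.create_or_update(
        Model(
            name="diabetes-classifier",
            type=AssetTypes.MLFLOW_MODEL,
            path="azureml://jobs/<job-name>/outputs/artifacts/paths/model/",
        )
    )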
Part 4: Model Deployment
Session 14: Online and Batch Deployment
- Configure settings for online deployment and deploy a model to an online endpoint
- Test an online deployed service
- Configure compute for a batch deployment
- Deploy a model to a batch endpoint and invoke the batch endpoint for scoring jobs
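Session 14's deployment targets can be sketched as follows for an MLflow model, which needs no scoring script; endpoint names, instance sizes, and the sample request file are placeholders.
    from azure.ai.ml import Input
    from azure.ai.ml.constants import AssetTypes
    from azure.ai.ml.entities import (
        ManagedOnlineEndpoint,
        ManagedOnlineDeployment,
        BatchEndpoint,
        BatchDeployment,
    )

    # Managed online endpoint with a single "blue" deployment
    ml_client.online_endpoints.begin_create_or_update(
        ManagedOnlineEndpoint(name="diabetes-online-ep", auth_mode="key")
    ).result()
    ml_client.online_deployments.begin_create_or_update(
        ManagedOnlineDeployment(
            name="blue",
            endpoint_name="diabetes-online-ep",
            model="azureml:diabetes-classifier:1",
            instance_type="Standard_DS3_v2",
            instance_count=1,
        )
    ).result()

    # Test the online deployment with a sample request
    ml_client.online_endpoints.invoke(
        endpoint_name="diabetes-online-ep",
        deployment_name="blue",
        request_file="./sample-request.json",
    )

    # Batch endpoint scored on the compute cluster
    ml_client.batch_endpoints.begin_create_or_update(
        BatchEndpoint(name="diabetes-batch-ep")
    ).result()
    ml_client.batch_deployments.begin_create_or_update(
        BatchDeployment(
            name="default",
            endpoint_name="diabetes-batch-ep",
            model="azureml:diabetes-classifier:1",
            compute="cpu-cluster",
            instance_count=2,
            max_concurrency_per_instance=2,
            mini_batch_size=10,
            output_file_name="predictions.csv",
        )
    ).result()

    # Invoke the batch endpoint against a folder of new data to start a scoring job
    ml_client.batch_endpoints.invoke(
        endpoint_name="diabetes-batch-ep",
        input=Input(type=AssetTypes.URI_FOLDER, path="azureml:new-patient-data:1"),
    )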
Skill: Optimize Language Models for AI Applications
Part 1: Preparation for Model Optimization
Session 15: Language Model Deployment
- Select and deploy a language model from the catalog
- Compare language models using benchmarks
- Test a deployed language model in the playground
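For Session 15, catalog models live in shared registries such as the built-in azureml registry; a registry-scoped client can retrieve one before deploying it to an endpoint. The model name below is illustrative, and benchmark comparison and playground testing happen in the studio UI.
    from azure.identity import DefaultAzureCredential
    from azure.ai.ml import MLClient

    # Client scoped to the shared "azureml" registry that backs the model catalog
    catalog_client = MLClient(credential=DefaultAzureCredential(), registry_name="azureml")

    # Retrieve a catalog model by name; deployment then follows the endpoint patterns above
    foundation_model = catalog_client.models.get(name="Phi-3-mini-4k-instruct", label="latest")
    print(foundation_model.name, foundation_model.version)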
Part 2: Optimization Approaches
Session 16: Prompt Engineering and Prompt Flow
- Test prompts with manual evaluation
- Define and track prompt variants
- Create prompt templates and define chaining logic with the Prompt Flow SDK
- Use tracing to evaluate the flow
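A minimal sketch of Session 16, assuming a .prompty template file already defines the prompt and model configuration for the promptflow SDK; the file path and question are placeholders, and prompt variants would typically be separate template versions evaluated against the same inputs.
    from promptflow.core import Prompty
    from promptflow.tracing import start_trace

    # Emit traces so each run's inputs, outputs, and latency can be inspected
    start_trace()

    # Load a prompt template plus its model configuration from a .prompty file
    chat_flow = Prompty.load(source="./flows/chat.prompty")

    # Execute the flow; chaining logic would call further templates or Python functions here
    answer = chat_flow(question="Which compute targets suit distributed training?")
    print(answer)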
Session 17: Retrieval Augmented Generation (RAG)
- Prepare data for RAG, including cleaning, chunking, and embedding
- Configure a vector store and an Azure AI Search-based index store
- Evaluate the RAG solution
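To make Session 17 concrete, the sketch below chunks a document, embeds the chunks with Azure OpenAI, and retrieves by cosine similarity in memory; in the course the embeddings would instead be written to a vector store such as an Azure AI Search index. The endpoint, key, deployment name, and file path are placeholders.
    import numpy as np
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://<resource>.openai.azure.com",
        api_key="<api-key>",
        api_version="2024-02-01",
    )

    def chunk(text, size=500, overlap=50):
        # Fixed-size character chunks with overlap; production pipelines often split on structure
        return [text[i : i + size] for i in range(0, len(text), size - overlap)]

    def embed(texts):
        response = client.embeddings.create(model="text-embedding-ada-002", input=texts)
        return np.array([item.embedding for item in response.data])

    documents = chunk(open("./docs/ml_handbook.txt").read())
    doc_vectors = embed(documents)

    # Retrieve the top-3 chunks for a question by cosine similarity
    query_vector = embed(["How do I scale a compute cluster?"])[0]
    scores = doc_vectors @ query_vector / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
    )
    top_chunks = [documents[i] for i in np.argsort(scores)[::-1][:3]]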
Session 18: Fine-Tuning
- Prepare data for fine-tuning
- Select an appropriate base model
- Run a fine-tuning job and evaluate the fine-tuned model
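A hedged sketch of Session 18 using the Azure OpenAI fine-tuning API; the endpoint, key, API version, base model, and training file are placeholders, and evaluation of the fine-tuned model is not shown.
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://<resource>.openai.azure.com",
        api_key="<api-key>",
        api_version="2024-02-01",
    )

    # Upload JSONL training data in chat-completions format
    training_file = client.files.create(
        file=open("./data/train_chat.jsonl", "rb"), purpose="fine-tune"
    )

    # Start a fine-tuning job against a base model that supports fine-tuning
    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-35-turbo-0613",
    )

    # Poll status; the finished model is deployed and evaluated like any other deployment
    print(client.fine_tuning.jobs.retrieve(job.id).status)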