Overview of the ML Workflow

The machine learning workflow you’ve outlined is accurate and represents the standard pipeline in ML projects. The complete flow is:

Data → Feature Representation → Model Family → Parameters → Prediction Function → Loss → Optimization → Regularization → Training Procedure → Evaluation → Inference

This is indeed the core workflow, though in practice it’s often iterative with feedback loops back to earlier stages based on evaluation results.

1. Data

The foundation of any ML project. This stage involves acquiring, understanding, and preparing raw information.

Key Concepts and Techniques:

Data Collection & Sources

  • Web scraping
  • APIs and databases
  • Sensors and IoT devices
  • User-generated content
  • Public datasets (Kaggle, UCI Machine Learning Repository, etc.)

Data Types

  • Structured data (tabular, relational databases)
  • Unstructured data (text, images, audio, video)
  • Time series data
  • Graph data
  • Semi-structured data (JSON, XML)

Data Quality Assessment

  • Missing values detection
  • Outlier identification
  • Data validation
  • Class imbalance analysis
  • Data profiling and statistical summaries

Data Cleaning

  • Handling missing values (deletion, imputation, forward-fill)
  • Removing duplicates
  • Correcting inconsistencies
  • Standardizing formats
  • Removing irrelevant features

Data Splitting

  • Train/Validation/Test split (commonly 70/15/15, or 80/20 when no separate validation set is kept)
  • Stratified sampling (for imbalanced classes)
  • Time-based splits (for time series)
  • K-fold cross-validation preparation
  • Group-based splits (for grouped data)
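
As a concrete sketch of the first two items above, here is a stratified 70/15/15 split with scikit-learn (`X` and `y` are placeholders for your feature matrix and labels):

```python
from sklearn.model_selection import train_test_split

# First carve out 15% as the test set, preserving class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42
)

# Then split the remainder into train (70%) and validation (15%);
# 0.15 / 0.85 of the remaining data gives 15% of the original.
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.15 / 0.85, stratify=y_train, random_state=42
)
```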

2. Feature Representation

The process of transforming raw data into meaningful features that algorithms can learn from effectively.

Key Concepts and Techniques:

Feature Extraction

  • Dimensionality reduction techniques (PCA, ICA, Factor Analysis)
  • Text feature extraction (TF-IDF, Word2Vec, BERT embeddings)
  • Image feature extraction (SIFT, HOG, CNN features)
  • Audio feature extraction (MFCC, spectrograms)
  • Domain-specific feature extraction

Feature Engineering

  • Polynomial features
  • Interaction terms
  • Domain-driven features
  • Temporal features (day, month, hour from timestamps)
  • Statistical features (mean, variance, skewness)
  • Binning and bucketing
  • One-hot encoding for categorical variables
  • Ordinal encoding
  • Target encoding
  • Frequency encoding

Feature Normalization & Scaling

  • Standardization (z-score normalization)
  • Min-Max scaling (normalization)
  • Robust scaling (resistant to outliers)
  • Log scaling
  • Unit vector scaling (L2 normalization)
  • Quantile transformation
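
One detail worth making explicit: scaling statistics must be learned from the training split only and then reused on held-out data. A minimal scikit-learn sketch (`X_train` and `X_test` are placeholders):

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Learn mean and standard deviation from the training data only...
X_train_scaled = scaler.fit_transform(X_train)
# ...and apply those same statistics to the held-out data.
X_test_scaled = scaler.transform(X_test)
```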

Feature Selection

  • Filter methods (correlation, chi-square, ANOVA)
  • Wrapper methods (RFE - Recursive Feature Elimination)
  • Embedded methods (L1/L2 regularization, tree feature importance)
  • Univariate statistical tests
  • Mutual information
  • Permutation importance

Handling Categorical Variables

  • One-hot encoding
  • Label encoding
  • Ordinal encoding
  • Binary encoding
  • Hashing
  • Embeddings (for high-cardinality features)
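
As a small illustration of one-hot encoding, a scikit-learn sketch with made-up color data:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

colors = np.array([["red"], ["green"], ["blue"], ["green"]])

# handle_unknown="ignore" maps categories unseen at training time
# to an all-zeros row at inference instead of raising an error.
encoder = OneHotEncoder(handle_unknown="ignore")
onehot = encoder.fit_transform(colors).toarray()

print(encoder.categories_)  # learned category vocabulary
print(onehot)               # one row per sample, one column per category
```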

Feature Creation from Raw Data

  • Text: word counts, n-grams, sentiment scores
  • Time series: lags, rolling statistics, Fourier features
  • Images: color histograms, texture descriptors
  • Mixed: cross-features, aggregations

Tools/Frameworks: Scikit-learn’s feature tools (StandardScaler, PCA, feature selectors); TensorFlow/Keras preprocessing layers; libraries like Featuretools for automated feature engineering.

3. Model Family

The choice of which type of algorithm architecture to use for your problem.

Key Concepts and Techniques:

Regression Models (for continuous output)

  • Linear Regression
  • Polynomial Regression
  • Ridge Regression
  • Lasso Regression
  • Elastic Net
  • Support Vector Regression (SVR)
  • Decision Trees (Regression)
  • Random Forest (Regression)
  • Gradient Boosting Regression
  • Neural Networks (Regression output)

Classification Models (for categorical output)

  • Logistic Regression
  • Decision Trees
  • Random Forest
  • Support Vector Machines (SVM)
  • Naive Bayes
  • K-Nearest Neighbors (KNN)
  • Gradient Boosting Classifiers (XGBoost, LightGBM, CatBoost)
  • Neural Networks (Classification output)

Ensemble Methods

  • Bagging (Bootstrap Aggregating)
  • Boosting (AdaBoost, Gradient Boosting)
  • Stacking
  • Voting Classifiers/Regressors
  • Blending

Deep Learning Models

  • Convolutional Neural Networks (CNNs)
  • Recurrent Neural Networks (RNNs, LSTM, GRU)
  • Transformer models
  • Autoencoders
  • Generative models (GANs, VAE)

Unsupervised Learning Models

  • K-Means Clustering
  • Hierarchical Clustering
  • DBSCAN
  • Gaussian Mixture Models (GMM)
  • Principal Component Analysis (PCA)
  • Isolation Forest (anomaly detection)

Specialized Models

  • Time Series: ARIMA, Prophet, SARIMA
  • Recommendation: Collaborative Filtering, Matrix Factorization
  • Ranking: Learning to Rank methods
  • Reinforcement Learning: Q-Learning, Policy Gradient

Model Selection Criteria

  • Problem type (regression vs classification vs clustering)
  • Data size and dimensionality
  • Interpretability requirements
  • Computational constraints
  • Latency requirements
  • Training time considerations

4. Parameters

The specific configuration and hyperparameters of the chosen model family.

Key Concepts and Techniques:

Model Hyperparameters (set before training)

  • Learning rate (for gradient-based optimization)
  • Number of hidden layers and units (for neural networks)
  • Tree depth and node splitting criteria (for tree-based models)
  • Kernel type and kernel coefficient (for SVM)
  • Regularization strength (C, alpha)
  • Number of neighbors K (for KNN)
  • Number of estimators/trees
  • Batch size
  • Number of epochs
  • Dropout rate
  • Activation functions
  • Optimizer type

Model Parameters (learned during training)

  • Weights and biases (for neural networks)
  • Tree structure and split thresholds
  • Support vectors and their coefficients (for SVM)
  • Feature coefficients (for linear models)

Hyperparameter Tuning Methods

  • Grid Search (exhaustive search over specified parameter values)
  • Random Search (random sampling of parameter space)
  • Bayesian Optimization (Gaussian processes, Tree-structured Parzen Estimators)
  • Hyperband and Successive Halving
  • Genetic algorithms
  • Manual tuning with domain expertise
  • Frameworks such as Optuna, Hyperopt, and Ray Tune implement several of the above
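
For instance, random search with scikit-learn's `RandomizedSearchCV`; the parameter ranges below are illustrative, not recommendations, and `X_train`/`y_train` are placeholders:

```python
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {
    "n_estimators": randint(100, 500),
    "max_depth": randint(3, 20),
    "min_samples_leaf": randint(1, 10),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=20,      # sample 20 random configurations
    cv=5,           # 5-fold cross-validation per configuration
    scoring="f1",
    random_state=42,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```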

Parameter Ranges and Defaults

  • Understanding default values
  • Common ranges for popular models
  • Parameter dependencies
  • Parameter scaling (linear vs logarithmic scale)

5. Prediction Function

The mathematical function that maps input features to predictions using the model parameters.

Key Concepts and Techniques:

Linear Prediction Functions

  • ( y = w^T x + b ) (linear regression/classification)
  • Weighted sum of features

Non-linear Prediction Functions

  • Polynomial: ( y = \beta_0 + \sum \beta_i x_i + \sum \beta_{ij} x_i x_j + … )
  • Neural Network: compositions of non-linear activation functions
  • RBF (Radial Basis Function): ( f(x) = \sum_{i=1}^{n} w_i \phi(||x - x_i||) )
  • Decision Trees: hierarchical if-then-else rules

Output Transformations

  • Sigmoid function (for binary classification): ( \sigma(z) = \frac{1}{1 + e^{-z}} )
  • Softmax function (for multi-class): ( p_i = \frac{e^{z_i}}{\sum_j e^{z_j}} )
  • Tanh activation
  • ReLU and variants
  • Linear output (for regression)
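
The sigmoid and softmax transformations are easy to write directly; a NumPy sketch using the standard max-subtraction trick for numerical stability:

```python
import numpy as np

def sigmoid(z):
    # Maps any real number to (0, 1); used for binary classification.
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtracting the max leaves the result unchanged but avoids overflow.
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

print(sigmoid(np.array([0.0])))            # [0.5]
print(softmax(np.array([1.0, 2.0, 3.0])))  # entries sum to 1.0
```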

Prediction Types

  • Point predictions (single value)
  • Probability predictions (confidence scores)
  • Quantile predictions (for quantile regression)
  • Interval predictions (prediction intervals)
  • Ranked predictions (for ranking tasks)

Ensemble Predictions

  • Averaging (mean of predictions)
  • Weighted averaging
  • Majority voting (classification)
  • Stacking with meta-learners
  • Blending combinations

6. Loss Function

Quantifies the discrepancy between predicted and actual values, guiding the model’s learning.

Key Concepts and Techniques:

Regression Loss Functions

  • Mean Squared Error (MSE): ( L = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 )
    • Sensitive to outliers, differentiable
  • Mean Absolute Error (MAE): ( L = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| )
    • Robust to outliers, less smooth
  • Root Mean Squared Error (RMSE): ( L = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} )
    • Interpretable scale
  • Huber Loss: Combination of MSE and MAE, robust yet smooth
  • Log-Cosh Loss: Smooth approximation of MAE
  • Quantile Loss: For quantile regression
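
A NumPy sketch of three of these losses, matching the formulas above:

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def huber(y_true, y_pred, delta=1.0):
    # Quadratic for small residuals, linear for large ones.
    residual = np.abs(y_true - y_pred)
    quadratic = 0.5 * residual ** 2
    linear = delta * residual - 0.5 * delta ** 2
    return np.mean(np.where(residual <= delta, quadratic, linear))
```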

Classification Loss Functions

  • Binary Cross-Entropy (Log Loss): ( L = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i)] )
    • For binary classification
  • Categorical Cross-Entropy: ( L = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K} y_{ik} \log(\hat{y}_{ik}) )
    • For multi-class classification
  • Focal Loss: Addresses class imbalance by down-weighting easy examples
  • Hinge Loss: ( L = \sum_{i=1}^{n} \max(0, 1 - y_i \hat{y}_i) ), with labels ( y_i \in \{-1, +1\} )
    • Used in SVM
  • Squared Hinge Loss: Smoother version of hinge loss
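
Binary cross-entropy is similarly compact; clipping the predicted probabilities keeps `log(0)` from producing infinities:

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    # Clip probabilities away from exactly 0 or 1 to keep log() finite.
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
```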

Specialized Loss Functions

  • Triplet Loss: For metric learning and embeddings
  • Contrastive Loss: For similarity learning
  • Ranking Loss: For ranking problems (pairwise and listwise)
  • Ordinal Loss: For ordinal regression
  • AUC surrogate losses: Differentiable approximations that target AUC (AUC itself is non-differentiable)

Weighted Loss Functions

  • Class weights (to handle imbalanced datasets)
  • Sample weights
  • Focal weight (down-weight easy negatives)

Custom Loss Functions

  • Business-specific metrics as loss
  • Multi-task learning losses
  • Adversarial losses (for GANs)

7. Optimization

The algorithm used to minimize the loss function and find optimal model parameters.

Key Concepts and Techniques:

Gradient-Based Optimization

  • Gradient Descent Variants
    • Batch Gradient Descent (BGD): Uses entire dataset per update
    • Stochastic Gradient Descent (SGD): Uses single sample per update
    • Mini-batch Gradient Descent: Uses subset of data per update
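
A minimal NumPy sketch of mini-batch gradient descent for linear regression under MSE, using synthetic data and a fixed learning rate for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w = np.zeros(3)
lr, batch_size = 0.1, 32

for epoch in range(20):
    indices = rng.permutation(len(X))              # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(batch)  # gradient of MSE
        w -= lr * grad                             # parameter update

print(w)  # should approach [2.0, -1.0, 0.5]
```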

Adaptive Learning Rate Methods

  • Momentum: Accelerates convergence by accumulating gradient direction
    • Standard Momentum (SGD with momentum)
    • Nesterov Accelerated Gradient (NAG)
  • Adagrad: Adapts learning rate based on historical gradients
  • RMSprop: Modifies Adagrad with exponential moving average
  • Adam: Combines momentum and RMSprop (most popular)
  • AdaMax: Variant of Adam
  • Nadam: Nesterov-accelerated Adam
  • AdamW: Adam with decoupled weight decay

Momentum Accumulation

  • Exponential moving average of gradients
  • Velocity and friction concepts
  • Gradient clipping

Learning Rate Scheduling

  • Fixed learning rate
  • Step decay (reduce at fixed intervals)
  • Exponential decay
  • Polynomial decay (linear, quadratic, cubic)
  • Cosine annealing
  • Cyclical learning rates and warm restarts
  • Adaptive scheduling based on validation metrics

Second-Order Methods

  • Newton’s Method
  • Quasi-Newton (BFGS, L-BFGS)
  • Natural Gradient Descent
  • Hessian-free optimization

Gradient Descent Challenges

  • Local minima vs global minima
  • Saddle points
  • Vanishing and exploding gradients
  • Plateau regions

Alternative Optimization Methods

  • Coordinate Descent
  • Alternating Least Squares (ALS)
  • Expectation-Maximization (EM)
  • Simulated Annealing
  • Genetic Algorithms
  • Particle Swarm Optimization

Distributed Optimization

  • Data parallelism
  • Model parallelism
  • Asynchronous SGD
  • Parameter servers
  • Federated learning

8. Regularization

Techniques to prevent overfitting and improve model generalization.

Key Concepts and Techniques:

Parameter Norm Penalties

  • L1 Regularization (Lasso): ( L_{total} = L_{original} + \lambda \sum |w_i| )
    • Encourages sparsity (feature selection)
    • Used in Lasso regression and sparse linear models
  • L2 Regularization (Ridge): ( L_{total} = L_{original} + \lambda \sum w_i^2 )
    • Penalizes large weights, smoother solutions
    • Used in Ridge regression
  • Elastic Net: Combination of L1 and L2
    • ( L_{total} = L_{original} + \lambda_1 \sum |w_i| + \lambda_2 \sum w_i^2 )
  • Regularization strength ( \lambda ): Controls trade-off between fit and complexity
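
In scikit-learn these penalties map directly onto model classes; a small comparative sketch where `alpha` plays the role of ( \lambda ) and `X_train`/`y_train` are placeholders:

```python
from sklearn.linear_model import Lasso, LinearRegression, Ridge

models = {
    "ols": LinearRegression(),
    "ridge (L2)": Ridge(alpha=1.0),
    "lasso (L1)": Lasso(alpha=0.1),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    # Lasso typically drives some coefficients exactly to zero.
    print(name, model.coef_)
```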

Dropout

  • Randomly deactivating neurons during training
  • Effective for neural networks
  • Dropout rate as hyperparameter (typically 0.2-0.5)
  • Variants: standard dropout, spatial dropout, recurrent dropout

Early Stopping

  • Monitor validation performance
  • Stop training when validation loss plateaus or increases
  • Preserves weights from best iteration
  • Prevents overfitting without explicit penalty
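
The patience logic is framework-independent. A sketch of the idea, where `train_one_epoch`, `validation_loss`, `get_weights`, and `set_weights` are hypothetical Keras-style helpers:

```python
max_epochs = 100
best_loss = float("inf")
best_weights = None
patience, patience_counter = 5, 0

for epoch in range(max_epochs):
    train_one_epoch(model)              # hypothetical training step
    val_loss = validation_loss(model)   # hypothetical evaluation step

    if val_loss < best_loss:
        best_loss = val_loss
        best_weights = model.get_weights()  # keep weights of best epoch
        patience_counter = 0
    else:
        patience_counter += 1
        if patience_counter >= patience:
            break                       # no improvement for `patience` epochs

model.set_weights(best_weights)         # restore the best iteration
```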

Data Augmentation

  • Image: rotation, flipping, cropping, color jittering, elastic deformations
  • Text: back-translation, paraphrasing, synonym replacement
  • General: noise injection, mixup, cutmix
  • Creates additional training samples from existing data

Batch Normalization

  • Normalizes layer inputs
  • Stabilizes training (originally motivated by reducing internal covariate shift)
  • Allows higher learning rates
  • Has regularization effect
  • Variants: layer norm, group norm, instance norm

Weight Decay

  • Similar to L2 regularization
  • Decoupled weight decay (AdamW)
  • Reduces magnitude of weights over time

Noise Injection

  • Gaussian noise to inputs
  • Label smoothing (soft targets instead of hard 0/1)
  • Stochastic depth

Ensemble Methods as Regularization

  • Averaging multiple models reduces overfitting
  • Bagging with random subsets
  • Boosting with sequential weak learners

Model Complexity Constraints

  • Tree depth limits
  • Number of basis functions
  • Number of layers/units in networks
  • Sparsity constraints

Architectural Regularization

  • Skip connections
  • Bottleneck layers
  • Attention mechanisms
  • Structured pruning

9. Training Procedure

The systematic process of iteratively updating model parameters to minimize loss.

Key Concepts and Techniques:

Training Loop Components

  • Forward pass: compute predictions
  • Loss computation: calculate loss value
  • Backward pass: compute gradients
  • Parameter update: modify weights/biases
  • Validation evaluation: monitor progress
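
These five steps map one-to-one onto a typical PyTorch training loop; a minimal sketch in which `train_loader` is an assumed `DataLoader` and the linear model is a stand-in:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # stand-in model: 10 features -> 1 output
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):
    for X_batch, y_batch in train_loader:     # assumed DataLoader
        optimizer.zero_grad()                 # clear previous gradients
        predictions = model(X_batch)          # forward pass
        loss = loss_fn(predictions, y_batch)  # loss computation
        loss.backward()                       # backward pass: compute gradients
        optimizer.step()                      # parameter update
```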

Epoch and Batch Management

  • Epoch: full pass through entire training dataset
  • Batch/Mini-batch: subset of data for one update
  • Batch size selection: trade-off between speed and stability
  • Number of epochs: determined by early stopping or fixed
  • Iteration: single update step

Initialization Strategies

  • Random initialization
  • Xavier/Glorot initialization (maintains variance across layers)
  • He initialization (for ReLU networks)
  • Pre-trained weights (transfer learning)
  • Layer-wise initialization
  • Orthogonal initialization

Training Techniques

  • Curriculum learning: gradually increase difficulty
  • Multi-task learning: simultaneous training on related tasks
  • Meta-learning: learning to learn
  • Online learning: streaming data updates
  • Continual learning: learning sequentially without forgetting

Validation During Training

  • Validation set evaluation at regular intervals
  • Early stopping based on validation metrics
  • Learning curves: plotting train/validation loss over epochs
  • Overfitting detection
  • Hyperparameter adjustment mid-training

Convergence Criteria

  • Loss below threshold
  • Gradient magnitude below threshold
  • No improvement for N iterations
  • Maximum iterations reached
  • Validation metric plateau

Debugging Training Issues

  • NaN/Inf loss detection and handling
  • Gradient clipping to prevent explosions
  • Layer-wise learning rate adaptation
  • Mixed precision training
  • Gradient accumulation (for large batches)

Distributed Training

  • Data parallel training (replicate model, split data)
  • Model parallel training (split model across devices)
  • Synchronized vs asynchronous updates
  • Gradient averaging across devices
  • Communication patterns

Training Time Optimization

  • GPU/TPU acceleration
  • Mixed precision (float16 and float32)
  • Gradient checkpointing (recompute activations to save memory)
  • Knowledge distillation
  • Quantization-aware training

Training Monitoring

  • Loss trajectories
  • Learning rate schedules
  • Weight/gradient statistics
  • Activation histograms
  • Training time estimation

10. Evaluation

Assessing model performance on held-out test data and validation metrics.

Key Concepts and Techniques:

Regression Metrics

  • Mean Squared Error (MSE): Penalizes larger errors quadratically
  • Root Mean Squared Error (RMSE): Interpretable in original scale
  • Mean Absolute Error (MAE): Robust to outliers
  • Mean Absolute Percentage Error (MAPE): Percentage-based error
  • R² Score: Proportion of variance explained (1 is perfect; can be negative for poor fits)
  • Explained Variance Score: Similar to R²
  • Median Absolute Error: Robust to outliers
  • Spearman/Pearson Correlation: Measures monotonic/linear relationships

Binary Classification Metrics

  • Accuracy: Proportion of correct predictions
    • ( \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} )
  • Precision: Of predicted positives, how many are correct
    • ( \text{Precision} = \frac{TP}{TP + FP} )
  • Recall/Sensitivity: Of actual positives, how many are detected
    • ( \text{Recall} = \frac{TP}{TP + FN} )
  • F1-Score: Harmonic mean of precision and recall
    • ( F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} )
  • Specificity: Of actual negatives, how many are correctly rejected
    • ( \text{Specificity} = \frac{TN}{TN + FP} )
  • ROC Curve & AUC: Trade-off between TPR and FPR across thresholds
  • PR Curve & AP: Precision-recall relationship (better for imbalanced data)
  • Log Loss: Cross-entropy for probability predictions
  • Brier Score: Mean squared difference between predicted probabilities and outcomes
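
scikit-learn exposes all of the standard classification metrics; a short sketch assuming a fitted classifier `clf` and held-out `X_test`/`y_test`:

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("f1       :", f1_score(y_test, y_pred))
print("roc auc  :", roc_auc_score(y_test, y_prob))  # needs scores, not labels
print(classification_report(y_test, y_pred))
```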

Multi-class Classification Metrics

  • Macro-averaging: Calculate metric for each class, average equally
  • Micro-averaging: Calculate metric globally by counting TP/FP/TN/FN
  • Weighted-averaging: Account for class imbalance in averaging
  • Confusion Matrix: Table of predicted vs. actual class counts
  • Per-class metrics: Precision, recall, F1 for each class

Clustering Metrics

  • Silhouette Score: Measures how similar points are to own cluster vs others
  • Davies-Bouldin Index: Average similarity between each cluster and its closest cluster (lower is better)
  • Calinski-Harabasz Index: Ratio of between-cluster to within-cluster variance
  • Homogeneity, Completeness, V-measure: For labeled clustering evaluation
  • Adjusted Rand Index (ARI): Similarity of clustering to ground truth
  • Normalized Mutual Information (NMI): Information-theoretic cluster quality

Ranking Metrics

  • Mean Reciprocal Rank (MRR): Average of the reciprocal rank of the first relevant item
  • Mean Average Precision (MAP): Average precision at each rank
  • Normalized Discounted Cumulative Gain (NDCG): Relevance-weighted ranking quality
  • Click-Through Rate (CTR): For online ranking systems

Anomaly Detection Metrics

  • Precision, Recall, F1 (same as classification)
  • ROC-AUC for outlier scores
  • Contamination rate: Expected proportion of anomalies

Calibration Metrics

  • Calibration curves: Predicted probability vs observed frequency
  • Brier score: Measures calibration
  • Expected Calibration Error (ECE): Average calibration deviation
  • Platt scaling: Post-hoc calibration

Domain-Specific Metrics

  • BLEU, ROUGE (for NLP/translation)
  • Mean Average Precision (for information retrieval)
  • Diversity metrics (for recommendation systems)
  • Fairness metrics (group parity, individual fairness)

Cross-Validation Strategies

  • K-Fold Cross-Validation: Partition data into k folds, train k times
  • Stratified K-Fold: Maintains class distribution in each fold
  • Time Series Cross-Validation: Forward-chaining to respect temporal order
  • Leave-One-Out Cross-Validation (LOOCV): Each sample is test set once
  • Repeated K-Fold: Run K-fold multiple times with different splits
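
A sketch of stratified k-fold scoring with scikit-learn, where `X`, `y`, and the model choice are placeholders:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(RandomForestClassifier(), X, y, cv=cv, scoring="f1")

# Report mean and spread, not just a single number.
print(f"F1: {scores.mean():.3f} +/- {scores.std():.3f}")
```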

Evaluation Best Practices

  • Test data must be completely held out
  • No information leakage from test set to training
  • Multiple metrics to capture different aspects
  • Consider business impact and costs of different error types
  • Document evaluation protocol
  • Report confidence intervals or standard deviations
  • Compare against baselines and state-of-the-art

Error Analysis

  • Examine misclassified examples
  • Identify common error patterns
  • Slice performance by data subgroups
  • False positive and false negative analysis
  • Confusion matrix interpretation

11. Inference

Deploying the trained model to make predictions on new, unseen data.

Key Concepts and Techniques:

Inference Modes

  • Batch Inference: Process multiple samples at once
    • Efficient for large-scale processing
    • Typical in offline analytics
  • Real-time/Online Inference: Single or small number of predictions
    • Low-latency requirements
    • Typical in production APIs
  • Edge Inference: Running models on edge devices
    • Mobile phones, IoT devices
    • Model compression required
  • Stream Inference: Continuous processing of streaming data

Model Deployment Formats

  • Model Serialization: Save trained model to disk
    • Pickle (Python)
    • SavedModel (TensorFlow)
    • ONNX (interoperable format)
    • HDF5 (.h5, legacy Keras format)
    • joblib (scikit-learn)
  • Containerization: Docker for reproducible deployment
  • Versioning: Track model versions and rollbacks
  • Configuration files: Store hyperparameters and preprocessing steps
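
For scikit-learn models, a joblib round-trip looks like the sketch below (the filename is illustrative):

```python
import joblib

# Persist the fitted model (ideally together with its preprocessing pipeline).
joblib.dump(model, "model_v1.joblib")

# Later, in the serving process:
model = joblib.load("model_v1.joblib")
predictions = model.predict(X_new)
```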

Preprocessing at Inference

  • Apply same transformations used during training
  • Feature normalization with training statistics (mean, std)
  • Feature selection (use same features as training)
  • Encoding of categorical variables
  • Handling missing values with same strategy
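
The most reliable way to guarantee identical preprocessing at inference is to bundle it with the model. A sketch using a scikit-learn `Pipeline` (`X_train`, `y_train`, and `X_new` are placeholders):

```python
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Imputation and scaling are fit on training data and frozen inside the
# pipeline, so serving code cannot accidentally apply different statistics.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
pipeline.fit(X_train, y_train)
pipeline.predict(X_new)  # raw features in, predictions out
```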

Prediction Output Handling

  • Point predictions
  • Confidence scores/probabilities
  • Uncertainty quantification
  • Prediction explanations

Inference Optimization

  • Model Compression
    • Quantization: Use lower precision (int8 instead of float32)
    • Pruning: Remove less important weights
    • Distillation: Train smaller model from larger one
    • Architecture search for efficient models
  • Batching: Group predictions for efficiency
  • Caching: Store results for repeated queries
  • Hardware Acceleration: GPU/TPU deployment
  • Model Serving Frameworks
    • TensorFlow Serving
    • TorchServe
    • Seldon Core
    • BentoML
    • KServe

Performance Monitoring in Inference

  • Latency: Response time per prediction
  • Throughput: Predictions per second
  • Resource Usage: CPU, memory, GPU utilization
  • Model Performance Drift: Monitor metrics over time
  • Data Distribution Shift: Detect when input distribution changes
  • Concept Drift: Detect when target variable behavior changes

Handling Prediction Uncertainty

  • Confidence intervals
  • Bayesian predictions with uncertainty
  • Monte Carlo Dropout for uncertainty estimation
  • Ensemble-based uncertainty
  • Calibrated probabilities

Production Considerations

  • Error handling and fallbacks
  • Monitoring and alerting
  • A/B testing new models
  • Gradual rollout (canary deployments)
  • Model explainability and debugging
  • Compliance with regulations (GDPR, fairness)
  • Logging predictions for analysis
  • Feedback loops for retraining

Serving Patterns

  • Single model serving
  • Multi-model serving
  • Ensemble serving (multiple model outputs combined)
  • Cascade/fallback models
  • Shadow models (running new model alongside current)

Scalability

  • Horizontal scaling: Multiple servers
  • Vertical scaling: More powerful hardware
  • Load balancing
  • Auto-scaling based on demand

Key Relationships Between Components

  1. Loss Function & Optimization: Loss guides optimization; different losses suit different optimizers
  2. Regularization & Overfitting: Directly combats overfitting that evaluation detects
  3. Features & Model Family: Different models benefit from different feature representations
  4. Hyperparameters & Training: Hyperparameters drastically affect training dynamics
  5. Evaluation & Inference: Evaluation metrics guide inference performance expectations
  6. Data Quality & All Stages: Poor data quality impacts every downstream stage

Common Pitfalls to Avoid

  • Data Leakage: Test data information flowing into training
  • Not Scaling Features: Different scales can affect distance-based algorithms
  • Using Same Data for Validation and Testing: Leads to optimistic evaluation
  • Ignoring Class Imbalance: Accuracy can be misleading with imbalanced data
  • Not Monitoring Inference: Performance can degrade silently in production
  • Overfitting to Validation Set: Excessive hyperparameter tuning on validation data
  • Not Reproducing Preprocessing in Inference: Train-test mismatch causes issues
  • Ignoring Model Interpretability: Black-box models are risky in sensitive domains
  • Not Considering Computational Constraints: Beautiful models that are too slow to serve
  • Forgetting Baseline Comparisons: Always compare against simple baselines

Tools & Frameworks by Stage

  • Data Handling: Pandas, Polars, Apache Spark, Dask
  • Feature Engineering: Scikit-learn, Featuretools, tsfresh
  • Model Development: Scikit-learn, TensorFlow, PyTorch, XGBoost, LightGBM
  • Hyperparameter Tuning: Optuna, Ray Tune, Hyperopt, Scikit-optimize
  • Evaluation & Monitoring: Scikit-learn, TensorBoard, Wandb, Neptune
  • Explainability: SHAP, LIME, Captum, Integrated Gradients
  • Model Serving: TensorFlow Serving, TorchServe, BentoML, Ray Serve
  • MLOps: MLflow, DVC, Kubeflow, Airflow, Prefect

