Workflow Element Store

  1. Surveys and Questionnaires
  2. Web Scraping
  3. APIs and Data Feeds
  4. Databases - SQL
  5. Public Datasets
  6. Databases - NoSQL
  7. Mobile Applications or IoT Applications
  8. Data Collaboration and Partnerships
  9. Feedback Data
  10. Experiments (DoE)
  11. Flat files
  1. RDBMS
  2. Azure Data Factory (ADF)
  3. GCP Data Fusion
  4. Azure Blob Storage
  5. GCP BigQuery
  6. PostgreSQL
  7. Azure Synapse
  8. MySQL
  9. AWS Kinesis
  10. AWS S3
  11. MS SQL Server
  12. AWS Glue
  13. ETL/ELT pipeline
  14. Oracle DB
  15. Google Cloud Storage (GCS)
  16. MongoDB
  17. Apache Kafka
  18. Azure Stream Analytics
  19. GCP Dataflow
  20. AWS RDS
  21. AWS Redshift
  1. Dimensionality Reduction
  2. Data Partitioning - Train, Validation, & Test
  3. AutoEDA libraries
  4. Binning / Discretization
  5. Feature Extraction from Images
  6. Data Scaling and Normalization
  7. Handling Imbalanced Classes
  8. Handling Missing Data
  9. Domain-Specific Feature Engineering
  10. Polynomial Features
  11. Handling Noisy Data
  12. Annotation
  13. Handling Time-Series Data
  14. Textual Feature Extraction
  15. Feature Selection
  16. Dealing with Outliers
  17. Data Transformations
  18. Auto-Preprocessing libraries
  19. Augmentation
  20. Interaction Features
  21. Time-Based Features
  22. Handling Categorical Data
  1. Association Rules
  2. External Validation
  3. Evaluation Metrics
  4. Network Analytics / Geospatial Analytics
  5. Cross-Validation
  6. Regularization
  7. AutoML
  8. Forecasting Techniques
  9. Model Comparison
  10. Batch Size Selection
  11. Word Embeddings
  12. Natural Language Processing
  13. Early Stopping
  14. Regular Monitoring and Logging
  15. Hyperparameter Tuning
  16. Data Augmentation
  17. Learning Rate Scheduling
  18. Recommendation Engine
  19. Weight Initialization
  20. Ensemble Techniques
  21. Blackbox - Neural Network Models
  22. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  23. Clustering
  24. Performance Visualization
  25. Regularization Techniques
  26. Binary Classification Techniques
  27. Batch Normalization
  28. Reinforcement Learning
  29. Transfer Learning
  30. Multiclass Classification Techniques
  31. Regression Analysis
  32. Model Interpretability
  1. Databases
  2. Code repository
  3. Model registry
  4. Data warehouse
  5. Data Preprocessing pipeline models
  1. Data Drift Monitoring
  2. Performance Metrics
  3. Feedback Collection
  4. Model Versioning
  5. FastAPI
  6. Serverless Computing
  7. Prediction Logging
  8. Concept Drift Detection
  9. Model Health Monitoring
  10. Flask
  11. Edge Deployment
  12. Alerting and Notification
  13. Streamlit
  14. Cloud Deployment
  15. Model Drift
  16. Model Serialization
  17. Bias and Fairness Assessment
  18. Containerization
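
As an illustration of the "Data Drift Monitoring" and "Alerting and Notification" elements above, the sketch below compares one numeric feature in the training data against recent prediction inputs using the Population Stability Index (PSI). This is only one of several possible drift measures and is not taken from the source. The function names, the bin count, and the 0.25 alert threshold are assumptions chosen for the example (0.25 is a commonly quoted rule of thumb, not a fixed standard).

```python
import math
from typing import List, Sequence


def bucket_shares(values: Sequence[float], lo: float, width: float, bins: int) -> List[float]:
    """Share of values falling in each equal-width bin starting at `lo`."""
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to the edge bins
        counts[idx] += 1
    eps = 1e-6  # floor so empty bins do not produce log(0)
    return [max(c / len(values), eps) for c in counts]


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index of `current` relative to `reference`."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature
    ref = bucket_shares(reference, lo, width, bins)
    cur = bucket_shares(current, lo, width, bins)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def drift_alert(reference: Sequence[float], current: Sequence[float],
                threshold: float = 0.25) -> bool:
    """True when PSI exceeds the (assumed) alert threshold."""
    return psi(reference, current) > threshold


# Hypothetical data: training-time feature values vs. shifted live values.
training_values = [i / 100 for i in range(100)]
live_values = [0.5 + i / 200 for i in range(100)]
print("drift detected:", drift_alert(training_values, live_values))
```

In a deployed setup, a check like this would typically run on logged prediction inputs at a fixed interval, with the alert routed to whatever notification channel the team uses.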
ML Workflow - Architecture
  • Element belongs to model
  • Element does not belong to model
Training Pipeline

Data Collection

API Stream

Web crawler

Selenium

Data Ingestion

Data Landing Zone

Store Data from all the Sources

Data Cleaning / Preprocessing

Derived & Base features

Data Training & Modelling

Inference Pipeline

Input Data for Forecasting

Input Data

Cleaned & Processed Data

Inference
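
The two pipelines above can be sketched in a few lines of Python. This is a minimal illustration, not the architecture's actual implementation: the field names (`x1`, `x2`, `y`), the derived feature (`x1_x2`), and the single-feature least-squares model are all hypothetical stand-ins. The point it shows is that the inference pipeline reuses the same cleaning and feature step as the training pipeline, so inputs are processed identically in both.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]
Model = Tuple[float, float]  # (slope, intercept) of the hypothetical model


def ingest(*sources: List[Row]) -> List[Row]:
    """Data ingestion: store rows from all sources in one landing zone."""
    return [dict(row) for source in sources for row in source]


def preprocess(rows: List[Row]) -> List[Row]:
    """Cleaning / preprocessing shared by both pipelines.

    Drops rows with missing base features and adds one derived feature.
    """
    processed = []
    for row in rows:
        if row.get("x1") is None or row.get("x2") is None:
            continue  # incomplete record, skip it
        processed.append({**row, "x1_x2": row["x1"] * row["x2"]})
    return processed


def fit(rows: List[Row]) -> Model:
    """Least-squares fit of y on the derived feature (illustrative model only)."""
    xs = [r["x1_x2"] for r in rows]
    ys = [r["y"] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def training_pipeline(*sources: List[Row]) -> Model:
    """Collection -> ingestion -> cleaning -> features -> training."""
    rows = preprocess(ingest(*sources))
    labelled = [r for r in rows if r.get("y") is not None]
    return fit(labelled)


def inference_pipeline(model: Model, input_rows: List[Row]) -> List[float]:
    """Input data -> cleaned and processed data -> inference."""
    slope, intercept = model
    return [slope * r["x1_x2"] + intercept for r in preprocess(input_rows)]


# Hypothetical rows standing in for an API stream and a web crawler.
api_rows = [{"x1": 1, "x2": 2, "y": 5}, {"x1": 2, "x2": 2, "y": 9}]
crawler_rows = [{"x1": 3, "x2": 1, "y": 7}, {"x1": None, "x2": 1, "y": 3}]

model = training_pipeline(api_rows, crawler_rows)
predictions = inference_pipeline(model, [{"x1": 5, "x2": 1}])
print(model, predictions)
```

Keeping `preprocess` as a single shared function is the design choice worth noting: if training and inference each had their own cleaning code, the two could drift apart and the model would see differently shaped inputs in production.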