Workflow Element Store

Data Sources

  1. Web Scraping
  2. Databases - SQL
  3. Flat Files
  4. APIs and Data Feeds
  5. Data Collaboration and Partnerships
  6. Databases - NoSQL
  7. Public Datasets
  8. Feedback Data
  9. Experiments (Design of Experiments, DoE)
  10. Surveys and Questionnaires
  11. Mobile or IoT Applications

Data Ingestion & Storage

  1. GCS (Google Cloud Storage)
  2. Oracle DB
  3. AWS Kinesis
  4. GCP Data Fusion
  5. AWS S3
  6. Azure Synapse
  7. AWS Redshift
  8. Azure Data Factory (ADF)
  9. GCP Dataflow
  10. Azure Blob Storage
  11. Azure Stream Analytics
  12. AWS RDS
  13. ETL/ELT pipeline
  14. AWS Glue
  15. GCP BigQuery
  16. MS SQL Server
  17. PostgreSQL
  18. Apache Kafka
  19. MongoDB
  20. RDBMS
  21. MySQL
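The list above pairs managed ETL/ELT services with relational stores (RDBMS, PostgreSQL, MySQL). As a rough illustration of what one ETL step does, the sketch below uses only the Python standard library: it extracts rows from a flat CSV file, transforms them, and loads them into SQLite, which stands in here for any RDBMS. The CSV contents, the `readings` table, and all function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file contents; in practice this would be read from disk, S3, GCS, etc.
RAW_CSV = """sensor_id,temperature,recorded_at
s1,21.5,2024-01-01T00:00:00
s2,,2024-01-01T00:00:00
s1,22.0,2024-01-01T01:00:00
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing temperature and cast it to float."""
    cleaned = []
    for row in rows:
        if not row["temperature"]:
            continue  # skip incomplete records
        cleaned.append((row["sensor_id"], float(row["temperature"]), row["recorded_at"]))
    return cleaned


def load(rows, conn):
    """Load: write the transformed rows into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(sensor_id TEXT, temperature REAL, recorded_at TEXT)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Run extract -> transform -> load and return the number of rows in the table."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_etl(RAW_CSV, connection))  # the row with a missing temperature is dropped
```

In an ELT variant, the raw rows would be loaded first and the filtering and casting would run inside the warehouse, for example as SQL in BigQuery or Redshift.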

Data Preprocessing & Feature Engineering

  1. Binning / Discretization
  2. Handling Categorical Data
  3. Augmentation
  4. Dealing with Outliers
  5. Data Transformations
  6. Polynomial Features
  7. Handling Noisy Data
  8. Handling Time-Series Data
  9. Feature Extraction from Images
  10. Handling Missing Data
  11. Domain-Specific Feature Engineering
  12. Handling Imbalanced Classes
  13. Data Scaling and Normalization
  14. AutoEDA Libraries
  15. Auto-Preprocessing Libraries
  16. Dimensionality Reduction
  17. Data Partitioning - Train, Validation, & Test
  18. Feature Selection
  19. Interaction Features
  20. Annotation
  21. Time-Based Features
  22. Textual Feature Extraction
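Handling Missing Data, Data Scaling and Normalization, and Data Partitioning interact: imputation and scaling statistics should be learned from the training partition only and then reused on validation and test data, otherwise information leaks from the held-out sets. A minimal pure-Python sketch, assuming a single numeric feature with `None` marking missing values; the function names are hypothetical, and libraries such as scikit-learn provide production versions (`train_test_split`, `SimpleImputer`, `MinMaxScaler`).

```python
import random
import statistics


def split_train_val_test(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once with a fixed seed, then cut into train / validation / test partitions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


def fit_preprocessor(train_values):
    """Learn imputation and scaling parameters from the training partition only."""
    observed = [v for v in train_values if v is not None]
    return {
        "median": statistics.median(observed),  # fill value for missing entries
        "min": min(observed),
        "max": max(observed),
    }


def apply_preprocessor(values, params):
    """Impute missing values with the training median, then min-max scale.

    Training values land in [0, 1]; unseen data may fall outside that range.
    """
    span = (params["max"] - params["min"]) or 1.0  # guard against a constant feature
    return [
        ((params["median"] if v is None else v) - params["min"]) / span
        for v in values
    ]
```

Typical use: split first, call `fit_preprocessor` on the training values, then call `apply_preprocessor` with those same parameters on all three partitions.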

Model Training & Evaluation

  1. Weight Initialization
  2. Regular Monitoring and Logging
  3. Ensemble Techniques
  4. Regression Analysis
  5. Hyperparameter Tuning
  6. Network Analytics / Geospatial Analytics
  7. Transfer Learning
  8. Clustering
  9. Early Stopping
  10. Association Rules
  11. Batch Normalization
  12. Natural Language Processing
  13. Reinforcement Learning
  14. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  15. Learning Rate Scheduling
  16. Evaluation Metrics
  17. Performance Visualization
  18. Cross-Validation
  19. Recommendation Engine
  20. Regularization Techniques
  21. Model Comparison
  22. Black-Box Neural Network Models
  23. Multiclass Classification Techniques
  24. Word Embeddings
  25. Forecasting Techniques
  26. AutoML
  27. External Validation
  28. Batch Size Selection
  29. Data Augmentation
  30. Model Interpretability
  31. Binary Classification Techniques
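Cross-Validation and Model Comparison go together: k-fold cross-validation rotates which fold is held out, so every observation is used for validation exactly once. The sketch below, in plain Python, generates the folds and scores a deliberately trivial baseline that predicts the training-fold mean; the baseline and the function names are illustrative assumptions, and scikit-learn's `KFold` and `cross_val_score` cover the same ground in practice.

```python
import statistics


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs of contiguous folds whose sizes differ by at most one."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


def mean_baseline_mse(y, k=5):
    """Cross-validated mean squared error of a model that predicts the training-fold mean."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(y), k):
        prediction = statistics.fmean(y[i] for i in train_idx)  # "fit" on training folds
        errors = [(y[i] - prediction) ** 2 for i in val_idx]    # score on held-out fold
        scores.append(statistics.fmean(errors))
    return statistics.fmean(scores)
```

The folds here are contiguous and unshuffled, which assumes the rows are already in random order; for time-series forecasting, a forward-chaining split that never trains on future data is the usual choice. Comparing candidate models means scoring each of them on the same folds.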

Storage

  1. Databases
  2. Model Registry
  3. Code Repository
  4. Data Warehouse
  5. Data Preprocessing Pipeline Models
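A model registry stores each trained model as a versioned artifact together with its metadata, which also underpins Model Versioning and Model Serialization at deployment time. The class below is a hypothetical in-memory sketch built on `pickle`; managed registries (for example the MLflow Model Registry) add persistent storage and lifecycle management on top of the same idea.

```python
import hashlib
import pickle


class ModelRegistry:
    """Hypothetical in-memory registry mapping a model name to its list of versions."""

    def __init__(self):
        self._store = {}

    def register(self, name, model, metrics=None):
        """Serialize the model, assign the next version number (starting at 1), return it."""
        blob = pickle.dumps(model)
        versions = self._store.setdefault(name, [])
        versions.append({
            "version": len(versions) + 1,
            "blob": blob,
            "sha256": hashlib.sha256(blob).hexdigest(),  # fingerprint for traceability
            "metrics": dict(metrics or {}),
        })
        return versions[-1]["version"]

    def _entry(self, name, version):
        versions = self._store[name]
        if version is None:
            return versions[-1]  # latest version
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name!r} has no version {version}")
        return versions[version - 1]

    def load(self, name, version=None):
        """Deserialize a given version, or the latest one. Only unpickle trusted data."""
        return pickle.loads(self._entry(name, version)["blob"])

    def metrics(self, name, version=None):
        """Return the metrics recorded when the given (or latest) version was registered."""
        return self._entry(name, version)["metrics"]
```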

Deployment & Monitoring

  1. Concept Drift Detection
  2. Flask
  3. Alerting and Notification
  4. Model Serialization
  5. Model Health Monitoring
  6. Containerization
  7. Serverless Computing
  8. Cloud Deployment
  9. Prediction Logging
  10. FastAPI
  11. Performance Metrics
  12. Edge Deployment
  13. Bias and Fairness Assessment
  14. Model Drift
  15. Streamlit
  16. Feedback Collection
  17. Model Versioning
  18. Data Drift Monitoring
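Data Drift Monitoring compares the distribution of a feature in production against the distribution seen at training time. One common score is the Population Stability Index, PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the reference and current proportions in bin i. The sketch below assumes equal-width bins over the reference range; the 0.1 and 0.25 cut-offs in `drift_level` are a widely used rule of thumb rather than a standard and should be tuned per feature.

```python
import math


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between a reference (training) sample and a current (production) sample.

    Bins are equal-width over the reference range; values outside that range are
    clipped into the first or last bin. eps avoids log(0) for empty bins.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to width 1 for a constant reference

    def proportions(sample):
        counts = [0] * n_bins
        for x in sample:
            idx = int((x - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clip out-of-range values
            counts[idx] += 1
        return [max(c / len(sample), eps) for c in counts]

    expected = proportions(reference)
    actual = proportions(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


def drift_level(psi):
    """Map a PSI value to a label using rule-of-thumb thresholds (an assumption)."""
    if psi < 0.1:
        return "stable"
    if psi < 0.25:
        return "moderate drift"
    return "significant drift"
```

A monitoring job would compute this per feature on each batch of logged predictions and route "significant drift" results to the alerting and notification step.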

ML Workflow - Architecture
  • Element belongs to the model
  • Element does not belong to the model
Training Pipeline

  1. Data Collection: API Stream, Web crawler, Selenium
  2. Data Ingestion
  3. Data Landing Zone: store data from all the sources
  4. Data Cleaning / Preprocessing
  5. Derived & Base Features
  6. Data Training & Modelling

Inference Pipeline

  1. Input Data (for forecasting)
  2. Cleaned & Processed Data
  3. Inference
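In the architecture above, inference input passes through the same cleaning and processing stage before reaching the model, so the parameters learned during training have to travel with the model. A minimal sketch of that hand-off, assuming a single numeric feature and a one-feature least-squares fit as a placeholder for the actual forecasting model (which the diagram does not specify); all function names are hypothetical.

```python
import statistics


def clean(values, fill_value):
    """Shared cleaning step: replace missing readings with a fill value learned in training."""
    return [fill_value if v is None else v for v in values]


def train_pipeline(x_raw, y):
    """Training pipeline: fit cleaning parameters, clean the feature, fit the model."""
    fill_value = statistics.median([v for v in x_raw if v is not None])
    x = clean(x_raw, fill_value)
    # Placeholder model: one-feature ordinary least squares, y ~ slope * x + intercept.
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Everything the inference pipeline needs is bundled into one artifact.
    return {"fill_value": fill_value, "slope": slope, "intercept": intercept}


def inference_pipeline(x_raw, artifact):
    """Inference pipeline: apply the same cleaning with the stored parameters, then predict."""
    x = clean(x_raw, artifact["fill_value"])
    return [artifact["slope"] * xi + artifact["intercept"] for xi in x]
```

Bundling `fill_value` with the model coefficients is the design point: if inference recomputed the median from its own input, training and inference features would silently diverge.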