Workflow Element Store

  1. Surveys and Questionnaires
  2. Flat files
  3. Public Datasets
  4. APIs and Data Feeds
  5. Databases - NoSQL
  6. Databases - SQL
  7. Experiments (DoE)
  8. Mobile Applications or IoT Applications
  9. Data Collaboration and Partnerships
  10. Web Scraping
  11. Feedback Data
  1. MS SQL Server
  2. AWS Redshift
  3. PostgreSQL
  4. Oracle DB
  5. ETL/ELT pipeline
  6. MySQL
  7. AWS S3
  8. RDBMS
  9. AWS RDS
  10. GCP Dataflow
  11. MongoDB
  12. AWS Kinesis
  13. GCS
  14. AWS Glue
  15. Azure Data Factory (ADF)
  16. Azure Stream Analytics
  17. Azure Synapse
  18. Azure Blob Storage
  19. GCP BigQuery
  20. Apache Kafka
  21. GCP Data Fusion
  1. Feature Selection
  2. Handling Missing Data
  3. Handling Time-Series Data
  4. Data Transformations
  5. Polynomial Features
  6. Handling Noisy Data
  7. Handling Categorical Data
  8. Auto-Preprocessing libraries
  9. Interaction Features
  10. AutoEDA libraries
  11. Augmentation
  12. Data Scaling and Normalization
  13. Time-Based Features
  14. Textual Feature Extraction
  15. Handling Imbalanced Classes
  16. Annotation
  17. Dimensionality Reduction
  18. Domain-Specific Feature Engineering
  19. Data Partitioning - Train, Validation, & Test
  20. Dealing with Outliers
  21. Feature Extraction from Images
  22. Binning / Discretization
  1. Regularization
  2. Weight Initialization
  3. Ensemble Techniques
  4. Network Analytics / Geospatial Analytics
  5. Batch Size Selection
  6. Regression Analysis
  7. Natural Language Processing
  8. Transfer Learning
  9. Reinforcement Learning
  10. Regular Monitoring and Logging
  11. External Validation
  12. Evaluation Metrics
  13. Word Embeddings
  14. Early Stopping
  15. Model Interpretability
  16. Data Augmentation
  17. Hyperparameter Tuning
  18. Multiclass Classification Techniques
  19. Cross-Validation
  20. Model Comparison
  21. Binary Classification Techniques
  22. Black Box - Neural Network Models
  23. Batch Normalization
  24. AutoML
  25. Regularization Techniques
  26. Performance Visualization
  27. Learning Rate Scheduling
  28. Recommendation Engine
  29. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  30. Association Rules
  31. Forecasting Techniques
  32. Clustering
  1. Model Registry
  2. Databases
  3. Data Warehouse
  4. Data Preprocessing Pipeline Models
  5. Code Repository
  1. Data Drift Monitoring
  2. Prediction Logging
  3. Concept Drift Detection
  4. Containerization
  5. Model Serialization
  6. Bias and Fairness Assessment
  7. Streamlit
  8. Edge Deployment
  9. FastAPI
  10. Model Drift
  11. Feedback Collection
  12. Performance Metrics
  13. Cloud Deployment
  14. Model Health Monitoring
  15. Alerting and Notification
  16. Serverless Computing
  17. Model Versioning
  18. Flask
ML Workflow - Architecture
  • Element belongs to the model
  • Element does not belong to the model
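
One way to read the legend is as a membership test: each element in the store either is or is not part of a given model's workflow. The sketch below is only an illustration under assumptions. The stage labels (`data_collection`, `preprocessing`, `modelling`, `deployment`) and the selection in `MODEL_ELEMENTS` are hypothetical; only the element names come from the store.

```python
# Hypothetical stage labels; the element names are taken from the element store.
ELEMENT_STORE = {
    "data_collection": ["Flat files", "Public Datasets", "APIs and Data Feeds"],
    "preprocessing": ["Handling Missing Data", "Time-Based Features"],
    "modelling": ["Forecasting Techniques", "Clustering"],
    "deployment": ["FastAPI", "Streamlit"],
}

# Hypothetical selection of elements used by one forecasting model.
MODEL_ELEMENTS = {
    "APIs and Data Feeds",
    "Handling Missing Data",
    "Time-Based Features",
    "Forecasting Techniques",
    "FastAPI",
}


def mark_elements(store, selected):
    """Pair every element with True if it belongs to the model, else False."""
    return {
        stage: [(name, name in selected) for name in names]
        for stage, names in store.items()
    }


marked = mark_elements(ELEMENT_STORE, MODEL_ELEMENTS)
for stage, elements in marked.items():
    for name, belongs in elements:
        label = "belongs to model" if belongs else "does not belong to model"
        print(f"{stage}: {name} ({label})")
```
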
Training Pipeline

Data Collection

API Stream

Web crawler

Selenium

Data Ingestion

Data Landing Zone

Store Data from all the Sources

Data Cleaning / Preprocessing

Derived & Base features

Data Training & Modelling

Inference Pipeline

Input Data for Forecasting

Input Data

Cleaned & Processed Data

Inference
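
The two pipelines above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the architecture's actual implementation: the record fields `x` and `y`, the derived feature `x_squared`, the least-squares model, and all function names are hypothetical. The point it shows is that the inference pipeline reuses the same cleaning and feature steps as the training pipeline.

```python
from statistics import mean


def ingest(sources):
    """Data Ingestion: store data from all sources in one landing zone."""
    landing_zone = []
    for records in sources.values():
        landing_zone.extend(records)
    return landing_zone


def clean(records):
    """Data Cleaning / Preprocessing: drop records with a missing input value."""
    return [r for r in records if r.get("x") is not None]


def add_features(records):
    """Derived & base features: keep base feature x, add derived x_squared."""
    return [{**r, "x_squared": r["x"] ** 2} for r in records]


def train(records):
    """Fit y = slope * x_squared + intercept by ordinary least squares."""
    xs = [r["x_squared"] for r in records]
    ys = [r["y"] for r in records]
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return {"slope": slope, "intercept": y_bar - slope * x_bar}


def training_pipeline(sources):
    """Collection -> ingestion -> cleaning -> features -> training."""
    return train(add_features(clean(ingest(sources))))


def inference_pipeline(model, input_data):
    """Input data -> cleaned and processed data -> inference."""
    processed = add_features(clean(input_data))
    return [model["slope"] * r["x_squared"] + model["intercept"] for r in processed]


# Hypothetical sources; the targets follow y = 2 * x**2 + 1.
sources = {
    "api_stream": [{"x": 1, "y": 3}, {"x": 2, "y": 9}],
    "web_crawler": [{"x": 3, "y": 19}, {"x": None, "y": 5}],
}
model = training_pipeline(sources)
predictions = inference_pipeline(model, [{"x": 4}, {"x": None}])
print(predictions)
```

Sharing `clean` and `add_features` between the two pipelines keeps the data seen at inference time consistent with the data the model was trained on.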