Models and Configuration Guide
This section provides comprehensive examples of machine learning models available in ObServML, organized by experiment type. Each model includes practical configuration examples from real use cases, REST API usage patterns, and parameter explanations.
ObServML supports four main experiment types:
- Time Series Analysis: Forecasting and anomaly detection in temporal data
- Fault Detection: Unsupervised anomaly detection in sensor data
- Fault Isolation: Supervised classification for root cause analysis
- Process Mining: Analysis of sequential processes and workflows
Configuration Structure
All models follow a consistent YAML configuration structure:
load_object:
  module: framework.{ExperimentType}
  name: {ExperimentType}Experiment
setup:
  datetime_column: "timestamp"  # Column containing timestamps
  target: "target_variable"     # Target variable (for supervised learning)
  # Additional setup parameters...
eda:  # Optional: Exploratory Data Analysis
create_model:
  model: "model_name"
  params:
    # Model-specific parameters
Time Series Analysis
Time Series Analysis experiments are designed for monitoring sensor data over time, offering insights into individual sensor behavior and enabling predictive maintenance through forecasting and anomaly detection.
Prophet
Prophet is well suited to time series forecasting with strong seasonal patterns and holiday effects.
Configuration Example:
load_object:
  module: framework.TimeSeriesAnalysis
  name: TimeSeriesAnomalyExperiment
setup:
  datetime_column: "ds"
  datetime_format: "ms"
  target: "y"
  predict_window: 1000
  retrain:
    retrain_window: 5000
    metric: "MAPE"
    metric_threshold: 0.3
    higher_better: false
eda:
create_model:
  model: "prophet"
  params:
    periods: 0
    factor: 1.0
    forecast_window: 100
REST API Usage:
# Train Prophet model
curl -X POST "http://localhost:8010/timeseries_prophet/train" \
-H "Content-Type: application/json" \
-d '{
"load_object": {
"module": "framework.TimeSeriesAnalysis",
"name": "TimeSeriesAnomalyExperiment"
},
"setup": {
"datetime_column": "ds",
"target": "y",
"predict_window": 1000
},
"create_model": {
"model": "prophet",
"params": {
"periods": 0,
"factor": 1.0,
"forecast_window": 100
}
}
}'
# Make predictions
curl -X POST "http://localhost:8010/timeseries_prophet/predict"
# Get forecast plot
curl "http://localhost:8010/timeseries_prophet/plot/forecast"
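For scripted training, the same JSON body can be posted from Python. The sketch below uses only the standard library and builds the request without sending it; `build_train_request` is a hypothetical helper (not part of ObServML), and the base URL and experiment name are taken from the curl examples above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8010"  # host/port taken from the curl examples


def build_train_request(experiment: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to /{experiment}/train.
    Hypothetical helper: assumes the endpoint accepts the config as JSON."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{experiment}/train",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


config = {
    "load_object": {
        "module": "framework.TimeSeriesAnalysis",
        "name": "TimeSeriesAnomalyExperiment",
    },
    "setup": {"datetime_column": "ds", "target": "y", "predict_window": 1000},
    "create_model": {
        "model": "prophet",
        "params": {"periods": 0, "factor": 1.0, "forecast_window": 100},
    },
}

req = build_train_request("timeseries_prophet", config)
# To actually send it (requires a running ObServML server):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```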
Parameters:
- periods: Number of periods to forecast forward
- factor: Anomaly detection factor
- forecast_window: Number of future points to predict
ARIMA/SARIMA
ARIMA models suit time series with clear autoregressive patterns that are stationary or can be made stationary by differencing (the d parameter).
Configuration Example:
load_object:
  module: framework.TimeSeriesAnalysis
  name: TimeSeriesAnomalyExperiment
setup:
  datetime_column: "ds"
  target: "y"
eda:
create_model:
  model: "arima"
  params:
    start_p: 10
    d: 1
    start_q: 10
    max_p: 100
    max_q: 100
    seasonal: false
    threshold_for_anomaly: 3
Parameters:
- start_p: Starting value for autoregressive order
- d: Differencing order
- start_q: Starting value for moving average order
- max_p/max_q: Maximum values for parameter search
- seasonal: Enable seasonal ARIMA (SARIMA)
- threshold_for_anomaly: Standard deviations for anomaly detection
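A minimal sketch of how a threshold expressed in standard deviations is commonly applied: residuals between observed and predicted values are standardized, and points beyond `threshold_for_anomaly` standard deviations are flagged. This is an assumption about the mechanism, not ObServML's code; the framework may, for example, use rolling statistics instead of global ones, and `flag_residual_anomalies` is a hypothetical helper.

```python
from statistics import mean, stdev


def flag_residual_anomalies(actual, predicted, threshold_for_anomaly=3.0):
    """Indices whose residual (actual - predicted) lies more than
    threshold_for_anomaly standard deviations from the mean residual."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    mu, sigma = mean(residuals), stdev(residuals)
    if sigma == 0:
        return []  # perfectly constant residuals: nothing stands out
    return [
        i for i, r in enumerate(residuals)
        if abs(r - mu) > threshold_for_anomaly * sigma
    ]


actual = [10.0] * 20
actual[7] = 20.0  # inject a spike
predicted = [10.0] * 20
print(flag_residual_anomalies(actual, predicted))  # [7]
```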
LSTM
Long Short-Term Memory networks for complex temporal patterns and non-linear relationships.
Configuration Example:
load_object:
  module: framework.TimeSeriesAnalysis
  name: TimeSeriesAnomalyExperiment
setup:
  datetime_column: "ds"
  target: "y"
eda:
create_model:
  model: "lstm"
  params:
    seq_length: 50
    layer_no: 2
    cell_no: 64
    epoch_no: 100
    batch_size: 32
    shuffle: true
    patience: 10
    threshold_for_anomaly: 3
Parameters:
- seq_length: Length of input sequences
- layer_no: Number of LSTM layers
- cell_no: Number of cells per LSTM layer
- epoch_no: Training epochs
- batch_size: Training batch size
- patience: Early stopping patience
- threshold_for_anomaly: Anomaly detection threshold
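The `seq_length` parameter controls how the series is cut into training samples. The sketch below assumes the common one-step-ahead setup, where each input window of `seq_length` values is paired with the value that follows it; `make_sequences` is a hypothetical helper, and ObServML's own windowing may align targets differently.

```python
def make_sequences(series, seq_length):
    """Split a 1-D series into (input window, next value) pairs."""
    X, y = [], []
    for start in range(len(series) - seq_length):
        X.append(series[start:start + seq_length])  # seq_length past values
        y.append(series[start + seq_length])        # value to predict
    return X, y


X, y = make_sequences([1, 2, 3, 4, 5, 6], seq_length=3)
# X == [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
# y == [4, 5, 6]
```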
Autoencoder
Neural network autoencoders for anomaly detection in time series through reconstruction error.
Configuration Example:
load_object:
  module: framework.TimeSeriesAnalysis
  name: TimeSeriesAnomalyExperiment
setup:
  datetime_column: "ds"
  target: "y"
eda:
create_model:
  model: "ae"
  params:
    layer_no: 6
    window: 250
    epoch_no: 50
    batch_size: 64
    shuffle: false
    threshold_for_anomaly: 3
    neuron_no_enc: [30, 25, 20, 15, 10, 5]
    neuron_no_dec: [5, 10, 15, 20, 25, 30]
    act_enc: 'relu'
    act_dec: 'relu'
Parameters:
- layer_no: Number of layers in encoder/decoder
- window: Moving window size for sequences
- neuron_no_enc/dec: Neurons per layer in encoder/decoder
- act_enc/dec: Activation functions
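Reconstruction-based detection can be sketched as follows, assuming stride-1 windows of length `window`, per-window mean squared reconstruction error, and a cut-off at the mean plus `threshold_for_anomaly` standard deviations of those errors. The reconstructions themselves would come from the trained autoencoder and are passed in as plain lists here; all helper names are hypothetical and the framework's exact scoring may differ.

```python
from statistics import mean, stdev


def sliding_windows(series, window):
    """All contiguous windows of length `window` (stride 1)."""
    return [series[i:i + window] for i in range(len(series) - window + 1)]


def reconstruction_errors(windows, reconstructions):
    """Mean squared error between each window and its reconstruction."""
    return [
        mean((a - b) ** 2 for a, b in zip(w, r))
        for w, r in zip(windows, reconstructions)
    ]


def flag_windows(errors, threshold_for_anomaly=3.0):
    """Indices of windows whose error exceeds mean + k * std of all errors."""
    mu, sigma = mean(errors), stdev(errors)
    return [i for i, e in enumerate(errors) if e > mu + threshold_for_anomaly * sigma]


errors = [0.1] * 19 + [5.0]  # e.g. reconstruction errors for 20 windows
print(flag_windows(errors))  # [19]
```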
SSA (Singular Spectrum Analysis)
SSA for time series decomposition and anomaly detection.
Configuration Example:
load_object:
  module: framework.TimeSeriesAnalysis
  name: TimeSeriesAnomalyExperiment
setup:
  datetime_column: "ds"
  target: "y"
eda:
create_model:
  model: "ssa"
  params:
    window_size: 10
    lower_frequency_bound: 0.05
    lower_frequency_contribution: 0.975
    threshold: 3
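SSA starts by embedding the series into a trajectory (Hankel) matrix whose columns are lagged windows of length `window_size`, decomposes that matrix with SVD, and maps selected components back to a series by averaging along anti-diagonals. The stdlib-only sketch below shows the embedding and diagonal-averaging steps; the SVD and the component selection governed by `lower_frequency_bound` and `lower_frequency_contribution` are omitted because their exact behavior in ObServML is not documented here. Both function names are hypothetical.

```python
def trajectory_matrix(series, window_size):
    """L x K Hankel (trajectory) matrix with L = window_size, K = N - L + 1.
    Entry [i][j] is series[i + j], so column j is series[j:j + L]."""
    k = len(series) - window_size + 1
    return [[series[i + j] for j in range(k)] for i in range(window_size)]


def diagonal_average(matrix):
    """Map an L x K matrix back to a series of length L + K - 1 by averaging
    each anti-diagonal (constant i + j). Inverse of trajectory_matrix."""
    rows, cols = len(matrix), len(matrix[0])
    sums = [0.0] * (rows + cols - 1)
    counts = [0] * (rows + cols - 1)
    for i in range(rows):
        for j in range(cols):
            sums[i + j] += matrix[i][j]
            counts[i + j] += 1
    return [s / c for s, c in zip(sums, counts)]


series = [1.0, 2.0, 3.0, 4.0, 5.0]
X = trajectory_matrix(series, window_size=3)
# X == [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
print(diagonal_average(X))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

In full SSA, diagonal averaging is applied to each low-rank component reconstructed from the SVD rather than to the original matrix.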
Fault Detection
Fault Detection experiments focus on unsupervised anomaly detection in multivariate sensor data, identifying deviations from normal operating conditions without requiring labeled data.
Isolation Forest
Isolation Forest is highly effective for anomaly detection in high-dimensional data by isolating anomalies through random partitioning.
Configuration Example:
load_object:
  module: framework.FaultDetection
  name: FaultDetectionExperiment
setup:
  datetime_column: "ds"
  datetime_format: "ms"
eda:
create_model:
  model: "iforest"
  params:
    n_estimators: 100
    contamination: "auto"
    random_state: 0
REST API Usage:
# Train Isolation Forest model
curl -X POST "http://localhost:8010/pump_anomaly/train" \
-H "Content-Type: application/json" \
-d '{
"load_object": {
"module": "framework.FaultDetection",
"name": "FaultDetectionExperiment"
},
"setup": {
"datetime_column": "ds"
},
"create_model": {
"model": "iforest",
"params": {
"n_estimators": 100,
"contamination": "auto"
}
}
}'
# Detect anomalies in new data
curl -X POST "http://localhost:8010/pump_anomaly/predict"
# Get anomaly visualization
curl "http://localhost:8010/pump_anomaly/plot/outliers"
Parameters:
- n_estimators: Number of isolation trees
- contamination: Expected proportion of anomalies; "auto" derives the decision threshold as in the original Isolation Forest paper instead of from a fixed proportion
- random_state: Random seed for reproducibility
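The parameter names match scikit-learn's `IsolationForest`, which we assume backs the `iforest` model. Its scoring follows the original Isolation Forest paper (Liu et al., 2008): s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the mean path length of x over all trees and c(n) is the expected path length for a sample of size n. A minimal sketch of that formula (scikit-learn reports the negated score from `score_samples`):

```python
import math

EULER_GAMMA = 0.5772156649015329


def average_path_length(n: int) -> float:
    """c(n): average path length of an unsuccessful binary-search-tree
    search over n points, used to normalise isolation depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length: float, n: int) -> float:
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); near 1 means easily isolated
    (anomalous), well below 0.5 means normal."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


print(round(average_path_length(256), 2))  # 10.24
```

A point whose mean path length equals c(n) scores exactly 0.5; points isolated after only a few splits score close to 1.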
PCA (Principal Component Analysis)
PCA with Hotelling's T² and SPE tests for multivariate anomaly detection, as demonstrated in the research paper with pump sensor data.
Configuration Example:
load_object:
  module: framework.FaultDetection
  name: FaultDetectionExperiment
setup:
  datetime_column: "ds"
  datetime_format: "ms"
eda:
create_model:
  model: "pca"
  params:
    alpha: 0.05
    detect_outliers: ['ht2', 'spe']
    n_components: 0.95
    normalize: true
Use Case Example (from research paper): The pump dataset contains 50 sensors without labeled data, making it ideal for fault detection. PCA reduces dimensionality while preserving 95% of variance, then applies statistical tests to identify anomalies.
Parameters:
- alpha: Significance level for Hotelling's T² test
- detect_outliers: Types of outlier detection ['ht2', 'spe']
- n_components: Fraction of variance to retain; 0.95 keeps the smallest number of principal components that together explain 95% of the variance
- normalize: Apply data normalization
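The two statistics behind `detect_outliers: ['ht2', 'spe']` can be written directly once a PCA model is fitted: Hotelling's T² measures how far a sample lies within the retained component space, scaled by each component's variance, and SPE measures the residual left outside it. The sketch assumes component scores, eigenvalues, and the reconstruction are already available; fitting the PCA and turning `alpha` into control limits (typically derived from F or chi-squared distributions) are left out, and both function names are hypothetical.

```python
def hotelling_t2(scores, eigenvalues):
    """T^2 = sum_i t_i^2 / lambda_i over the retained principal components,
    where t_i is the sample's score on component i and lambda_i its variance."""
    return sum(t * t / lam for t, lam in zip(scores, eigenvalues))


def spe(x, x_reconstructed):
    """Squared prediction error (Q statistic): ||x - x_hat||^2, the part of
    the (normalized) sample not captured by the retained components."""
    return sum((a - b) ** 2 for a, b in zip(x, x_reconstructed))


print(hotelling_t2([2.0, 1.0], [4.0, 1.0]))       # 2.0
print(spe([1.0, 2.0, 3.0], [1.0, 2.0, 2.0]))      # 1.0
```

A sample is flagged when either statistic exceeds its control limit at significance level `alpha`.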
DBSCAN
Density-based clustering for anomaly detection: dense regions form clusters of normal behavior, and points that belong to no cluster (noise) are treated as anomalies.
Configuration Example:
load_object:
  module: framework.FaultDetection
  name: FaultDetectionExperiment
setup:
  datetime_column: "ds"
  datetime_format: "ms"
eda:
create_model:
  model: "dbscan"
  params:
    eps: 2
    min_samples: 5
Parameters:
- eps: Maximum distance between points in a neighborhood
- min_samples: Minimum number of points within eps of a point (counting the point itself) for it to be a core point of a cluster
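Treating DBSCAN noise as anomalies can be illustrated with a minimal O(n²) implementation. It follows the standard algorithm and scikit-learn's conventions (noise labelled -1, `min_samples` counting the point itself); that ObServML flags exactly the noise points is our assumption, and `dbscan` here is a hypothetical teaching implementation, not the library call.

```python
import math


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN. Returns one label per point: 0, 1, ... for clusters
    and -1 for noise."""
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point -> border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also core: keep expanding
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
labels = dbscan(points, eps=1.5, min_samples=3)
anomalies = [i for i, lab in enumerate(labels) if lab == -1]  # [4]
```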
Elliptic Envelope
Robust covariance estimation for outlier detection in multivariate data.
Configuration Example:
load_object:
  module: framework.FaultDetection
  name: FaultDetectionExperiment
setup:
  datetime_column: "ds"
  datetime_format: "ms"
eda:
create_model:
  model: "ee"
  params:
    contamination: 0.1
Parameters:
- contamination: Proportion of outliers in the dataset (0.0 to 0.5)
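scikit-learn's `EllipticEnvelope`, which we assume backs the `ee` model, fits a robust covariance estimate (Minimum Covariance Determinant) and places its decision threshold so that roughly a `contamination` fraction of the training points are classified as outliers. The sketch below shows only that last step, applied to precomputed Mahalanobis distances; `flag_outliers` is a hypothetical helper.

```python
def flag_outliers(distances, contamination=0.1):
    """Label the `contamination` fraction of points with the largest
    (Mahalanobis) distances as outliers; returns their indices, sorted."""
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination must be in (0, 0.5]")
    n_outliers = round(contamination * len(distances))
    by_distance = sorted(range(len(distances)), key=lambda i: distances[i], reverse=True)
    return sorted(by_distance[:n_outliers])


distances = [1.0, 0.5, 9.0, 0.7, 1.2, 0.9, 0.8, 1.1, 0.6, 0.4]
print(flag_outliers(distances, contamination=0.1))  # [2]
```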
Fault Isolation
Fault Isolation experiments perform supervised classification when labeled data is available, enabling root cause analysis and identification of specific fault types or machine states.
Decision Tree
Decision trees provide interpretable classification with feature importance analysis, as demonstrated in the research paper with electrical circuit data.
Configuration Example:
load_object:
  module: framework.FaultIsolation
  name: FaultIsolationExperiment
setup:
  datetime_column: "ds"
  datetime_format: "ms"
  target: "Output (S)"
  predict_window: 1000
  retrain:
    retrain_window: 5000
    metric: "Accuracy"
    metric_threshold: 0.9
    higher_better: true
eda:
create_model:
  model: "dt"
  params:
REST API Usage:
# Train Decision Tree for fault isolation
curl -X POST "http://localhost:8010/electrical_faults/train" \
-H "Content-Type: application/json" \
-d '{
"load_object": {
"module": "framework.FaultIsolation",
"name": "FaultIsolationExperiment"
},
"setup": {
"datetime_column": "ds",
"target": "Output (S)",
"predict_window": 1000
},
"create_model": {
"model": "dt",
"params": {}
}
}'
# Classify new faults
curl -X POST "http://localhost:8010/electrical_faults/predict"
# Get feature importance plot
curl "http://localhost:8010/electrical_faults/plot/feature_importance"
# Get confusion matrix
curl "http://localhost:8010/electrical_faults/plot/confusion_matrix"
Use Case Example (from research paper): Electrical circuit monitoring with 3-phase electricity data (Va, Vb, Vc voltages and Ia, Ib, Ic currents). The decision tree identifies which sensors contribute most to fault classification, with current Ib showing the highest feature importance.
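Tree feature importance aggregates impurity decreases. Assuming the `dt` model is a CART-style tree using Gini impurity (the default for scikit-learn's `DecisionTreeClassifier`), the quantity contributed by a single split can be sketched as below; the function names and class labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Weighted impurity decrease of one split. Gini feature importance sums
    these decreases over every split on a feature, each weighted by the
    fraction of training samples reaching that node, then normalises."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)


parent = ["fault", "fault", "ok", "ok"]
print(impurity_decrease(parent, ["fault", "fault"], ["ok", "ok"]))  # 0.5 (perfect split)
print(impurity_decrease(parent, ["fault", "ok"], ["fault", "ok"]))  # 0.0 (useless split)
```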
Random Forest
Ensemble method combining multiple decision trees for robust classification.
Configuration Example:
load_object:
  module: framework.FaultIsolation
  name: FaultIsolationExperiment
setup:
  datetime_column: "ds"
  target: "fault_type"
eda:
create_model:
  model: "rf"
  params:
    n_estimators: 100
    max_depth: 10
    random_state: 42
Naive Bayes
Probabilistic classifier based on Bayes' theorem that assumes features are conditionally independent given the class; effective for categorical fault classification.
Configuration Example:
load_object:
  module: framework.FaultIsolation
  name: FaultIsolationExperiment
setup:
  datetime_column: "ds"
  target: "fault_category"
eda:
create_model:
  model: "nb"
  params:
Hidden Markov Models (HMM)
HMM for sequential fault pattern recognition and state-based fault isolation.
Configuration Example:
load_object:
  module: framework.FaultIsolation
  name: FaultIsolationExperiment
setup:
  datetime_column: "ds"
  target: "machine_state"
eda:
create_model:
  model: "hmm"
  params:
    n_iter: 1000
    covariance_type: "diag"
    n_mix: 10
Parameters:
- n_iter: Maximum number of EM iterations
- covariance_type: Type of covariance matrix
- n_mix: Number of Gaussian mixture components per hidden state
Bayesian Network
Probabilistic graphical model for understanding causal relationships between variables.
Configuration Example:
load_object:
  module: framework.FaultIsolation
  name: FaultIsolationExperiment
setup:
  datetime_column: "ds"
  target: "fault_root_cause"
eda:
create_model:
  model: "bn"
  params:
    learningMethod: 'MIIC'
    prior: 'Smoothing'
    priorWeight: 1
    discretizationNbBins: 30
    discretizationStrategy: "quantile"
    discretizationThreshold: 0.01
    usePR: false
Note: Bayesian Network prediction requires the target variable, and pyAgrum may cause compatibility issues inside Docker.
Process Mining
Process Mining experiments analyze sequential data to understand workflows, operator behavior, and process optimization opportunities.
Heuristics Miner
Discovers process models from event logs using heuristic rules.
Configuration Example:
load_object:
  module: framework.ProcessMining
  name: ProcessMiningExperiment
setup:
eda:
create_model:
  model: "heuristics"
  params:
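Heuristics Miner builds its model from a dependency measure computed on directly-follows counts (Weijters and van der Aalst). The sketch below computes that measure for every observed pair of activities; we assume the `heuristics` model uses the standard definition, and the function and activity names are hypothetical. Pairs whose measure exceeds a dependency threshold become arcs of the discovered model.

```python
from collections import Counter


def dependency_measures(traces):
    """a => b = (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), where |a>b| counts
    how often b directly follows a across all traces. Self-loops use
    |a>a| / (|a>a| + 1)."""
    follows = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            follows[(a, b)] += 1
    measures = {}
    for (a, b), ab in follows.items():
        if a == b:
            measures[(a, b)] = ab / (ab + 1)
        else:
            ba = follows[(b, a)]  # Counter returns 0 for unseen pairs
            measures[(a, b)] = (ab - ba) / (ab + ba + 1)
    return measures


traces = [["start", "check", "approve"]] * 3 + [["start", "approve", "check"]]
m = dependency_measures(traces)
# m[("start", "check")] == 0.75, m[("check", "approve")] == 0.4
```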
Apriori Association Rules
Finds frequent patterns and association rules in sequential data.
Configuration Example:
load_object:
  module: framework.ProcessMining
  name: ProcessMiningExperiment
setup:
eda:
create_model:
  model: "apriori"
  params:
    min_support: 0.1
    min_confidence: 0.8
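The two thresholds act on the standard rule metrics: support(X) is the fraction of transactions containing X, and confidence(X → Y) = support(X ∪ Y) / support(X). A rule is reported only if it reaches both `min_support` and `min_confidence`. A minimal sketch with hypothetical helper and event names:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """conf(X -> Y) = support(X | Y) / support(X); 0.0 if X never occurs."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


# Hypothetical event sets, e.g. one per batch or time window
transactions = [
    {"valve_open", "pump_on", "alarm"},
    {"valve_open", "pump_on"},
    {"valve_open", "alarm"},
    {"pump_on"},
]
rule_support = support({"valve_open", "pump_on"}, transactions)          # 0.5
rule_confidence = confidence({"valve_open"}, {"pump_on"}, transactions)  # about 0.667
keep = rule_support >= 0.1 and rule_confidence >= 0.8  # False: fails min_confidence
```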
TopK Rules
Discovers the top-K most interesting association rules.
Configuration Example:
load_object:
  module: framework.ProcessMining
  name: ProcessMiningExperiment
setup:
eda:
create_model:
  model: "topk"
  params:
    k: 100
    min_confidence: 0.5
CMSPAM
CM-SPAM (SPAM with co-occurrence map pruning) mines frequent sequential patterns, i.e. subsequences whose support reaches min_support.
Configuration Example:
load_object:
  module: framework.ProcessMining
  name: ProcessMiningExperiment
setup:
eda:
create_model:
  model: "cmspam"
  params:
    min_support: 0.1
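For sequential patterns, support counts the sequences that contain the pattern in order, with gaps allowed. The sketch below computes that relative support for sequences of single events; CM-SPAM itself handles itemsets at each position and prunes candidates with a co-occurrence map, which is omitted here, and interpreting `min_support: 0.1` as a fraction of sequences is our assumption. Function names and events are hypothetical.

```python
def contains_subsequence(sequence, pattern):
    """True if `pattern` occurs in `sequence` in order, gaps allowed.
    `item in it` consumes the iterator up to the match, preserving order."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def sequence_support(pattern, sequences):
    """Fraction of sequences that contain `pattern` as a subsequence."""
    return sum(contains_subsequence(s, pattern) for s in sequences) / len(sequences)


sequences = [["A", "B", "C"], ["A", "C"], ["B", "A", "C"], ["C", "B"]]
print(sequence_support(["A", "C"], sequences))  # 0.75 -> frequent at min_support 0.1
print(sequence_support(["C", "A"], sequences))  # 0.0
```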
Advanced Configuration Options
Automatic Retraining
Configure automatic model retraining based on performance metrics:
setup:
  retrain:
    retrain_window: 5000   # Use last 5000 samples for retraining
    metric: "Accuracy"     # Metric to monitor (Accuracy, F1, Precision, Recall, MAPE, MSE)
    metric_threshold: 0.9  # Retrain when the metric falls below this value (rises above it if higher_better is false)
    higher_better: true    # Whether higher metric values are better
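The retraining rule can be read as a comparison whose direction depends on `higher_better`. The sketch below encodes that reading together with MAPE expressed as a fraction (as scikit-learn's `mean_absolute_percentage_error` returns it), which is consistent with the `metric_threshold: 0.3` used in the Prophet example. Both functions are hypothetical helpers, not ObServML's implementation.

```python
def needs_retraining(metric_value, metric_threshold, higher_better):
    """True when the monitored metric has crossed the threshold in the
    'bad' direction: below it if higher is better, above it otherwise."""
    if higher_better:
        return metric_value < metric_threshold
    return metric_value > metric_threshold


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.3 means 30%)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


print(needs_retraining(0.85, 0.9, higher_better=True))                   # True
print(needs_retraining(mape([100, 200], [110, 180]), 0.3, higher_better=False))  # False
```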
Data Formatting
Handle different data formats with custom formatting options:
setup:
  format:
    name: "pivot"      # Formatting mode
    id: "tsdata"       # Data variable name
    max_level: 1
    columns: "target"
    index: "date"
    values: "value"
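Judging by the parameter names, the `pivot` mode reshapes long-format records (one row per date and target) into a wide table, similar to pandas `DataFrame.pivot(index=..., columns=..., values=...)`. A stdlib sketch of that reshaping, under that assumption; how `id` and `max_level` are used is not covered, and unlike pandas, which raises on duplicate index/column pairs, this sketch keeps the last value. The function name and records are hypothetical.

```python
from collections import defaultdict


def pivot(records, index, columns, values):
    """Long -> wide: one row per `index` value, one column per distinct
    `columns` value, cells taken from `values` (later duplicates overwrite)."""
    table = defaultdict(dict)
    for rec in records:
        table[rec[index]][rec[columns]] = rec[values]
    return dict(table)


records = [
    {"date": "2024-01-01", "target": "pressure", "value": 1.2},
    {"date": "2024-01-01", "target": "flow", "value": 30.0},
    {"date": "2024-01-02", "target": "pressure", "value": 1.3},
]
wide = pivot(records, index="date", columns="target", values="value")
# {'2024-01-01': {'pressure': 1.2, 'flow': 30.0}, '2024-01-02': {'pressure': 1.3}}
```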
Prediction Windows
Control visualization and prediction scope:
setup:
  predict_window: 1000  # Number of samples to show in prediction plots
  forecast_window: 100  # Number of future points to predict
REST API Patterns
Common Endpoints
All experiments support these standard endpoints:
- POST /{experiment_name}/train: Train a new model
- POST /{experiment_name}/predict: Make predictions
- POST /{experiment_name}/save: Save model to MLflow
- POST /{experiment_name}/load: Load model from MLflow
- GET /{experiment_name}/plot/{plot_name}: Get visualization
- GET /{experiment_name}/cfg: Get configuration
- GET /{experiment_name}/run_id: Get MLflow run ID
Batch Processing Example
# Train multiple models in sequence
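for model in "iforest" "pca" "dbscan"; do
  # The endpoint expects a JSON body, so convert each YAML config first (requires PyYAML)
  python -c 'import sys, json, yaml; json.dump(yaml.safe_load(sys.stdin), sys.stdout)' \
    < "configs/pump/${model}.yaml" |
  curl -X POST "http://localhost:8010/sensor_${model}/train" \
    -H "Content-Type: application/json" \
    --data-binary @-
done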
# Monitor all models
for model in "iforest" "pca" "dbscan"; do
curl -X POST "http://localhost:8010/sensor_${model}/predict"
done
Model Selection Guidelines
Time Series Analysis
- Prophet: Strong seasonality, holiday effects, missing data tolerance
- ARIMA: Stationary (or differenced) data, clear autoregressive patterns
- LSTM: Complex non-linear patterns, large datasets
- Autoencoder: Anomaly detection, reconstruction-based analysis
- SSA: Trend and seasonal decomposition
Fault Detection
- Isolation Forest: High-dimensional data, unknown anomaly types
- PCA: Multivariate data, statistical anomaly detection
- DBSCAN: Density-based clusters, irregular cluster shapes
- Elliptic Envelope: Gaussian-distributed data, robust outlier detection
Fault Isolation
- Decision Tree: Interpretability required, feature importance analysis
- Random Forest: Robust classification, ensemble benefits
- Naive Bayes: Categorical features, probabilistic classification
- HMM: Sequential patterns, state-based analysis
- Bayesian Network: Causal relationships, probabilistic inference
Process Mining
- Heuristics Miner: Process discovery from event logs
- Apriori: Frequent pattern mining, association rules
- TopK Rules: Most interesting patterns, rule ranking
- CMSPAM: Sequential pattern mining, frequent subsequences
Troubleshooting
Common Issues
- BayesNet Docker Issues: Works locally but may fail in Docker due to pyAgrum compilation requirements
- PCA Indexing: Avoid duplicate indices in input data
- Process Mining: Some models repeat training during the prediction phase
- Memory Requirements: Deep learning models (LSTM, Autoencoder) need considerably more RAM than the classical models; reducing batch_size, seq_length, or window lowers memory use
Performance Optimization
- Use appropriate predict_window sizes to balance visualization and performance
- Configure retrain_window based on data velocity and model stability
- Monitor MLflow for model performance metrics and storage usage
- Use batch processing for multiple model training
This comprehensive guide provides the foundation for implementing robust monitoring solutions with ObServML across various industrial applications and use cases.