Experiments
Experiments are the basic building blocks of the ExperimentHub. Experiment is a Protocol class and can be used freely to derive new experiments, such as the ones below. For specific goals and roles, we recommend building custom experiments, which require two things: the configuration files and the models.
Bases: Protocol
A class representing an experiment. This class is a protocol that defines the methods that an experiment should implement. Some functions must be overloaded by an implementation class, while others can be left as is. It can only handle sklearn.base.BaseEstimator classes, so the model in use must be overloaded by the implementation.
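As a rough illustration of deriving an experiment from a protocol, here is a minimal sketch. The `Experiment` class below is a trimmed-down, hypothetical stand-in for `framework.Experiment` (only three of the documented methods are declared), and `MeanExperiment` is an invented toy implementation, not part of the framework.

```python
from typing import Any, Protocol


class Experiment(Protocol):
    """Hypothetical, reduced stand-in for framework.Experiment."""

    def setup(self, data: Any, *args: Any, **kwargs: Any) -> Any: ...

    def create_model(self, *args: Any, **kwargs: Any) -> Any: ...

    def predict(self, data: Any, *args: Any, **kwargs: Any) -> Any: ...


class MeanExperiment(Experiment):
    """Toy implementation: the 'model' is the mean of the training data."""

    def __init__(self, cfg):
        self.cfg = cfg
        self.data = []
        self.model = None

    def setup(self, data, *args, **kwargs):
        # Store the data so that later steps can use it.
        self.data = list(data)
        return self.data

    def create_model(self, *args, **kwargs):
        # "Training" here is just computing the mean.
        self.model = sum(self.data) / len(self.data)
        return self.model

    def predict(self, data, *args, **kwargs):
        # Report the deviation of each new value from the training mean.
        return [x - self.model for x in data]
```

An implementation overloads the methods it needs and keeps the shared call signatures, so code that drives experiments does not have to know which concrete class it received.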
TODO:
- proper logging
- Add mmlw_estimator as a base class for the model
- SPC chart
- add support for data drift (eda)
- add support for model drift (eda)
- add support for model explainability (shap, lime, etc.)
- add support for general data preprocessing (e.g. missing values, outliers, etc.)
- add support for dim reduction (PCA, TSNE, etc.)
- add support for feature selection (RFE, etc.)
- dynamic model loading from specific folder/registry
Source code in framework\Experiment.py
__init__(cfg, experiment_id, run_id)
Initialize the Experiment class. This class is a protocol that defines the methods that an experiment should implement.
Source code in framework\Experiment.py
create_model(*args, **kwargs)
This function trains the model.
Source code in framework\Experiment.py
eda()
This function performs simple exploratory data analysis.
Source code in framework\Experiment.py
export()
This function exports the reports as HTML files and logs them to MLflow.
Source code in framework\Experiment.py
format_data(data, format=None)
This function recovers data from nonstandard JSON as a pd.DataFrame.
Parameters:
data: data in JSON format or DataFrame format.
format (dict): settings for formatting. If None, then the default pd.read_json is used.
Source code in framework\Experiment.py
get_eda_reports()
This function returns the IDs of the EDA reports that are available in the model.
Source code in framework\Experiment.py
get_fig_types()
This function returns the IDs of figures that are available in the model.
Source code in framework\Experiment.py
join_data()
Joins new data and previous train data.
Source code in framework\Experiment.py
load(run_id)
This function loads the experiment. It loads the model and the reports. Uses joblib to load the model and pickle to load the reports.
Source code in framework\Experiment.py
plot_model(plot)
This function plots the model.
Source code in framework\Experiment.py
predict(data, *args, **kwargs)
This function predicts the target variable.
Source code in framework\Experiment.py
run(data)
This function runs the experiment. It trains the model and performs exploratory data analysis, handling everything internally.
Source code in framework\Experiment.py
save()
This function saves the experiment. It saves the model and the reports. Uses joblib to save the model and pickle to save the reports.
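The report half of this save/load round trip can be sketched with the standard library alone. The helper names and the `reports.pkl` file name below are hypothetical; the documented method also stores the model with joblib (`joblib.dump` / `joblib.load`), which is left out here to keep the sketch dependency-free.

```python
import pickle
import tempfile
from pathlib import Path


def save_reports(reports, folder):
    """Pickle the reports dictionary into `folder` and return the file path."""
    path = Path(folder) / "reports.pkl"  # hypothetical file name
    with path.open("wb") as fh:
        pickle.dump(reports, fh)
    return path


def load_reports(path):
    """Load reports previously written by `save_reports`."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_reports({"eda": "<html>...</html>"}, tmp)
    restored = load_reports(saved)
```

Pickle files should only be loaded from trusted locations, since unpickling can execute arbitrary code.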
Source code in framework\Experiment.py
setup(data, *args, **kwargs)
This function sets up the experiment. It is called before training the model.
Source code in framework\Experiment.py
spc()
This function creates a statistical process control chart. WIP.
Source code in framework\Experiment.py
FaultIsolationExperiment
Bases: Experiment
This experiment file is used for fault isolation experiments. It is a subclass of the Experiment class in the framework.Experiment module. It is used to create, train, and predict using fault isolation models. Heavily relies on classification algorithms.
Implemented models:
- Decision Tree : 'models.fault_isolation.DecisionTree'
- Random Forest : 'models.fault_isolation.RandomForest'
- Naive Bayes : 'models.fault_isolation.NaiveBayes'
- HMM : 'models.fault_isolation.HMM'
- Markov Chain : 'models.fault_isolation.MarkovChain'
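One plausible way to hold such a registry is a dictionary from display name to dotted path. The sketch below assumes that layout (the `MODEL_REGISTRY` name and the `split_model_path` helper are hypothetical, not taken from the framework) and shows how a dotted path separates into a module and a class name.

```python
# Hypothetical registry mirroring the list of implemented models.
MODEL_REGISTRY = {
    "Decision Tree": "models.fault_isolation.DecisionTree",
    "Random Forest": "models.fault_isolation.RandomForest",
    "Naive Bayes": "models.fault_isolation.NaiveBayes",
    "HMM": "models.fault_isolation.HMM",
    "Markov Chain": "models.fault_isolation.MarkovChain",
}


def split_model_path(dotted_path):
    """Split 'package.module.ClassName' into ('package.module', 'ClassName')."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {dotted_path!r}")
    return module_name, class_name


module_name, class_name = split_model_path(MODEL_REGISTRY["HMM"])
```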
Inherits from Experiment Protocol class from framework.Experiment.
If the experiment is overloaded and new functions are added, they can be called by the system by adding the function name to the cfg file with the relevant parameters. For example, if a new function 'new_function' is added to the system, the cfg file should have the following structure:
load_object:
    module: framework.FaultIsolation
    name: FaultIsolationExperiment
setup:
    datetime_column: str The column that contains the datetime information.
    target: str The target column for the experiment. (output column)
    ...
eda:
create_model:
    model: str The model to be used for training.
    params: dict The parameters to be used for the model.
new_function:
    param1: str The first parameter for the function.
    param2: str The second parameter for the function.
    ...
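The idea of calling functions by listing them in the cfg file can be sketched as a loop that looks up each key as a method and passes its parameters as keyword arguments. This is an assumption about the mechanism, not the framework's actual code: `DemoExperiment`, `run_steps`, and the parameter values are invented, and the real `setup` also receives the data.

```python
RESERVED_KEYS = {"load_object"}  # entries that describe loading, not steps


class DemoExperiment:
    """Hypothetical experiment that records which steps were called."""

    def __init__(self):
        self.calls = []

    def setup(self, datetime_column, target):
        self.calls.append("setup")

    def eda(self):
        self.calls.append("eda")

    def create_model(self, model, params):
        self.calls.append("create_model")

    def new_function(self, param1, param2):
        self.calls.append("new_function")


def run_steps(experiment, cfg):
    """Call each configured step in file order with its parameters."""
    for name, params in cfg.items():
        if name in RESERVED_KEYS:
            continue
        method = getattr(experiment, name)
        # A key without a value (like 'eda:') parses to None in YAML.
        method(**(params or {}))


cfg = {
    "load_object": {"module": "framework.FaultIsolation", "name": "FaultIsolationExperiment"},
    "setup": {"datetime_column": "timestamp", "target": "fault"},
    "eda": None,
    "create_model": {"model": "Decision Tree", "params": {}},
    "new_function": {"param1": "a", "param2": "b"},
}
experiment = DemoExperiment()
run_steps(experiment, cfg)
```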
Source code in framework\FaultIsolation.py
__init__(cfg, experiment_id, run_id, *args, **kwargs)
Initialize the model registry. Currently cannot be changed after initialization. In future versions, we will allow for dynamic model loading through configuration files.
Parameters:
cfg (dict): The configuration file for the experiment.
experiment_id (str): The experiment id for the experiment.
run_id (str): The run id for the experiment.
Source code in framework\FaultIsolation.py
create_model(*args, **kwargs)
Create the model using the configuration file. The model is trained on the data set up in the 'setup' function.
Source code in framework\FaultIsolation.py
predict(data)
Predict using the trained model. The model should be trained before calling this function.
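The train-before-predict requirement can be made explicit with a small guard. The sketch below is illustrative only: `GuardedExperiment`, `NotTrainedError`, and the constant-label toy model are hypothetical and do not reflect how the framework reports an untrained model.

```python
class NotTrainedError(RuntimeError):
    """Raised when predict() is called before a model exists."""


class GuardedExperiment:
    """Hypothetical experiment that refuses to predict without a model."""

    def __init__(self):
        self.model = None

    def create_model(self):
        # Toy "model": always answers with the same label.
        self.model = lambda rows: [0 for _ in rows]
        return self.model

    def predict(self, data):
        if self.model is None:
            raise NotTrainedError("call create_model() before predict()")
        return self.model(data)
```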
Source code in framework\FaultIsolation.py
setup(data)
Setup the data for training and prediction. This function is called before training the model.
Parameters:
data (str): The data to be used for training and prediction in json format.
Source code in framework\FaultIsolation.py
FaultDetectionExperiment
Bases: Experiment
Source code in framework\FaultDetection.py
__init__(cfg, experiment_id, run_id, *args, **kwargs)
Initialize the model registry. Currently cannot be changed after initialization. In future versions, we will allow for dynamic model loading through configuration files.
This experiment file is used for fault detection experiments. It is a subclass of the Experiment class in the framework.Experiment module. It is used to create, train, and predict using fault detection models. Heavily relies on outlier detection/clustering algorithms.
Implemented models:
- DBSCAN : 'models.fault_detection.DBSCANAnomalyDetection'
- Elliptic Envelope : 'models.fault_detection.EllipticEnvelopeAnomalyDetection'
- Isolation Forest : 'models.fault_detection.IsolationForestAnomaly'
- PCA : 'models.fault_detection.PCAAnomalyDetection'
Inherits from Experiment Protocol class from framework.Experiment.
If the experiment is overloaded and new functions are added, they can be called by the system by adding the function name to the cfg file with the relevant parameters. For example, if a new function 'new_function' is added to the system, the cfg file should have the following structure:
load_object:
    module: framework.FaultDetection
    name: FaultDetectionExperiment
setup:
    datetime_column: str The column that contains the datetime information.
    target: str The target column for the experiment. (output column)
    ...
eda:
create_model:
    model: str The model to be used for training.
    params: dict The parameters to be used for the model.
new_function:
    param1: str The first parameter for the function.
    param2: str The second parameter for the function.
The function will then be called in the system by calling the 'run' function:
experiment.run(data, experiment_id, run_id)
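A `load_object` entry with a `module` and a `name` maps naturally onto `importlib`. The sketch below is an assumption about how such an entry could be resolved, not the framework's loader; because the `framework` package is not available here, a standard-library class stands in for the experiment class.

```python
import importlib


def load_object(spec):
    """Import spec['module'] and return the attribute named spec['name']."""
    module = importlib.import_module(spec["module"])
    return getattr(module, spec["name"])


# Stand-in spec: in a real cfg this would be something like
# {"module": "framework.FaultDetection", "name": "FaultDetectionExperiment"}.
cls = load_object({"module": "collections", "name": "OrderedDict"})
instance = cls(a=1)
```

Resolving the class from strings keeps the choice of experiment in the cfg file rather than in code, at the cost of errors surfacing only at load time.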
Source code in framework\FaultDetection.py
create_model(*args, **kwargs)
Create the model using the configuration file. The model is trained on the data set up in the 'setup' function.
Returns:
any: The trained model.
Source code in framework\FaultDetection.py
predict(data)
Predict using the trained model. The model should be trained before calling this function.
Parameters:
data (str): The data to be used for prediction.
Source code in framework\FaultDetection.py
setup(data)
Setup the data for training and prediction. This function is called before any other function to set self.data that can be used in any other function. Also returns the data if need be.
Source code in framework\FaultDetection.py
TimeSeriesAnomalyExperiment
Bases: Experiment
Class for interacting with time series models. Plays a similar role to pycaret experiment. Deals with model training, predicting, metrics and serialization. Inherits from Experiment Protocol class from framework.Experiment.
Implemented models:
- Autoencoder : 'models.time_series_analysis.Autoencoder'
- LSTM : 'models.time_series_analysis.LSTM'
- Prophet : 'models.time_series_analysis.Prophet'
- SSA : 'models.time_series_analysis.SSA'
- ARIMA : 'models.time_series_analysis.ARIMA'
- Exponential Smoothing : 'models.time_series_analysis.ExponentialSmoothing'
If the experiment is overloaded and new functions are added, they can be called by the system by adding the function name to the cfg file with the relevant parameters. For example, if a new function 'new_function' is added to the system, the cfg file should have the following structure:
load_object:
    module: framework.TimeSeriesAnalysis
    name: TimeSeriesAnomalyExperiment
setup:
    ds: str The column that contains the datetime information.
    y: str The target column for the experiment. (output column)
    ...
create_model:
    model: str The model to be used for training.
    params: dict The parameters to be used for the model.
    ...
new_function: # e.g. posterior reconciliation of results.
    param1: str The first parameter for the function.
    param2: str The second parameter for the function.
Source code in framework\TimeSeriesAnalysis.py
__init__(cfg, experiment_id, run_id)
Initialize the model registry. Currently cannot be changed after initialization. In future versions, we will allow for dynamic model loading through configuration files. This experiment function is used for loading basic time series anomaly detection models.
Parameters:
cfg (dict): The configuration file for the experiment.
experiment_id (str): The experiment id for the experiment.
run_id (str): The run id for the experiment.
Source code in framework\TimeSeriesAnalysis.py
create_model(*args, **kwargs)
Create the model using the configuration file. The model is trained on the data set up in the 'setup' function.
Returns:
any: The trained model.
Description:
- The function creates the model using the configuration file.
- The function logs the model and parameters to mlflow.
- The function logs the metrics to mlflow.
- The function saves the model to disk.
- The function returns the trained model.
Source code in framework\TimeSeriesAnalysis.py
predict(data)
Predict using the trained model. The model should be trained before calling this function.
Parameters:
data (str): The data to be used for prediction.
Description:
- The function predicts using the trained model.
- The function updates the predict figures with the new data.
- The function calculates the metrics for the model.
- The function logs the metrics to mlflow.
- The function saves the reports to disk.
- The function logs the reports to mlflow.
- The function returns the predictions and metrics.
Source code in framework\TimeSeriesAnalysis.py
setup(data, *args, **kwargs)
Setup the data for training and prediction. This function is called before training the model. In the future, it will also be used to preprocess the data and prepare it for training.
Parameters:
data (pd.DataFrame): The data to be used for training and prediction.
Description:
- The setup function is used to prepare the data for training and prediction. It is called before the model is trained.
- The function renames the columns to 'ds' and 'y' for consistency.
- The function logs the target and datestamp columns to mlflow.
- The function returns the data after processing.
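The renaming step listed above can be illustrated without pandas. The documented function works on a pd.DataFrame, where `DataFrame.rename(columns=...)` would be the natural call; the sketch below uses plain dictionaries instead to stay dependency-free, and the helper name and sample column names are hypothetical.

```python
def rename_columns(records, datetime_column, target):
    """Return copies of the rows with the two columns renamed to 'ds' and 'y'."""
    renamed = []
    for row in records:
        new_row = dict(row)  # copy so the caller's rows are left untouched
        new_row["ds"] = new_row.pop(datetime_column)
        new_row["y"] = new_row.pop(target)
        renamed.append(new_row)
    return renamed


rows = rename_columns(
    [{"timestamp": "2024-01-01", "value": 1.5, "sensor": "a"}],
    datetime_column="timestamp",
    target="value",
)
```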
Source code in framework\TimeSeriesAnalysis.py
ProcessMiningExperiment
Bases: Experiment
This experiment file is used for process mining experiments. It is a subclass of the Experiment class in the framework.Experiment module. It is used to create, train, and predict using process mining models. Heavily relies on sequence mining algorithms.
Implemented models:
- Apriori : 'models.spmf.Apriori'
- CMSPAM : 'models.spmf.CM_SPAM'
- TopKRules : 'models.spmf.TopKRules'
- Heuristics Miner : 'models.spmf.HeuristicsMiner'
Inherits from Experiment Protocol class from framework.Experiment.
If the experiment is overloaded and new functions are added, they can be called by the system by adding the function name to the cfg file with the relevant parameters. For example, if a new function 'new_function' is added to the system, the cfg file should have the following structure:
load_object:
    module: framework.ProcessMining
    name: ProcessMiningExperiment
setup:
    ...
create_model:
    model: str The model to be used for training.
    params: dict The parameters to be used for the model.
new_function:
    param1: str The first parameter for the function.
    param2: str The second parameter for the function.
    ...
Source code in framework\ProcessMining.py
__init__(cfg, experiment_id, run_id, *args, **kwargs)
Initialize the model registry. Currently cannot be changed after initialization.
Parameters:
cfg (dict): The configuration file for the experiment.
experiment_id (str): The experiment id for the experiment.
run_id (str): The run id for the experiment.
Source code in framework\ProcessMining.py
create_model(*args, **kwargs)
Create the model using the configuration file. The model is trained on the data set up in the 'setup' function.
Returns:
any: The trained model.
Source code in framework\ProcessMining.py
predict(data)
Predict using the trained model. The model should be trained before calling this function.
Parameters:
data (pd.DataFrame): The data to be used for prediction.
Source code in framework\ProcessMining.py
setup(data)
Setup the data for training and prediction. This function is called before training the model. Data must be in pandas DataFrame format; it can be a single column with several rows.
Source code in framework\ProcessMining.py