Cement, an essential binding material in construction, serves as the backbone of our built environment. From towering skyscrapers to intricate bridges, the strength and durability of these structures rely on the quality of the concrete that cement binds together, and in particular on one crucial property: its compressive strength.
In this example, the Actable AI platform will be used to determine what affects concrete compressive strength and how we can predict it.
Before attempting to train any models, it is important to perform Exploratory Data Analysis (EDA) to get a better understanding of the features in our data and how they interact with each other. We can use the correlational analysis tool to find the strength of the relationships between our variable of interest and the rest of the features. The settings chosen are as follows:
The variable of interest, namely the concrete strength, is specified in the ‘correlation target’ field, while the features whose correlation with the target should be measured are specified in the ‘compared factors’ field. Other options are also available, such as the number of factors to display and whether values should be shown on the bar chart.
After clicking the ‘Run’ button and waiting for a few moments, the results are generated and displayed to the user:
As can be observed, two important features appear to be age and water. Specifically, the strength tends to increase with increasing age and with decreasing water content. Other features, such as the amounts of cement and superplasticizer (‘superplastic’), are also correlated with the strength.
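For intuition, the signed values on the bar chart can be thought of as correlation coefficients between each factor and the target. The sketch below assumes the Pearson coefficient (the platform may use or offer other measures) and uses a handful of hypothetical rows with the article's column names, not the actual dataset. It ranks the compared factors by the absolute strength of their correlation, as a bar chart of correlations typically does.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_factors(rows, target, factors):
    """Correlate each factor with the target, sorted by absolute strength."""
    ys = [row[target] for row in rows]
    scores = {f: pearson([row[f] for row in rows], ys) for f in factors}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Toy rows with hypothetical values (not the real concrete data).
rows = [
    {"age": 3, "water": 200, "strength": 15.0},
    {"age": 7, "water": 190, "strength": 22.0},
    {"age": 28, "water": 180, "strength": 35.0},
    {"age": 90, "water": 170, "strength": 45.0},
]
for factor, r in rank_factors(rows, "strength", ["age", "water"]):
    print(f"{factor}: {r:+.2f}")
```

A positive coefficient (age here) means the target tends to rise with the factor, and a negative one (water here) means it tends to fall.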
Several other graphs are also generated, which can be used to deduce the relationships amongst features:
The causal discovery analytic can also be used to uncover causal relationships between variables. A causal relationship exists when one variable in a data set directly influences another, so that a change in the first brings about a change in the second.
The causal discovery analytic has several options, as follows:
First, the features to be considered are specified in the ‘Selected Features’ field; the analytic will then determine the interactions between all of these features. The model to be used, its options, and any constraints and causal variables can be specified in the ‘Causal Graph’ tab:
Once you are satisfied with the options (the default values should suffice in most cases), click the ‘Run’ button, and a graph showing the causal relationships will be displayed after a few moments:
As can be observed, there are quite a few relationships amongst our features, and values representing the strength of each relationship are also provided. For example, there is a clear relationship between ‘water’ and ‘superplastic’. This is expected, since superplasticizers enable a reduction in water content. Furthermore, the negative value of the relationship (indicated in red) signifies that as the amount of superplasticizer increases, the amount of water decreases, and vice versa.
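To make the meaning of a signed edge weight concrete, the sketch below simulates a hypothetical linear relationship in which more superplasticizer leads to less water, then recovers the coefficient with an ordinary least-squares slope. This only illustrates how a linear edge weight is read once an edge is known; it is not a causal discovery algorithm, and the coefficient (-2.0), value ranges and noise level are assumed values, not estimates from the concrete data.

```python
import random


def simulate(n, effect, seed=0):
    """Generate (superplastic, water) pairs from a hypothetical linear model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        sp = rng.uniform(0.0, 30.0)
        water = 200.0 + effect * sp + rng.gauss(0.0, 5.0)
        data.append((sp, water))
    return data


def ols_slope(pairs):
    """Least-squares slope of y on x: the change in y per unit change in x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx


pairs = simulate(n=2000, effect=-2.0)
print(f"estimated edge weight: {ols_slope(pairs):.2f}")
```

With 2000 samples the estimate should land close to the assumed -2.0; its negative sign is what a red, negative edge conveys.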
Having gained a better understanding of our data, we can proceed to train a model that predicts the concrete compressive strength. The regression analytic is chosen, with the following options selected:
As in the correlational analysis, the outcome to be predicted (concrete compressive strength) is specified in the ‘predicted target’ field, while the features used to predict it are specified in the ‘predictors’ field.
Several other options can also be specified, including:
In this case, the ‘explain predictions’ option has been selected. This enables the generation of Shapley values, which quantify the extent to which each variable has increased or decreased an individual prediction.
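Shapley values come from cooperative game theory: a feature's value is its weighted average marginal contribution to the prediction over all subsets of the other features. The sketch below computes them exactly by enumerating subsets, with features outside a subset set to a single reference sample. Both this baseline convention and the linear toy model over (age, cement, water) are assumptions for illustration; the platform's explainer may use a different, approximate method, since exact enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at sample `x` relative to `baseline`.

    Features outside a coalition take their value from `baseline`.
    """
    n = len(x)

    def value(coalition):
        point = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(point)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Hypothetical linear "strength" model over (age, cement, water).
def toy_model(features):
    age, cement, water = features
    return 10.0 + 0.1 * age + 0.08 * cement - 0.15 * water


sample = [28.0, 300.0, 180.0]
reference = [45.0, 280.0, 180.0]  # e.g. hypothetical feature averages
phis = shapley_values(toy_model, sample, reference)
print([round(p, 2) for p in phis])
```

A positive value means the feature pushed this prediction above the reference prediction and a negative value means it pulled it down; the values always sum to the difference between the two predictions.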
More advanced options can also be specified, such as the models to be trained and their hyperparameters. While the default settings generally work well, you might want to set specific values or tune them to improve performance. Actable AI will then leverage state-of-the-art AutoML techniques to automatically train several models with different hyperparameters and select the one achieving the best performance.
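The core idea of training several candidate models and keeping the best one can be illustrated in a few lines. The sketch below is a drastic simplification and an assumption, not Actable AI's AutoML: the candidates are k-nearest-neighbour regressors on a single synthetic feature, the only hyperparameter is k, and each candidate is scored on a held-out validation split with a configurable metric (RMSE by default, where lower is better).

```python
import math
import random


def knn_predict(train, x, k):
    """Average target of the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def select_model(data, ks, metric=rmse, val_fraction=0.25, seed=0):
    """Hold out a validation split, score each k, and return the best (k, score).

    Assumes `metric` is an error measure, so lower scores are better.
    """
    rows = data[:]
    random.Random(seed).shuffle(rows)
    n_val = max(1, int(len(rows) * val_fraction))
    val, train = rows[:n_val], rows[n_val:]
    scores = {}
    for k in ks:
        preds = [knn_predict(train, x, k) for x, _ in val]
        scores[k] = metric([y for _, y in val], preds)
    best_k = min(scores, key=scores.get)
    return best_k, scores[best_k]


# Synthetic 1-D data: a smooth saturating curve plus noise (illustrative only).
rng = random.Random(1)
ages = [rng.uniform(1, 365) for _ in range(200)]
data = [(a, 10 * math.log1p(a) + rng.gauss(0, 1)) for a in ages]
best_k, best_score = select_model(data, ks=[1, 3, 5, 10, 25])
print(best_k, round(best_score, 2))
```

Scoring on data held out from training is what makes the comparison between candidates meaningful; real AutoML systems add many model families, smarter search and often ensembling on top of this idea.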
The metric used for optimization can also be specified:
More details on all of the options available in the regression analytic can also be found in the user documentation.
Once we are satisfied with the settings, the ‘Run’ button can be clicked to start the model training process. When it is completed, a number of results are displayed:
We can first analyze the performance of the model using a number of metrics, each of which compares the ground-truth values of the concrete compressive strength with those predicted by the best model. The results in this case are very good: the Root Mean Squared Error, Mean Absolute Error, and Median Absolute Error are all quite close to their optimal value of 0, and R² is close to its optimal value of 1.0. In this case, the model can be said to have a relative error of approximately 8%.
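For reference, the four metrics follow standard definitions and can be computed as below. The input values are arbitrary toy numbers; note that RMSE, MAE and median absolute error are expressed in the same units as the target, whereas R² is unitless.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """RMSE, MAE, median absolute error and R^2 for paired observations."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum(e ** 2 for e in errors)  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / len(errors)),
        "mae": sum(abs_errors) / len(abs_errors),
        "median_ae": statistics.median(abs_errors),
        "r2": 1.0 - ss_res / ss_tot,
    }


m = regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print({k: round(v, 3) for k, v in m.items()})
```

RMSE penalizes large errors more heavily than MAE, while the median absolute error is robust to a few badly predicted samples, which is why looking at all three together is informative.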
Since these metrics are computed on data that was not used to train the model, they indicate that the model should also perform well on unseen real-world data.
We can then observe which features are deemed to be important by the model:
It is clear that age, cement, and water are all very useful features for the trained model, which is unsurprising given the results of the correlational analysis discussed earlier. These are therefore the features that most strongly influence the model's predictions of concrete compressive strength.
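The article does not state how the platform computes feature importance. One common model-agnostic approach, permutation importance, is sketched below as an assumption: each feature column is shuffled in turn, and the resulting increase in error shows how much the model relies on that feature. The model here is a hypothetical one that uses only its first feature.

```python
import math
import random


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in RMSE when each feature column is shuffled."""
    rng = random.Random(seed)
    base = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(row) for row in X_perm]) - base)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical model that uses only the first feature; the second is ignored.
def model(row):
    return 2.0 * row[0]


rng = random.Random(42)
X = [[rng.uniform(0, 100), rng.uniform(0, 100)] for _ in range(100)]
y = [model(row) for row in X]
print(permutation_importance(model, X, y))
```

Shuffling the ignored second feature leaves the error unchanged, so its importance is zero, while shuffling the first feature degrades the predictions substantially.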
Next, we can check out the raw values of the predictions and the Shapley values mentioned earlier:
Comparing the ground-truth values (column ‘strength’) with the predicted values (‘strength_predicted’), it is clear that the predictions are indeed very close to the actual values. Moreover, the contribution of each variable to the outcome is shown in red or green: red values indicate that the variable decreased the prediction (i.e. the predicted concrete compressive strength), while green values indicate that it increased the prediction. These values are generated for each individual sample, enabling highly granular analysis of the model and of how each variable affects the outcome. This also helps determine how the inputs can be adjusted to reach the desired concrete strength.
The recently introduced Partial Dependence Plots (PDP) and Individual Conditional Expectation (ICE) plots allow further analysis of how the model's predictions vary across different values of each variable:
An ICE plot shows the effect of a feature on the outcome for each individual sample, by freezing all of that sample's values except the feature being investigated and varying that feature across a range of values. Averaging these per-sample curves yields the partial dependence plot (PDP). In the above images, it is again evident that greater age and a higher cement quantity tend to increase the strength.
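The computation behind both plots is simple enough to sketch directly. The model below is a hypothetical stand-in, not the trained model: strength saturates with age (feature index 0) and grows linearly with cement (index 1), with made-up coefficients and samples. Each ICE curve sweeps age over a grid for one sample, and the PDP is the pointwise average of those curves.

```python
def ice_curves(predict, X, feature, grid):
    """One curve per sample: predictions as `feature` sweeps over `grid`,
    with the sample's other feature values held fixed."""
    curves = []
    for row in X:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves):
    """PDP: the pointwise average of the ICE curves."""
    return [sum(col) / len(col) for col in zip(*curves)]


# Hypothetical model: strength rises with age but saturates; cement adds linearly.
def model(row):
    age, cement = row
    return 40.0 * age / (age + 14.0) + 0.05 * cement


X = [[7.0, 250.0], [28.0, 300.0], [90.0, 350.0]]
grid = [1, 3, 7, 14, 28, 56, 90, 180, 365]
ice = ice_curves(model, X, feature=0, grid=grid)
pdp = partial_dependence(ice)
print([round(v, 1) for v in pdp])
```

Because this toy model has no interaction between age and cement, its ICE curves are parallel; with a real model, diverging ICE curves reveal interactions that the averaged PDP alone would hide.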
However, in the case of age, there appears to be a point beyond which additional age does not yield further gains in strength, both in the PDP (average) and in the individual samples shown in the ICE plot. This helps us determine the amount of time required to attain the desired strength, without waiting longer for little or no additional gain.
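Reading the levelling-off point from a PDP can also be done programmatically. The sketch below returns the smallest grid value at which the curve has achieved a chosen fraction of its total gain. The 0.9 fraction, the age grid and the saturating curve are all hypothetical choices for illustration; in practice the PDP values would come from the platform's plot.

```python
def plateau_point(grid, pdp, fraction=0.9):
    """Smallest grid value at which the PDP achieves `fraction` of its total gain.

    Gain is measured from the first grid point to the curve's maximum.
    """
    start, peak = pdp[0], max(pdp)
    threshold = start + fraction * (peak - start)
    for value, prediction in zip(grid, pdp):
        if prediction >= threshold:
            return value
    return grid[-1]


# Hypothetical saturating PDP over age (days).
grid = [1, 3, 7, 14, 28, 56, 90, 180, 365]
pdp = [40.0 * a / (a + 14.0) for a in grid]
print(plateau_point(grid, pdp, fraction=0.8))
print(plateau_point(grid, pdp, fraction=0.9))
```

Raising the fraction pushes the suggested age later, which makes the trade-off between waiting time and extra strength explicit.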
More information on the best model and the other models that have been trained can also be viewed in the ‘leaderboard’ tab:
Apart from the chosen evaluation metric, the time required to train each model and to generate predictions is also given. This helps us determine whether the model's prediction time is acceptable for the given application. Note that the desired inference time can also be specified in the ‘Advanced’ tab. The hyperparameters and variables used by each model are also shown, giving us better insight into the model composition.
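Whether a model is fast enough for a given application can also be checked on our own side once the model is in use. The sketch below times a prediction function with `time.perf_counter` and compares the average per-prediction latency with a budget. The stand-in model and the budget value are hypothetical; wall-clock timings also depend on the machine and on system load.

```python
import time


def mean_latency_ms(predict, rows, repeats=3):
    """Average wall-clock time per prediction, in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        for row in rows:
            predict(row)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (repeats * len(rows))


def within_budget(predict, rows, budget_ms):
    """True if the measured per-prediction latency fits the budget."""
    return mean_latency_ms(predict, rows) <= budget_ms


def model(row):  # hypothetical stand-in for a trained model
    return sum(row)


rows = [[1.0, 2.0, 3.0]] * 1000
print(f"{mean_latency_ms(model, rows):.4f} ms per prediction")
print(within_budget(model, rows, budget_ms=5.0))
```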
Once we are satisfied with the trained model, it can be applied to new data in the ‘Live Model’ tab, where predictions can be generated for a new data set. Predictor values can also be entered interactively in a form, with predictions generated on the fly:
An API can also be used to integrate the model into your existing application (web app, mobile app, etc.). Click on the ‘Live API’ tab and all the details of the API are shown:
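The exact endpoint, authentication scheme and payload format are shown in the ‘Live API’ tab and are not reproduced here. The sketch below therefore uses a placeholder URL, a hypothetical bearer-token header and a hypothetical JSON body layout, purely to illustrate the shape of a client call built with Python's standard library; the real details from the tab should be substituted.

```python
import json
import urllib.request

# Placeholders (hypothetical): replace with the values shown in the 'Live API' tab.
API_URL = "https://example.com/hypothetical/predict"
API_TOKEN = "YOUR_TOKEN"


def build_request(rows, url=API_URL, token=API_TOKEN):
    """Build a POST request carrying predictor values as JSON (hypothetical format)."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def predict(rows):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(rows), timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_request([{"age": 28, "cement": 300.0, "water": 180.0}])
print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the payload easy to inspect and test before pointing the client at the real endpoint.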
Finally, the trained model can also be exported and used directly within Python by following the instructions in the ‘Export Model’ tab:
The above example demonstrated the use of the Actable AI platform to analyze data and generate a predictive model that is capable of estimating concrete compressive strength based on factors such as age, cement quantity, water quantity, etc. We also gained several insights into the variables that most influence the prediction, helping us to understand the model and the most relevant features affecting concrete compressive strength. Actable AI thus makes it very easy to determine how to adjust our variables in order to attain the desired concrete compressive strength.
More information on other functionalities of the platform can also be found in the user documentation.