Multi-Touch Attribution (MTA) is a marketing and analytics methodology used to determine how different marketing channels and touchpoints contribute to a conversion or desired outcome, such as a sale, lead generation, or another key performance indicator (KPI). It aims to provide a more comprehensive and accurate understanding of the customer's journey by considering multiple interactions that a customer has with a brand before taking a specific action.
Traditional single-touch attribution models, such as last-click attribution, give all the credit for a conversion to the final touchpoint before the conversion. This simplistic approach can give an inaccurate picture of how marketing efforts affect customer behavior, because it ignores the influence of earlier touchpoints in the customer journey.
Multi-Touch Attribution takes a more nuanced approach and considers the entire customer journey, taking into account various touchpoints, channels, and interactions along the way. There are several different models and methodologies within MTA, including:
Multi-Touch Attribution is particularly valuable in today's digital marketing landscape, where customers often engage with multiple touchpoints and channels before making a purchase decision. By understanding the relative impact of different touchpoints, businesses can allocate their marketing budgets more effectively, optimize campaigns, and improve overall ROI. However, implementing MTA can be challenging, as it requires access to comprehensive data and the ability to analyze and interpret that data accurately.
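To make the contrast between single-touch and multi-touch attribution concrete, the sketch below compares last-click attribution with a linear (equal-credit) rule, one commonly used multi-touch rule chosen here purely for illustration. The journeys, channel names, and function names are hypothetical and are not taken from the datasets used later in this article.

```python
from collections import defaultdict


def last_click(journeys):
    """Single-touch: the final touchpoint receives all the credit."""
    credit = defaultdict(float)
    for touchpoints in journeys:
        credit[touchpoints[-1]] += 1.0
    return dict(credit)


def linear(journeys):
    """Multi-touch: every touchpoint in a journey receives an equal share."""
    credit = defaultdict(float)
    for touchpoints in journeys:
        share = 1.0 / len(touchpoints)
        for channel in touchpoints:
            credit[channel] += share
    return dict(credit)


# Hypothetical converting journeys: the ordered channels each customer touched.
journeys = [
    ["email", "social", "house_ads"],
    ["social", "house_ads"],
    ["email"],
]

last_click_credit = last_click(journeys)
linear_credit = linear(journeys)
print("last click:", last_click_credit)
print("linear:    ", linear_credit)
```

Under last-click, the 'social' channel receives no credit at all even though it appears in two of the three journeys, while the linear rule spreads each conversion across every channel that was involved.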
In this article, we explore how Actable AI’s tools can be used to perform MTA. Two different datasets are used to demonstrate the platform’s flexibility in handling different data structures and features.
Dataset 1
The first dataset contains one row per customer, with features indicating whether the customer has been converted to a paying customer, which channel was used for marketing, which channel was used for subscription, the customer’s age group, the language displayed to the user, and so on.
It is always important to first get a better understanding of the data and how the features relate to one another. This can be done easily in Actable AI using tools such as correlational analysis. The parameters can be set as follows:
The variable of interest, namely whether a customer has been converted to a paying customer, is specified in the ‘correlation target’ field. Meanwhile, any features for which the correlation needs to be measured with the target are specified in the ‘compared factors’ field. Other options are also available, such as the number of factors to display and whether values should be shown on the bar chart.
After clicking the ‘Run’ button and waiting for a few moments, the results are generated and displayed to the user:
Focusing on the marketing and subscription channels, it appears that ‘House Ads’ as the subscription channel is the most correlated with converting customers to paying customers. However, it is less effective when used as a marketing channel, where email is more correlated with our target of converting customers to paying customers.
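As a rough illustration of how a categorical channel can be correlated with a binary conversion target, the sketch below one-hot encodes each channel and computes the Pearson correlation between each indicator and the target. This is an assumption about one reasonable way to do it, not a description of how Actable AI computes its results, and the rows are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical rows: (subscribing_channel, converted).
rows = [
    ("House Ads", 1), ("House Ads", 1), ("House Ads", 0),
    ("Email", 0), ("Email", 1),
    ("Social", 0), ("Social", 0), ("Social", 0),
]
converted = [c for _, c in rows]

correlations = {}
for channel in sorted({ch for ch, _ in rows}):
    # One-hot indicator: 1 if the row used this channel, 0 otherwise.
    indicator = [1 if ch == channel else 0 for ch, _ in rows]
    correlations[channel] = pearson(indicator, converted)

# Print channels from most to least correlated with conversion.
for channel, r in sorted(correlations.items(), key=lambda kv: -kv[1]):
    print(f"{channel:10s} {r:+.3f}")
```

A positive value means that customers using the channel convert more often than average in this toy data, and a negative value means they convert less often. As always, correlation alone does not show that the channel caused the conversion.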
Now that we have a better understanding of our features, we can proceed with training a machine learning model with the task of predicting if a customer will be converted to a paying customer. This can be done by selecting the ‘classification’ analytic with the following options:
As with the correlation analysis, the outcome to be predicted is specified in the ‘predicted target’ field, while the features used to predict the target are specified in the ‘predictors’ field.
Several other options can also be specified, including:
In this case, the ‘explain predictions’ option has been selected. This will enable the generation of what are known as Shapley values that can help us understand to what extent each variable has increased or decreased the prediction.
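To give an intuition for what a Shapley value measures, the sketch below computes exact Shapley values for a tiny, hypothetical scoring function by averaging each feature’s marginal contribution over every feature ordering. The scoring function, the feature names, and the single baseline row are all assumptions made for illustration; practical tools typically approximate this computation and average over a background dataset rather than one baseline.

```python
from itertools import permutations


def shapley_values(predict, sample, baseline):
    """Exact Shapley values relative to a single baseline row."""
    features = list(sample)
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for feature in order:
            # Switch one feature from its baseline value to the sample value
            # and record how much the prediction moves.
            current[feature] = sample[feature]
            value = predict(current)
            totals[feature] += value - previous
            previous = value
    return {f: total / len(orderings) for f, total in totals.items()}


def predict(x):
    """Hypothetical conversion score with an interaction between channels."""
    score = 0.1
    house_ads = x["subscribing_channel"] == "House Ads"
    email = x["marketing_channel"] == "Email"
    if house_ads:
        score += 0.4
    if email:
        score += 0.2
    if house_ads and email:
        score += 0.1
    return score


baseline = {"subscribing_channel": "Social", "marketing_channel": "Social"}
sample = {"subscribing_channel": "House Ads", "marketing_channel": "Email"}

phi = shapley_values(predict, sample, baseline)
print(phi)
```

The values sum to the difference between the prediction for the sample and the prediction for the baseline, which is what allows each value to be read as that feature’s share of the change in the prediction.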
More advanced options can also be specified, such as the models to be trained and their hyperparameters. While the default settings generally work well, you might want to specify certain values to your liking, or try to tune them to improve performance. Actable AI will then leverage state-of-the-art AutoML techniques to automatically train several models with different hyperparameters, and select the one achieving the best performance.
The metric used for optimization can also be specified:
Note that ROC AUC is used as the optimisation metric here, since it is better suited than plain accuracy to imbalanced datasets such as this one. More details on all of the options available in the classification analytic can be found in the user documentation.
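The sketch below illustrates why a ranking metric such as ROC AUC is informative on imbalanced data. It computes ROC AUC as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, and compares a hypothetical model with a trivial one that gives every customer the same score. The labels and scores are made up.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical imbalanced data: 8 non-converted customers, 2 converted.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
model_scores = [0.10, 0.20, 0.15, 0.30, 0.05, 0.25, 0.35, 0.60, 0.40, 0.70]
constant_scores = [0.0] * len(labels)  # trivial model: never predicts conversion

# Accuracy of always predicting the majority (negative) class.
trivial_accuracy = sum(1 for y in labels if y == 0) / len(labels)

model_auc = roc_auc(labels, model_scores)
trivial_auc = roc_auc(labels, constant_scores)
print(trivial_accuracy, trivial_auc, model_auc)
```

In this toy example the trivial model reaches 80% accuracy simply because most customers do not convert, yet its ROC AUC is only 0.5, which is the value expected from random ranking.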
Once we are satisfied with the settings, the ‘Run’ button can be clicked to start the model training process. When it is completed, a number of results are displayed:
First of all, we can check out the performance of the model using a number of metrics. Each of these compares the ground-truth values of customer conversion with those predicted by the best model. As can be observed, the results in this case are fairly good, with the ROC AUC (our chosen optimisation metric) having a value of 85.9%.
These metrics suggest that the model should perform reasonably well on real-world unseen data (data that was not used to train the model). The ROC and precision-recall graphs can also be viewed:
The probability threshold above which predictions are assigned to the positive class (i.e. that a customer has been converted) can also be adjusted. By default, it corresponds to 0.5, but clicking on the precision-recall curve will adjust its value. The results are also updated to match this new threshold. The threshold also affects the confusion matrix, where you can quickly check which classes might be confusing the trained model:
In this case, the model is more reliable at identifying customers who have not converted than those who have (87.84% vs 69.23%). However, as mentioned, adjusting the probability threshold can improve the accuracy of predicting the positive class (at the expense of lower accuracy for the negative class).
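The trade-off controlled by the probability threshold can be sketched as follows. The labels and predicted probabilities are hypothetical, and the per-class figures are computed as the fraction of each true class that is predicted correctly, which is our assumption about how per-class percentages such as those above are read from a confusion matrix.

```python
def confusion_at(labels, probabilities, threshold):
    """Confusion-matrix counts when probabilities above the threshold are positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, probability in zip(labels, probabilities):
        predicted = 1 if probability > threshold else 0
        if predicted == 1 and label == 1:
            counts["tp"] += 1
        elif predicted == 1 and label == 0:
            counts["fp"] += 1
        elif predicted == 0 and label == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def per_class_rates(counts):
    """Fraction of each true class that is predicted correctly."""
    return {
        "negative": counts["tn"] / (counts["tn"] + counts["fp"]),
        "positive": counts["tp"] / (counts["tp"] + counts["fn"]),
    }


# Hypothetical labels and predicted probabilities of conversion.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
probabilities = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.35, 0.55, 0.70, 0.90]

default_rates = per_class_rates(confusion_at(labels, probabilities, 0.5))
lowered_rates = per_class_rates(confusion_at(labels, probabilities, 0.25))
print("threshold 0.50:", default_rates)
print("threshold 0.25:", lowered_rates)
```

Lowering the threshold from 0.5 to 0.25 catches every converted customer in this toy data, but more non-converted customers are then incorrectly flagged as converted.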
We can then observe which features are deemed to be important by the model:
It is evident that the subscribing channel is the most important feature, followed by the marketing channel. The other features do not seem to be helpful, and some might even be hurting performance (e.g. the age group).
Next, we can check out the raw values of the predictions and the Shapley values mentioned earlier:
As can be observed, each row contains three new columns: one for the negative class (customer is not converted), one for the positive class (customer is converted), and one for the predicted class (which is based on the threshold value discussed earlier).
Moreover, the extent to which each variable affects the outcome is shown in red or green: red indicates that the variable’s value has decreased the predicted outcome (i.e. the probability of the positive class), while green indicates that it has increased the prediction. These values are generated for each individual sample, enabling highly granular analysis of the model and of how each variable affects the outcome. For example, the subscribing channel column shows that ‘House Ads’ tends to increase the predicted probability the most. This is consistent with what was observed in the correlation analysis.
How the model predictions vary across different values of the variables can be examined further using the recently introduced partial dependence (PDP) and individual conditional expectation (ICE) plots:
An ICE plot shows the effect of a feature on the outcome for a single sample, by keeping all of that sample’s other values fixed and varying only the feature being investigated. Averaging the ICE curves across all samples yields the partial dependence plot (PDP). In the above images, it is again evident that house ads are by far the most effective at converting a customer, although other channels also have some degree of effectiveness.
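The relationship between ICE curves and the PDP can be sketched in a few lines. The scoring function, feature names, and samples below are hypothetical stand-ins for a trained model and its data; only the mechanics (vary one feature, keep the rest fixed, then average) reflect the description above.

```python
def ice_curve(predict, sample, feature, grid):
    """Predictions for one sample as a single feature is swept over the grid."""
    return [predict({**sample, feature: value}) for value in grid]


def partial_dependence(predict, samples, feature, grid):
    """Average of the ICE curves over all samples, one value per grid point."""
    curves = [ice_curve(predict, s, feature, grid) for s in samples]
    return [sum(curve[i] for curve in curves) / len(curves)
            for i in range(len(grid))]


def predict(x):
    """Hypothetical conversion probability, standing in for a trained model."""
    channel_effect = {"House Ads": 0.50, "Email": 0.15, "Social": 0.05}
    probability = 0.05 + channel_effect[x["subscribing_channel"]]
    if x["age_group"] == "19-24":
        probability += 0.10
    return probability


samples = [
    {"subscribing_channel": "Email", "age_group": "19-24"},
    {"subscribing_channel": "Social", "age_group": "30-36"},
]
grid = ["Email", "House Ads", "Social"]

ice = [ice_curve(predict, s, "subscribing_channel", grid) for s in samples]
pdp = partial_dependence(predict, samples, "subscribing_channel", grid)
for value, average in zip(grid, pdp):
    print(f"{value:10s} {average:.2f}")
```

Each ICE curve keeps its own offset (here driven by the age group), whereas the PDP summarises the average effect of the channel across the samples.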
More information on the best model and the other models that have been trained can also be viewed in the ‘leaderboard’ tab:
Apart from the chosen evaluation metric, the time required to train each model and to generate predictions is also given. This helps us determine whether the model is fast enough for the given application. Note that the desired inference time can also be specified in the ‘Advanced’ tab. The hyperparameters and variables used by the model are also shown, giving better insight into the model composition.
Once we are satisfied with the trained model, it can be used with new data by selecting the ‘Live Model’ tab, where predictions can be generated for a new dataset. Predictor values can also be entered interactively in a form, with predictions generated on the fly:
If the options in the ‘Intervention’ tab are set, then it is also possible to use counterfactual analysis to determine the effect of a treatment variable (e.g. subscribing channels) on the predicted outcome, and obtain a new prediction. In other words, what happens to the predicted probabilities if the channels are changed? Common causes, also known as confounders, are variables that can affect both the outcome and the intervened variable. These can also be specified in the ‘common causes’ field, enabling causal inference techniques to yield new estimates on whether a customer is converted based on causal relationships:
An API can also be used to integrate the model into your existing application (web app, mobile app, etc.). Click on the ‘Live API’ tab and all the details of the API are shown:
Finally, the trained model can also be exported and used directly within Python by following the instructions in the ‘Export Model’ tab:
Dataset 2
The second dataset is similar to the first, but each channel is represented as its own column, and the value in that column is the number of times the user has interacted with the channel in the past. Each row therefore provides historical context, summarising all previous records of a user.
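If the raw data is available as a log of individual interactions, a table of this shape can be derived by counting interactions per user and channel. The sketch below assumes a hypothetical event log of (user, channel) pairs; the user identifiers and channel names are illustrative and are not taken from the dataset.

```python
from collections import Counter, defaultdict

# Hypothetical interaction log: one (user_id, channel) pair per touchpoint.
events = [
    ("u1", "Online Video"), ("u1", "Email"), ("u1", "Online Video"),
    ("u2", "Email"),
    ("u3", "Social Media"), ("u3", "Online Video"),
]

# Fixed, sorted column order so every row has the same layout.
channels = sorted({channel for _, channel in events})

# Count how many times each user interacted with each channel.
counts = defaultdict(Counter)
for user, channel in events:
    counts[user][channel] += 1

# One row per user, one column per channel (0 when there was no interaction).
table = {user: [counts[user][ch] for ch in channels] for user in sorted(counts)}

print(channels)
for user, row in table.items():
    print(user, row)
```

Each row then summarises a user’s full interaction history, at the cost of discarding the order in which the touchpoints occurred.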
We can once again train a classification model using the options below:
The following results are obtained:
As can be observed, the model appears to favour predicting the negative class, likely because the dataset is imbalanced (containing more negative examples than positive ones). However, as discussed earlier, the probability threshold can be adjusted to improve the performance for the positive class prediction:
With the adjusted threshold, the model is much better at predicting the positive class, at the expense of a drop in performance when predicting the negative class. Since the gain for the positive class is larger than the loss for the negative class, it can be argued that this threshold is better overall, giving more balanced performance between the two classes. The threshold can be adjusted further to suit the particular application, for instance to predict the positive class more accurately even if performance on the negative class is reduced further.
Note that no parameters or settings were tuned apart from the optimisation metric. For example, the ‘Optimize for Quality’ option can be enabled to improve performance (at the expense of a longer training time), model hyperparameters can be adjusted, and a larger number of trials can be enabled so that more models with a wider spread of parameters are trained. This increases the likelihood of finding a better model.
Since each channel is represented in its own column, the feature importances can be used to determine the differences in importance among channels:
Feature importance of the best model
In this case, it appears that all channels are fairly equally important, although online videos appear to be slightly more useful in converting customers. Similar to the first dataset, we can also check out the Shapley values and prediction probabilities for each sample:
The Shapley values again give a better understanding of how each feature contributes to the conversion probability for each individual customer. This allows us to adapt our marketing strategy to improve the chances of converting customers.
More information on other functionalities of the platform can also be found in the user documentation.