Financial
About the Client

A financial broker that aims to make it easier to invest in foreign currencies and exchanges.

Solution Area

Artificial Intelligence

Industry

Financial

Client Location

Brazil

The Challenge

The client already had a set of machine learning models that predicted whether a customer would purchase a financial product, and also wanted to estimate the potential sale amount in monetary terms. This would enable the marketing team not only to target individuals with a high propensity to buy but also to focus on specific audiences based on the estimated monetary value of the financial products they would purchase, resulting in more tailored and effective marketing campaigns. The objective was to quickly deploy several regression models for these predictions as a Proof of Concept (PoC).

The dataset consisted of millions of rows and over 300 predictor variables, making the feature selection process particularly challenging. The data was highly sparse, with unusual and heavily skewed statistical distributions, which is common in the financial sector. Many variables had a high proportion of zero values for numerous customers. This created a difficult environment for data cleaning, transformation, and treatment, as well as for fitting traditional regression models. Moreover, the volume and complexity of the data would result in high computational time requirements, further complicating the analysis process.
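To make the sparsity and skew described above concrete, the following is a minimal sketch of how one might profile a single numeric column before modeling. The column name and its values are hypothetical and are not taken from the client's data; the two helper functions are illustrative, not part of the delivered solution.

```python
from statistics import mean, pstdev


def zero_fraction(values):
    """Share of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


def skewness(values):
    """Population skewness (third standardized moment); 0 for a symmetric column."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return 0.0  # a constant column has no skew
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


# Hypothetical column: most customers show no activity, a few show large amounts.
monthly_fx_volume = [0, 0, 0, 0, 0, 0, 0, 120.0, 0, 9500.0]

print(f"zero fraction: {zero_fraction(monthly_fx_volume):.2f}")
print(f"skewness:      {skewness(monthly_fx_volume):.2f}")
```

A column that is mostly zeros with a long right tail, like this one, is the kind of distribution that makes ordinary linear regression fit poorly.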

The Business Solution

To meet the client’s goal of enhancing their marketing strategy, the business solution focused on identifying high-value customers based on their potential monetary contribution, in addition to their likelihood to purchase financial products. This dual-targeting approach would enable the marketing team to prioritize high-value prospects, allocate resources more effectively, and create personalized marketing campaigns aimed at maximizing revenue. The client sought a rapid Proof of Concept (PoC) to evaluate the feasibility and potential impact of integrating purchase value predictions into their marketing efforts. By using these insights, the marketing team could better segment their audience, target promotions, and forecast sales more accurately. The successful deployment of this solution would demonstrate the business value of predictive analytics in refining marketing strategies and justify further investment in machine learning capabilities to support decision-making across the organization.

The Solution

We chose CatBoost for its training efficiency and predictive performance, and developed several CatBoost regressors to quickly identify the important features for the task and then fit the prediction models. We filtered the database based on propensity models previously trained by Avenue Code. The machine learning architecture was built on Vertex AI services, primarily Kubeflow in Vertex AI Pipelines. This setup enabled parallel training by allocating one node (machine) per model, which reduced the training time from 18 hours to just 2 hours. The CatBoost models were also optimized through hyperparameter tuning. We logged the metrics and the prediction outputs to BigQuery and presented the results in Looker dashboards covering explainability, metrics, and t-tests for the numeric predictor variables.
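The one-node-per-model idea can be illustrated locally with a small fan-out sketch. This is only an analogy under stated assumptions: it uses Python threads rather than Vertex AI Pipelines nodes, `train_one` is a hypothetical placeholder that sleeps instead of fitting a CatBoost regressor, and the target names are borrowed from the models mentioned in the results.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# One independent regression target per model (names are illustrative).
TARGETS = ["stocks", "net_inflow", "etfs", "cds"]


def train_one(target):
    """Hypothetical stand-in for a single training job.

    In the real setup this step would fit and tune one regressor on its
    own machine; here it only sleeps to simulate work.
    """
    time.sleep(0.1)
    return target, {"status": "trained"}


def train_all(targets):
    """Run every training job concurrently, one worker per model."""
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        return dict(pool.map(train_one, targets))


if __name__ == "__main__":
    start = time.perf_counter()
    results = train_all(TARGETS)
    elapsed = time.perf_counter() - start
    print(results)
    print(f"wall time: {elapsed:.2f}s for {len(TARGETS)} jobs")
```

Because the models do not depend on one another, total wall time approaches the duration of the slowest single job rather than the sum of all jobs, which is the effect behind the reduction in training time reported above.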

The Results

For most models, we achieved better performance than the naive method, which predicts the average for every customer. Specifically, we optimized the stock purchase regression model through up to 10 iterations, reducing the mean absolute error (MAE) to just above 5,000, compared to over 15,000 for the naive model. For the Net Inflow, ETFs, and CDs models, the MAE achieved after hyperparameter tuning was close to zero, significantly lower than the error of the average-based method. By combining CatBoost regression models with a carefully tuned set of parameters, a robust machine learning infrastructure on Vertex AI, and an analytics pipeline using BigQuery and Looker, we successfully tested several regression models. Some of these models demonstrated sufficient quality to be considered production-ready, significantly outperforming the naive methods.
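The comparison against the average-based method can be sketched as follows. The numbers are invented for illustration and do not reproduce the project's data or results; `model_pred` stands in for the output of any fitted regressor.

```python
from statistics import mean


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def naive_mean_predictions(train_targets, n):
    """Naive baseline: predict the training-set average for every row."""
    return [mean(train_targets)] * n


# Hypothetical purchase amounts (not client data).
train_y = [0.0, 0.0, 1000.0, 3000.0]
test_y = [0.0, 500.0, 2500.0]
model_pred = [100.0, 400.0, 2000.0]  # hypothetical regressor output

baseline_pred = naive_mean_predictions(train_y, len(test_y))

print(f"naive MAE: {mae(test_y, baseline_pred):.1f}")
print(f"model MAE: {mae(test_y, model_pred):.1f}")
```

A model is only worth deploying if its MAE is clearly below the naive figure, since the baseline costs nothing to compute.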

Key Metrics 

MAE for Stocks model: just above 5,000 (compared to over 15,000 for the naive model).

MAE for Net Inflow, ETFs, and CDs models: approximately zero.

Tech Stack

BigQuery

Vertex AI