Sherly Alfonso-Sánchez


Why does the area where a policyholder lives tell us so much about their risk? In motor insurance, the link is intuitive: our driving habits and accident risks are shaped by the environments we navigate daily.
Existing research, such as Burdett et al. (2017), has highlighted the “close to home” effect, noting that a significant portion of crashes occur near a driver’s residence, often at a higher rate than driving volume alone would suggest. This implies that the residential environment (local road design, traffic density, and land use) carries a “spatial context” that is highly relevant to risk.
The Challenge: Modeling with Limited Data. While this geographic context is clearly important, researchers often face a major difficulty: public datasets frequently lack granular location data. In our latest study, we set out to see if we could still capture these vital geographic signals under these constraints by using a zone-level modeling framework.
Our Approach: We investigated whether the environment surrounding the center of a policyholder’s municipality postcode could provide the necessary predictive information. Working with the BeMTPL97 CAS data set, we integrated several alternative data sources:
  • Environmental Indicators: Extracted from OpenStreetMap and CORINE Land Cover.
  • Aerial Imagery: High-resolution images from the Belgian National Geographic Institute, provided for academic purposes.
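
As a rough illustration of zone-level environmental indicators, the sketch below counts map features within a fixed radius of a municipality center. It is a minimal sketch under stated assumptions, not the pipeline used in the study: the function names, the feature categories, the coordinates, and the 5 km radius passed in the example are all hypothetical, and a real workflow would read features from OpenStreetMap or CORINE Land Cover rather than from a hand-written list.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_features_within(center, points, radius_km):
    """Count points per category that lie within radius_km of center.

    center: (lat, lon); points: iterable of (category, lat, lon).
    """
    counts = {}
    for category, lat, lon in points:
        if haversine_km(center[0], center[1], lat, lon) <= radius_km:
            counts[category] = counts.get(category, 0) + 1
    return counts


# Hypothetical toy data: a center and three map features.
center = (50.8467, 4.3525)
points = [
    ("traffic_signal", 50.8500, 4.3500),  # well inside 5 km
    ("traffic_signal", 50.8600, 4.3600),  # well inside 5 km
    ("roundabout", 50.9500, 4.3525),      # more than 5 km to the north
]
counts = count_features_within(center, points, radius_km=5.0)
print(counts)
```

Varying `radius_km` is one simple way to compare neighborhoods of different sizes around the same center.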
Our Results:
  • Both linear and tree-based models benefited most, on average, from environmental features extracted at a 5 km scale from the municipality center, but smaller neighborhoods also improved on the baseline specifications. This confirms that geographic signals are present across multiple spatial resolutions.
  • Image embeddings do not improve performance when environmental features are available; however, when such features are absent, pretrained vision-transformer embeddings enhance accuracy and stability for regularized GLMs.
  • These performance gains are not accidental. We conducted robustness checks using different data splits, confirming that the predictive behavior remains stable and reliable across unseen postcodes.
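
To make the idea of evaluating on unseen postcodes concrete, here is a minimal sketch of a grouped split in which every postcode falls entirely in either the training or the test set. It is an assumption about how such a check could be set up, not the exact procedure from the paper; the function `split_by_postcode`, the record layout, and the toy data are hypothetical.

```python
import random


def split_by_postcode(records, test_fraction=0.2, seed=0):
    """Split records so that no postcode appears in both train and test."""
    postcodes = sorted({r["postcode"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(postcodes)
    n_test = max(1, round(test_fraction * len(postcodes)))
    test_codes = set(postcodes[:n_test])
    train = [r for r in records if r["postcode"] not in test_codes]
    test = [r for r in records if r["postcode"] in test_codes]
    return train, test


# Hypothetical toy portfolio: 50 policies spread over 10 postcodes.
records = [{"postcode": 1000 + i % 10, "claims": i % 3} for i in range(50)]

# Repeating the split with different seeds gives several train/test partitions
# on which a model's out-of-sample performance can be compared.
splits = [split_by_postcode(records, test_fraction=0.2, seed=s) for s in range(3)]
```

Splitting by group rather than by row prevents policies from the same postcode, which share identical geographic features, from appearing on both sides of the split.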
The Takeaway: The most important lesson from this work is that the predictive value of geography depends less on the complexity of the AI model and more on how geography is represented. We have also shown that even limited spatial representations can significantly outperform traditional insurance variables alone.
Read the preprint at this arXiv link. The paper is still a preprint, so please treat it as a work in progress that will evolve through the peer-review process. We welcome any feedback!
You can also hear the machine-generated podcast below.


Our research focuses on using reinforcement learning (RL) to address the credit limit modification problem for companies offering credit card products. This involves two main challenges: defining the RL problem for this specific task and training the RL agent without conducting online experiments with customers.

To define the RL problem, we consider the financial history of credit card holders and the expected losses due to defaults when deciding whether to increase or maintain their credit limits. The actions available are increasing the limit or keeping it the same. We calculate the reward function based on the expected profit, considering the revolving aspect of credit card usage. This differs from previous studies that overlooked this aspect in profit calculations.

To train the RL agent offline, we use a two-stage model to simulate the balance after taking an action. This involves selecting the balance type and then predicting the balance amount using a regressor model. In our experiments, our trained Double Q-learning agent outperformed other strategies, including the one used by Rappi, our collaborator in this research. Rappi is a Latin American fintech company known for its delivery and commerce services that has also ventured into banking with its RappiCard credit card.
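
For readers unfamiliar with Double Q-learning, the sketch below shows the generic tabular update (van Hasselt, 2010) applied to a fixed batch of transitions with the two actions described above. It is an illustration under stated assumptions and not the agent from the paper: the states `low_risk` and `high_risk`, the rewards, the transitions, and the hyperparameters are all hypothetical, whereas the actual agent works with customers' financial history and a simulated balance.

```python
import random
from collections import defaultdict

ACTIONS = ("maintain", "increase")


def double_q_update(q_a, q_b, state, action, reward, next_state, done,
                    alpha=0.1, gamma=0.9, rng=random):
    """One Double Q-learning step: the table being updated selects the greedy
    next action, and the other table evaluates that action."""
    if rng.random() < 0.5:
        select, evaluate = q_a, q_b
    else:
        select, evaluate = q_b, q_a
    if done:
        target = reward
    else:
        best = max(ACTIONS, key=lambda a: select[(next_state, a)])
        target = reward + gamma * evaluate[(next_state, best)]
    select[(state, action)] += alpha * (target - select[(state, action)])


# Hypothetical offline batch: (state, action, reward, next_state, done).
batch = [
    ("low_risk", "increase", 1.0, "low_risk", False),
    ("low_risk", "maintain", 0.2, "low_risk", False),
    ("high_risk", "increase", -1.0, "high_risk", False),
    ("high_risk", "maintain", 0.1, "high_risk", False),
]

rng = random.Random(0)
q_a, q_b = defaultdict(float), defaultdict(float)
for _ in range(20000):
    s, a, r, s_next, done = rng.choice(batch)
    double_q_update(q_a, q_b, s, a, r, s_next, done, rng=rng)

# Greedy policy with respect to the sum of both tables.
policy = {
    s: max(ACTIONS, key=lambda a: q_a[(s, a)] + q_b[(s, a)])
    for s in ("low_risk", "high_risk")
}
print(policy)
```

Keeping two value tables and letting one evaluate the action chosen by the other reduces the overestimation bias of standard Q-learning.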

Our research contributes a conceptual framework for applying RL to credit limit adjustments and emphasizes data-driven decision-making rather than relying solely on expert judgment. Furthermore, we found that incorporating additional predictors did not improve the performance of our simulator. This implies that fintech companies do not necessarily have an advantage over traditional banking institutions in this specific task. Figure 1 provides an overview of the general workflow of the proposed methodology.

Figure 1: Methodology’s general workflow.

Link to the working paper: https://arxiv.org/abs/2306.15585