A comparative analysis of consumer credit risk models in Peer-to-Peer Lending
Keywords:
P2P lending, Lending club, Default risk, Credit risk models, GBDT
Abstract
Purpose
This paper compares nine models for evaluating consumer credit risk in Peer-to-Peer (P2P) lending: Logistic Regression (LR), Naive Bayes (NB), Linear Discriminant Analysis (LDA), k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), Classification and Regression Tree (CART), Artificial Neural Network (ANN), Random Forest (RF) and Gradient Boosting Decision Tree (GBDT).
Design/methodology/approach
The author uses data from the P2P platform Lending Club (LC) to assess the performance of a variety of classification models across different economic scenarios, and to compare how the credit risk models rank under three families of evaluation metrics.
Findings
The results indicate that the risk classification models perform better over the 2013–2019 economic period than over the difficult 2007–2012 period. In addition, the model rankings for default risk prediction show that GBDT is the best model on most of the metrics and metric families included in the study. The findings also support the results of Tsai et al. (2014) and Teplý and Polena (2019): LR, ANN and LDA classify loan applications quite stably and accurately, while CART, k-NN and NB perform worst when predicting borrower default risk on P2P loan data.
Originality/value
The main contribution of the research to the empirical literature is a comparison of nine models, drawn from statistical and machine learning algorithms, for predicting the risk of consumer loan applications. The models are evaluated with performance measures from three separate families of metrics (threshold, ranking and probabilistic metrics) that are consistent with the data characteristics of the LC lending platform, over two periods that reflect the economic situation and the platform's development.
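The three metric families named in the abstract (threshold, ranking and probabilistic) can be illustrated with one representative measure from each. The sketch below is not the author's code: the choice of accuracy, AUC and Brier score as representatives, the function names, the 0.5 cut-off and the eight toy loans are all assumptions made purely for illustration.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], p_default: Sequence[float],
             threshold: float = 0.5) -> float:
    """Threshold metric: share of loans classified correctly at a fixed cut-off."""
    predicted = [1 if p >= threshold else 0 for p in p_default]
    return sum(int(y == yhat) for y, yhat in zip(y_true, predicted)) / len(y_true)


def auc(y_true: Sequence[int], p_default: Sequence[float]) -> float:
    """Ranking metric: chance that a random defaulter is scored above a
    random non-defaulter (a tie counts as half a win)."""
    pos = [p for y, p in zip(y_true, p_default) if y == 1]
    neg = [p for y, p in zip(y_true, p_default) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one loan of each class")
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], p_default: Sequence[float]) -> float:
    """Probabilistic metric: mean squared gap between predicted probability
    and the observed 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_default)) / len(y_true)


# Hypothetical toy data: 1 = default, 0 = fully paid; scores are made up.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
p_default = [0.1, 0.2, 0.3, 0.6, 0.4, 0.2, 0.7, 0.9]

print("accuracy:", accuracy(y_true, p_default))
print("AUC:", auc(y_true, p_default))
print("Brier score:", brier_score(y_true, p_default))
```

The three measures respond to different things: accuracy depends on the chosen cut-off, AUC depends only on how the loans are ordered by score, and the Brier score penalises probabilities that are far from the observed outcome. This is one reason a model can rank differently from one family to another.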
References
Bae, J.K., Lee, S.I. and Seo, H.J. (2018), “Predicting online peer-to-peer (P2P) lending default using data mining techniques”, Proceedings of 20th Asia-Pacific Conference on Global Business, Economics, Finance and Social Sciences, Hong Kong, August 1-2, 2018, SAR – PRC, Paper ID:H808.
Brown, L. and Mues, C. (2012), “An experimental comparison of classification algorithms for imbalanced credit scoring data sets”, Expert Systems with Applications, Vol. 39 No. 3, pp. 3446-3453, doi: 10.1016/j.eswa.2011.09.033.
Brownlee, J. (2020), Imbalanced Classification with Python: Better Metrics, Balance Skewed Classes, Cost-Sensitive Learning, Machine Learning Mastery, Vermont.
Chang, S., Kim, S.D. and Kondo, G. (2015), “Predicting default risk of lending club”, Vol. CS229, Machine Learning.
Dinh, T.H.T., Kleimeier, S. and Straetmans, S.T.M. (2013), “Bank lending strategy, credit scoring and financial crises”, in Research Memoranda 053, Maastricht University, Graduate School of Business and Economics (GSBE), doi: 10.26481/umagsb.2013053.
Fernández, A., García, S., Galar, M., Prati, R.C. and Krawczyk, B. (2018), Learning from Imbalanced Data Sets, 1st ed., Springer, Berlin, doi: 10.1007/978-3-319-98074-4.
Ferri, C., Hernández-Orallo, J. and Modroiu, R. (2009), “An experimental comparison of performance measures for classification”, Pattern Recognition Letters, Vol. 30 No. 1, pp. 27-38, doi: 10.1016/j.patrec.2008.08.010.
Giannopoulos, V. (2018), “The effectiveness of artificial credit scoring models in predicting NPLs using micro accounting data”, Journal of Accounting and Marketing, Vol. 7 No. 4, doi: 10.4172/21689601.1000303.
Hand, D.J. and Henley, W.E. (1997), “Statistical classification methods in consumer credit scoring: a review”, Journal of the Royal Statistical Society, Vol. 160 No. 3, pp. 523-541, doi: 10.1111/j.1467-985X.1997.00078.x.
Hastie, T., Tibshirani, R. and Friedman, J. (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Series in Statistics, New York, doi: 10.1007/978-0-387-84858-7.
He, H. and Ma, Y. (Eds) (2013), Imbalanced Learning: Foundations, Algorithms, and Applications, John Wiley & Sons, doi: 10.1002/9781118646106.
Ince, H. and Aktan, B. (2009), “A comparison of data mining techniques for credit scoring in banking: a managerial perspective”, Journal of Business Economics and Management, Vol. 10 No. 3, pp. 233-240, doi: 10.3846/1611-1699.2009.10.233-240.
Jin, Y. and Zhu, Y. (2015), “A data-driven approach to predict default risk of loan for online Peer-to-Peer (P2P) lending”, Fifth International Conference on Communication Systems and Network Technologies, pp. 609-613, doi: 10.1109/CSNT.2015.25.
Lin, X., Li, X. and Zheng, Z. (2016), “Evaluating borrower's default risk in peer-to-peer lending: evidence from a lending platform in China”, Applied Economics, Vol. 49 No. 35, pp. 3538-3545, doi: 10.1080/00036846.2016.1262526.
Madzova, V. and Ramadini, N. (2013), “Can credit scoring models prevent default payments in banking industry in the period of financial crisis?”, International Journal of Business and Technology, Vol. 2 No. 1, pp. 32-38, doi: 10.33107/ijbte.2013.2.1.05.
Malekipirbazari, M. and Aksakalli, V. (2015), “Risk assessment in social lending via random forests”, Expert Systems with Applications, Vol. 42 No. 10, pp. 4621-4631, doi: 10.1016/j.eswa.2015.02.001.
Malik, M. and Thomas, L.C. (2010), “Modelling credit risk of portfolio of consumer loans”, Journal of the Operational Research Society, Vol. 61 No. 3, pp. 411-420, doi: 10.1057/jors.2009.123.
Namvar, E. (2013), “An introduction to peer-to-peer loans as investments”, Journal of Investment Management, First Quarter, doi: 10.2139/ssrn.2227181.
Namvar, A., Siami, M., Rabhi, F. and Naderpour, M. (2018), “Credit risk prediction in an imbalanced social lending environment”, International Journal of Computational Intelligence Systems, Vol. 11 No. 1, pp. 925-935, doi: 10.48550/arXiv.1805.00801.
Niu, B., Ren, J. and Li, X. (2019), “Credit scoring using machine learning by combing social Network information: evidence from peer-to-peer lending”, Information, Vol. 10 No. 12, p. 397, doi: 10.3390/info10120397.
Odeh, O.O., Featherstone, A.M. and Sanjoy, D. (2006), “Predicting credit default in an agricultural bank: methods and issues”, 2006 Annual Meeting, Orlando, Florida 35359, Southern Agricultural Economics Association, doi: 10.22004/ag.econ.35359.
Reddy, S. (2016), “Peer to peer lending, default prediction evidence from lending club”, Journal of Internet Banking and Commerce, Vol. 21 No. 3, pp. 1-19.
Shmueli, G., Bruce, P.C., Yahav, I., Patel, N.R. and Lichtendahl, K.C. (2018), Data Mining for Business Analytics: Concepts, Techniques, and Applications in R, 1st ed., Wiley, Hoboken.
Tao, W. and Chang, D. (2019), “Credit risk assessment of P2P lending borrowers based on SVM”, Advances in Economics, Business and Management Research, Vol. 80, pp. 182-190, doi: 10.2991/bems-19.2019.33.
Teplý, P. and Polena, M. (2019), “Best classification algorithms in peer-to-peer lending”, North American Journal of Economics and Finance, Vol. 51, 100904, doi: 10.1016/j.najef.2019.01.001.
Tsai, K., Ramiah, S. and Singh, S. (2014), Peer Lending Risk Predictor, Stanford University, Stanford, California, Vol. CS229, doi: 10.13140/2.1.4810.6567.
Wendler, T. and Gröttrup, S. (2016), “Neuronal networks”, in Data Mining with SPSS Modeler: Theory, Exercises and Solutions, Springer International Publishing, pp. 833-878, doi: 10.1007/978-3-319-28709-6.
Xiao, W., Zhao, Q. and Fei, Q. (2006), “A comparative study of data mining methods in consumer loans credit scoring management”, Journal of Systems Science and Systems Engineering, Vol. 15 No. 4, pp. 419-435, doi: 10.1007/s11518-006-5023-5.
Yeh, I.C. and Lien, C.H. (2009), “The comparisons of data mining techniques for predictive accuracy of probability of default of credit card clients”, Expert Systems with Applications, Vol. 36 No. 2, pp. 2473-2480, doi: 10.1016/j.eswa.2007.12.020.
Zhang, D., Huang, H., Chen, Q. and Jiang, Y. (2007), “A comparison study of credit scoring models”, Proceedings of the 3rd International Conference on Natural Computation, Haikou, China, pp. 15-18, doi: 10.1109/ICNC.2007.15.
License
Copyright (c) 2024 Lua Thi Trinh
This work is licensed under a Creative Commons Attribution 4.0 International License.