African Journal of Business Management

  • Abbreviation: Afr. J. Bus. Manage.
  • Language: English
  • ISSN: 1993-8233
  • DOI: 10.5897/AJBM
  • Start Year: 2007
  • Published Articles: 4188

Full Length Research Paper

A comparative study of data mining techniques in predicting consumers’ credit card risk in banks

Ling Kock Sheng¹* and Teh Ying Wah²
¹Faculty of Computer Science and Information Technology, University of Malaya, 50603 Kuala Lumpur, Malaysia.
²Department of Information Science, Faculty of Computer Science and Information Technology, University of Malaya, 50603 Kuala Lumpur, Malaysia.

  • Accepted: 21 June 2011
  • Published: 30 September 2013

Abstract

This paper investigates the use of batch and incremental classifiers, such as logistic regression, neural networks, C5, Naïve Bayes updateable, IBk (instance-based learner, k nearest neighbour) and raced incremental logit boost, to identify the classifier that best improves the accuracy of predicting consumers’ credit card risk at a bank in Malaysia. Before the models are generated for comparison, the initial data set is loaded into an ETL (extraction, transformation, loading) system developed to perform feature selection, or attribute relevancy analysis, using the ID3 algorithm; the system compiles a subset of the data with the highest information gain and gain ratio. An additional test applies equal-length binning to some attributes to determine whether binning affects the relevancy of each attribute. The selected 24-month subset of data is then used to generate various data mining models with different training and testing sizes and different binning sizes. C5 consistently emerged as the technique that generated the best models, with an average predictive accuracy as high as 94.68%. Sample size, equal-length binning size, and training and testing sizes are all shown to affect accuracy, to differing degrees.
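The abstract does not describe how the ETL system computes attribute relevancy, so the following is only a minimal Python sketch, assuming the textbook ID3 definition of information gain and Quinlan's C4.5 gain ratio (information gain divided by split information), applied after equal-length (equal-width) binning of a continuous attribute. The function names (`entropy`, `information_gain`, `gain_ratio`, `equal_width_bins`) and the utilisation/risk data are hypothetical illustrations, not the authors' system or data.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (base 2) of the value distribution in `items`."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def information_gain(attr, labels):
    """ID3 information gain: class entropy minus the weighted entropy of each partition."""
    n = len(labels)
    partitions = {}
    for value, label in zip(attr, labels):
        partitions.setdefault(value, []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def gain_ratio(attr, labels):
    """C4.5-style gain ratio: information gain divided by the split information."""
    split_info = entropy(attr)  # entropy of the partition sizes themselves
    return information_gain(attr, labels) / split_info if split_info > 0 else 0.0


def equal_width_bins(xs, k):
    """Equal-length binning: split [min, max] into k equal intervals, return bin indices."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0] * len(xs)
    width = (hi - lo) / k
    # The maximum value lands exactly on the upper edge, so clamp it into the last bin.
    return [min(int((x - lo) / width), k - 1) for x in xs]


if __name__ == "__main__":
    # Hypothetical toy data: credit utilisation (%) and a good/bad risk label.
    utilisation = [5, 12, 18, 33, 47, 61, 78, 92]
    risk = ["good", "good", "good", "good", "bad", "good", "bad", "bad"]
    for k in (2, 4):
        binned = equal_width_bins(utilisation, k)
        print(f"k={k} bins={binned} "
              f"gain={information_gain(binned, risk):.3f} "
              f"ratio={gain_ratio(binned, risk):.3f}")
```

On this toy data, two bins give an information gain of about 0.159 and a gain ratio of about 0.166, while four bins give about 0.704 and 0.370, so the bin count alone changes how relevant the same attribute appears. Gain ratio rises less than raw gain because its denominator grows with the number of occupied bins, which is why it is often preferred when comparing attributes split into different numbers of intervals.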


Key words: Data mining techniques, predictive accuracy, incremental learning schemes.