Software Developer and Writer: Credit Card Churning Customer Analysis and Prediction Using Machine Learning and Deep Learning with Python --- SECOND EDITION (VIVIAN SIAHAAN)

Monday, July 17, 2023

Credit Card Churning Customer Analysis and Prediction Using Machine Learning and Deep Learning with Python --- SECOND EDITION (VIVIAN SIAHAAN)

The project "Credit Card Churning Customer Analysis and Prediction Using Machine Learning and Deep Learning with Python" involved a comprehensive analysis and prediction task focused on understanding customer attrition in a credit card churning scenario. The objective was to explore a dataset, visualize the distribution of features, and predict the attrition flag using both machine learning and artificial neural network (ANN) techniques.

The project began by loading the dataset containing information about credit card customers, including various features such as customer demographics, transaction details, and account attributes. The dataset was then explored to gain a better understanding of its structure and contents. This included checking the number of records, identifying the available features, and inspecting the data types. To gain insights into the data, exploratory data analysis (EDA) techniques were employed. This involved examining the distribution of different features, identifying any missing values, and understanding the relationships between variables. Visualizations were created to represent the distribution of features. These visualizations helped identify any patterns, outliers, or potential correlations in the data.

The target variable for prediction was the attrition flag, which indicated whether a customer had churned or not. The dataset was split into input features (X) and the target variable (y) accordingly. Machine learning algorithms were then applied to predict the attrition flag. Various classifiers such as Logistic Regression, Decision Trees, Random Forests, Support Vector Machines (SVM), K-Nearest Neighbors (NN), Gradient Boosting, Extreme Gradient Boosting, Light Gradient Boosting, were utilized. These models were trained using the training dataset and evaluated using appropriate performance metrics.

Model evaluation involved measuring the accuracy, precision, recall, and F1-score of each classifier. These metrics provided insights into how well the models performed in predicting customer attrition. Additionally, a confusion matrix was created to analyze the true positive, true negative, false positive, and false negative predictions. This matrix allowed for a deeper understanding of the classifier's performance and potential areas for improvement.

Next, a deep learning approach using an artificial neural network (ANN) was employed for attrition flag prediction. The dataset was preprocessed, including features normalization, one-hot encoding of categorical variables, and splitting into training and testing sets. The ANN model architecture was defined, consisting of an input layer, one or more hidden layers, and an output layer. The number of nodes and activation functions for each layer were determined based on experimentation and best practices. The ANN model was compiled by specifying the loss function, optimizer, and evaluation metrics. Common choices for binary classification problems include binary cross-entropy loss and the Adam optimizer. The model was then trained using the training dataset. The training process involved feeding the input features and target variable through the network, updating the weights and biases using backpropagation, and repeating this process for multiple epochs. During training, the model's performance on both the training and validation sets was monitored. This allowed for the detection of overfitting or underfitting and the adjustment of hyperparameters, such as the learning rate or the number of hidden layers, if necessary.

The accuracy and loss values were plotted over the epochs to visualize the training and validation performance of the ANN. These plots provided insights into the model's convergence and potential areas for improvement. After training, the model was used to make predictions on the test dataset. A threshold of 0.5 was applied to the predicted probabilities to classify the predictions as either churned or not churned customers. The accuracy score was calculated by comparing the predicted labels with the true labels from the test dataset. Additionally, a classification report was generated, including metrics such as precision, recall, and F1-score for both churned and not churned customers.

To further evaluate the model's performance, a confusion matrix was created. This matrix visualized the true positive, true negative, false positive, and false negative predictions, allowing for a more detailed analysis of the model's predictive capabilities. Finally, a custom function was utilized to create a plot comparing the predicted values to the true values for the attrition flag. This plot visualized the accuracy of the model and provided a clear understanding of how well the predictions aligned with the actual values.

Through this comprehensive analysis and prediction process, valuable insights were gained regarding customer attrition in credit card churning scenarios. The machine learning and ANN models provided predictions and performance metrics that can be used for decision-making and developing strategies to mitigate attrition. Overall, this project demonstrated the power of machine learning and deep learning techniques in understanding and predicting customer behavior. By leveraging the available data, it was possible to uncover patterns, make accurate predictions, and guide business decisions aimed at retaining customers and reducing attrition in credit card churning scenarios.