CREDIT RISK ANALYSIS
A Supporting Notebook


HOME CREDIT ANALYSIS

The goal of this analysis is to understand the programmatic process of analyzing publicly available credit data. We begin with basic EDA and preprocessing of the data.

The two models we chose are deliberately very different, so that we can see how each performs on this dataset: a Light Gradient Boosting Machine (LightGBM) and an artificial neural network.

The primary performance metrics are the area under the ROC curve (AUC), the Gini coefficient, and the Kolmogorov-Smirnov (KS) statistic. We also include the rank-order tables produced by each model. Lastly, a set of summary visualizations for each model's output helps us visualize these metrics.
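The three headline metrics are closely related: Gini = 2 × AUC - 1, and KS is the largest gap between the cumulative score distributions of defaulters and non-defaulters. The minimal pure-Python sketch below shows one way to compute them; the function names are illustrative rather than taken from the notebook, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used for AUC.

```python
from typing import List, Sequence, Tuple


def auc_score(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (tied scores get average ranks)."""
    pairs: List[Tuple[float, int]] = sorted(zip(scores, y_true))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        # extend j over a run of tied scores
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = len(pairs) - n_pos
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    # U counts (default, non-default) pairs in which the default scores higher
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def gini(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Gini coefficient as a linear rescaling of AUC."""
    return 2 * auc_score(y_true, scores) - 1


def ks_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Max gap between cumulative shares of defaults and non-defaults,
    walking from the highest score down (tied scores processed together)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    pairs = sorted(zip(scores, y_true), reverse=True)
    cum_pos = cum_neg = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            i += 1
        best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
    return best


y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(auc_score(y, s), gini(y, s), ks_statistic(y, s))  # 0.75 0.5 0.5
```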

Imports

When processing data with Python and Jupyter, we can customize the plotting output, and we also need to import the required library dependencies. We do both below, where only the color scale is visible in the output.

Using TensorFlow backend.
['application_test.csv', 'application_train.csv', 'bureau.csv', 'bureau_balance.csv', 'credit_card_balance.csv', 'home-credit-default-risk.zip', 'HomeCredit_columns_description.csv', 'installments_payments.csv', 'POS_CASH_balance.csv', 'previous_application.csv', 'sample_submission.csv']

Functions & Load Data

Below, we are loading in specific functions for this analysis.

We also display the first five rows of the training and testing data for review. The training data has 191 columns, including the target variable, which indicates whether the applicant defaulted (1 = Yes, 0 = No); the testing data has 190 columns because it has no target.

Preprocessing started.
Bureau_Balance
Bureau
Previous_Application
POS_CASH_Balance
Credit_Card_Balance
Installments_Payments
Train/Test
Shapes :  (307511, 122) (48744, 121)
Preprocessing done.
SK_ID_CURR TARGET NAME_CONTRACT_TYPE CODE_GENDER FLAG_OWN_CAR FLAG_OWN_REALTY CNT_CHILDREN AMT_INCOME_TOTAL AMT_CREDIT_x AMT_ANNUITY_x ... cc_bal_CNT_INSTALMENT_MATURE_CUM cc_bal_SK_DPD cc_bal_SK_DPD_DEF inst_SK_ID_PREV inst_NUM_INSTALMENT_VERSION inst_NUM_INSTALMENT_NUMBER inst_DAYS_INSTALMENT inst_DAYS_ENTRY_PAYMENT inst_AMT_INSTALMENT inst_AMT_PAYMENT
0 100002 1 Cash loans M N Y 0 202500.0 406597.5 24700.5 ... NaN NaN NaN 19.0 1.052632 10.000000 -295.000000 -315.421053 11559.247105 11559.247105
1 100003 0 Cash loans F N N 0 270000.0 1293502.5 35698.5 ... NaN NaN NaN 25.0 1.040000 5.080000 -1378.160000 -1385.320000 64754.586000 64754.586000
2 100004 0 Revolving loans M Y Y 0 67500.0 135000.0 6750.0 ... NaN NaN NaN 3.0 1.333333 2.000000 -754.000000 -761.666667 7096.155000 7096.155000
3 100006 0 Cash loans F N Y 0 135000.0 312682.5 29686.5 ... 0.0 0.0 0.0 16.0 1.125000 4.437500 -252.250000 -271.625000 62947.088438 62947.088438
4 100007 0 Cash loans M N Y 0 121500.0 513000.0 21865.5 ... NaN NaN NaN 66.0 1.166667 7.045455 -1028.606061 -1032.242424 12666.444545 12214.060227

5 rows × 191 columns

SK_ID_CURR NAME_CONTRACT_TYPE CODE_GENDER FLAG_OWN_CAR FLAG_OWN_REALTY CNT_CHILDREN AMT_INCOME_TOTAL AMT_CREDIT_x AMT_ANNUITY_x AMT_GOODS_PRICE_x ... cc_bal_CNT_INSTALMENT_MATURE_CUM cc_bal_SK_DPD cc_bal_SK_DPD_DEF inst_SK_ID_PREV inst_NUM_INSTALMENT_VERSION inst_NUM_INSTALMENT_NUMBER inst_DAYS_INSTALMENT inst_DAYS_ENTRY_PAYMENT inst_AMT_INSTALMENT inst_AMT_PAYMENT
0 100001 Cash loans F N Y 0 135000.0 568800.0 20560.5 450000.0 ... NaN NaN NaN 7.0 1.142857 2.714286 -2187.714286 -2195.000000 5885.132143 5885.132143
1 100005 Cash loans M N Y 0 99000.0 222768.0 17370.0 180000.0 ... NaN NaN NaN 9.0 1.111111 5.000000 -586.000000 -609.555556 6240.205000 6240.205000
2 100013 Cash loans M Y Y 0 202500.0 663264.0 69777.0 630000.0 ... 18.719101 0.010417 0.010417 155.0 0.277419 43.729032 -1352.929032 -1358.109677 10897.898516 9740.235774
3 100028 Cash loans F N Y 2 315000.0 1575000.0 49018.5 1575000.0 ... 19.547619 0.000000 0.000000 113.0 0.460177 30.504425 -855.548673 -858.548673 4979.282257 4356.731549
4 100038 Cash loans M Y N 1 180000.0 625500.0 32067.0 625500.0 ... NaN NaN NaN 12.0 1.000000 6.500000 -622.000000 -634.250000 11100.337500 11100.337500

5 rows × 190 columns
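The column names in these previews hint at how the auxiliary tables were combined: features carry table prefixes such as `inst_` and `cc_bal_`, and the `_x`/`_y` suffixes are what pandas `merge` adds by default when both frames share a column name. The sketch below assumes, as the preview values suggest (for example, inst_SK_ID_PREV is whole-numbered while the other inst_ columns are fractional), that each child table was collapsed to one row per SK_ID_CURR, with SK_ID_PREV turned into a record count and every other numeric column averaged. `aggregate_by_client` and `left_join` are hypothetical helpers written in plain Python with toy data; the notebook itself would typically use pandas `groupby` and `merge`.

```python
from collections import defaultdict


def aggregate_by_client(rows, prefix, id_col="SK_ID_CURR", count_col="SK_ID_PREV"):
    """Collapse a child table (list of dicts) to one feature dict per client.

    `count_col` becomes the number of child records; every other column is
    averaged over its non-missing values. All output names get `prefix`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[id_col]].append(row)

    out = {}
    for client, recs in groups.items():
        feats = {prefix + count_col: float(len(recs))}
        for col in recs[0]:
            if col in (id_col, count_col):
                continue
            vals = [r[col] for r in recs if r.get(col) is not None]
            feats[prefix + col] = sum(vals) / len(vals) if vals else None
        out[client] = feats
    return out


def left_join(base_rows, agg, id_col="SK_ID_CURR"):
    """Attach aggregated features to application rows; clients with no child
    records get None, which would show up as NaN in a pandas frame."""
    keys = sorted({k for feats in agg.values() for k in feats})
    return [
        {**row, **{k: agg.get(row[id_col], {}).get(k) for k in keys}}
        for row in base_rows
    ]


# Toy installment records for client 1; client 2 has none
installments = [
    {"SK_ID_CURR": 1, "SK_ID_PREV": 10, "AMT_PAYMENT": 100.0},
    {"SK_ID_CURR": 1, "SK_ID_PREV": 11, "AMT_PAYMENT": 300.0},
]
inst = aggregate_by_client(installments, prefix="inst_")
print(left_join([{"SK_ID_CURR": 1}, {"SK_ID_CURR": 2}], inst))
```

The None values for client 2 mirror the NaN entries in the cc_bal_ columns above, which plausibly belong to applicants with no credit-card history.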

Initial EDA

To start the EDA, we review the sparsity of the data along with any high-level information we can gain without going too deep into the analysis.

Looking at the distribution of credit amount, we initially notice that the data is right-skewed, with most observations having credit amounts below 1,000,000.
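Both checks can be expressed in a few lines. The sketch below computes the share of missing entries in a column (the sparsity check) and the population skewness, whose positive sign signals a long right tail. It is illustrated on the five AMT_CREDIT_x values from the training preview, which is far too small a sample to characterize the full distribution; the helper names are hypothetical.

```python
import math


def missing_rate(column):
    """Share of entries that are missing (None or float NaN)."""
    missing = sum(
        1 for v in column if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(column)


def skewness(values):
    """Population (Fisher-Pearson) skewness; > 0 means a longer right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# AMT_CREDIT_x for the five applicants in the training preview
credit = [406597.5, 1293502.5, 135000.0, 312682.5, 513000.0]
print(f"missing: {missing_rate(credit):.0%}  skew: {skewness(credit):.2f}")
```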

Target variable: roughly 92% of observations are non-defaults, while only about 8% are defaults. This is close to the national average of about 10%. It also means the dataset is imbalanced: the negative (non-default) class is about 11.4 times larger than the positive class (282,686 versus 24,825 rows across the train and test splits), so it dominates the training loss unless the classes are reweighted.
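The class balance can be checked directly from counts printed elsewhere in this notebook: the rank-order tables list 19,860 training and 4,965 testing defaults against 226,148 and 56,538 non-defaults. The sketch below recomputes the default rate and imbalance ratio from those figures and, as one common remedy that the notebook does not necessarily use, derives "balanced" class weights (the same formula scikit-learn uses for `class_weight="balanced"`).

```python
# Class counts taken from the rank-order tables (training + testing rows)
defaults = 19860 + 4965
non_defaults = 226148 + 56538
total = defaults + non_defaults

default_rate = defaults / total
imbalance_ratio = non_defaults / defaults

# "Balanced" weights: n_samples / (n_classes * n_class_samples), so each class
# contributes equally to an otherwise unweighted loss
class_weight = {
    0: total / (2 * non_defaults),
    1: total / (2 * defaults),
}

print(f"rows={total}  default rate={default_rate:.2%}  ratio={imbalance_ratio:.1f}")
# → rows=307511  default rate=8.07%  ratio=11.4
print({k: round(v, 2) for k, v in class_weight.items()})
```

The row total matches the 307,511 training rows reported in the preprocessing log.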


Setting Train and Test

Before running the data through the models, we first need to preprocess it; this is particularly important for the neural network.

The preprocessing steps here are encoding the categorical variables and then passing them through embedding layers in the neural network.
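The notebook's encoder is not shown, but the encoded training rows further down contain integer codes (for example NAME_TYPE_SUITE = 6), which is what an embedding layer expects: each code is a row index into a learned lookup table. A minimal sketch of such an integer encoding follows, assuming codes are assigned in order of first appearance and that missing and unseen categories get reserved codes; both choices and the helper names are illustrative.

```python
MISSING = "<missing>"


def label_encode(values):
    """Fit an integer code per category, in order of first appearance."""
    mapping = {}
    codes = []
    for v in values:
        key = MISSING if v is None else v
        if key not in mapping:
            mapping[key] = len(mapping)
        codes.append(mapping[key])
    return codes, mapping


def apply_encoding(values, mapping):
    """Encode new data (e.g. the test set) with a fitted mapping; categories
    never seen in training share one reserved code, len(mapping)."""
    unseen = len(mapping)
    return [mapping.get(MISSING if v is None else v, unseen) for v in values]


codes, mapping = label_encode(["M", "F", "M", None, "XNA"])
print(codes, mapping)  # [0, 1, 0, 2, 3] {'M': 0, 'F': 1, '<missing>': 2, 'XNA': 3}
print(apply_encoding(["F", "Other", None], mapping))  # [1, 4, 2]
```

With this scheme the embedding table needs len(mapping) + 1 rows, leaving room for the reserved unseen-category code.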

We can see that there are 16 categorical variables and 173 strictly numeric variables.

Number of Numerical features: 173
Number of Categorical features: 16

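For simplicity, the categorical variables that are embedded are listed below, along with the number of unique values in each.

Some attributes, such as ORGANIZATION_TYPE, have 58 possible categories, while others, such as CODE_GENDER, NAME_EDUCATION_TYPE and HOUSETYPE_MODE, have only 3 to 5. Only 13 of the 16 categorical variables appear in this list, matching the 13 embedded features reported below; the other three are likely the two-valued NAME_CONTRACT_TYPE, FLAG_OWN_CAR and FLAG_OWN_REALTY, which appear as 0/1 columns in the encoded training rows.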

CODE_GENDER: 3 values
NAME_TYPE_SUITE: 8 values
NAME_INCOME_TYPE: 8 values
NAME_EDUCATION_TYPE: 5 values
NAME_FAMILY_STATUS: 6 values
NAME_HOUSING_TYPE: 6 values
OCCUPATION_TYPE: 19 values
WEEKDAY_APPR_PROCESS_START: 7 values
ORGANIZATION_TYPE: 58 values
FONDKAPREMONT_MODE: 5 values
HOUSETYPE_MODE: 4 values
WALLSMATERIAL_MODE: 8 values
EMERGENCYSTATE_MODE: 3 values
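The printout does not say how wide each embedding is. A common rule of thumb, assumed here rather than taken from the notebook, is roughly half the number of categories, capped at 50. Applied to the cardinalities above, it gives 13 embeddings with 73 output dimensions in total. If, as seems likely, the model takes a list of inputs, the 13 embedded columns plus one block of numeric features would also account for the "Length of the list: 14" printed after the split (and 4 + 1 = 5 for the reduced feature set); that reading is an inference.

```python
# Cardinalities as printed above
cardinalities = {
    "CODE_GENDER": 3,
    "NAME_TYPE_SUITE": 8,
    "NAME_INCOME_TYPE": 8,
    "NAME_EDUCATION_TYPE": 5,
    "NAME_FAMILY_STATUS": 6,
    "NAME_HOUSING_TYPE": 6,
    "OCCUPATION_TYPE": 19,
    "WEEKDAY_APPR_PROCESS_START": 7,
    "ORGANIZATION_TYPE": 58,
    "FONDKAPREMONT_MODE": 5,
    "HOUSETYPE_MODE": 4,
    "WALLSMATERIAL_MODE": 8,
    "EMERGENCYSTATE_MODE": 3,
}


def embedding_dim(n_categories, max_dim=50):
    """Rule-of-thumb width: about half the cardinality, capped at max_dim."""
    return min(max_dim, (n_categories + 1) // 2)


embedding_dims = {name: embedding_dim(n) for name, n in cardinalities.items()}
print(len(embedding_dims), sum(embedding_dims.values()))  # 13 73
print(embedding_dims["ORGANIZATION_TYPE"])                # 29
```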

 Number of embeded features : 13

We use an 80/20 train/test split for this analysis. Because this is only an overview, we do not run a full cross-validation pipeline. The output of the train/test split is shown below.

 Train:  (246008, 189) 246008 80.0 %
 Train:  (246008,) 246008 80.0 %
 Test:  (61503, 189) 61503 20.0 % 
 Test:  (61503,) 61503 20.0 %
NAME_CONTRACT_TYPE CODE_GENDER FLAG_OWN_CAR FLAG_OWN_REALTY CNT_CHILDREN AMT_INCOME_TOTAL AMT_CREDIT_x AMT_ANNUITY_x AMT_GOODS_PRICE_x NAME_TYPE_SUITE ... cc_bal_CNT_INSTALMENT_MATURE_CUM cc_bal_SK_DPD cc_bal_SK_DPD_DEF inst_SK_ID_PREV inst_NUM_INSTALMENT_VERSION inst_NUM_INSTALMENT_NUMBER inst_DAYS_INSTALMENT inst_DAYS_ENTRY_PAYMENT inst_AMT_INSTALMENT inst_AMT_PAYMENT
110440 0 0 0 1 0 76500.0 441481.5 16771.5 364500.0 6 ... NaN NaN NaN 42.0 1.047619 8.52381 -607.047619 -645.02381 9708.5925 9708.5925

1 rows × 189 columns

NAME_CONTRACT_TYPE CODE_GENDER FLAG_OWN_CAR FLAG_OWN_REALTY CNT_CHILDREN AMT_INCOME_TOTAL AMT_CREDIT_x AMT_ANNUITY_x AMT_GOODS_PRICE_x NAME_TYPE_SUITE ... cc_bal_CNT_INSTALMENT_MATURE_CUM cc_bal_SK_DPD cc_bal_SK_DPD_DEF inst_SK_ID_PREV inst_NUM_INSTALMENT_VERSION inst_NUM_INSTALMENT_NUMBER inst_DAYS_INSTALMENT inst_DAYS_ENTRY_PAYMENT inst_AMT_INSTALMENT inst_AMT_PAYMENT
179491 0 0 0 0 0 103500.0 625536.0 26631.0 540000.0 1 ... NaN NaN NaN 41.0 1.0 6.04878 -2475.365854 -2476.878049 7685.107683 6237.18439

1 rows × 189 columns

110440    0
Name: TARGET, dtype: int64
179491    0
Name: TARGET, dtype: int64
Length of the list: 14
Length of the list: 14
(246008,)
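The printed shapes are consistent with a stratified 80/20 split. According to the rank-order tables, 19,860 of the 246,008 training rows and 4,965 of the 61,503 test rows are defaults, about 8.07% in each, and 4,965 is exactly 20% of the 24,825 defaults. The split code is not shown; scikit-learn's `train_test_split(X, y, test_size=0.2, stratify=y)` is the usual tool. The pure-Python sketch below is a simplified, hypothetical stand-in: each class sends the ceiling of 20% of its rows to the test set. That happens to reproduce the counts above, but it can differ from scikit-learn's allocation in other cases.

```python
import random


def stratified_split(labels, test_pct=20, seed=42):
    """Return (train_indices, test_indices) preserving class shares.

    Each class sends ceil(test_pct% of its rows) to the test set; integer
    arithmetic avoids floating-point rounding surprises."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = -(-len(idxs) * test_pct // 100)  # ceiling division
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class counts from the rank-order tables: 24,825 defaults, 282,686 non-defaults
labels = [1] * 24825 + [0] * 282686
train_idx, test_idx = stratified_split(labels)
n_bad_test = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), n_bad_test)  # 246008 61503 4965
```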

Train and Test - Feature Set

This section largely duplicates the one above, with one change: while the split above uses the full set of attributes, the data below incorporates only the top 25 features as ranked by their SHAP (Shapley) values. These features are listed in the Shapley Feature Importance section at the end of this notebook.

Number of Numerical features: 21
Number of Categorical features: 4
CODE_GENDER: 3 values
NAME_FAMILY_STATUS: 6 values
NAME_EDUCATION_TYPE: 5 values
NAME_INCOME_TYPE: 8 values

 Number of embeded features : 4
 x Train:  (246008, 25) 246008 80.0 %
 y Train:  (246008,) 246008 80.0 %
 x Test:  (61503, 25) 61503 20.0 % 
 y Test:  (61503,) 61503 20.0 %
EXT_SOURCE_2 SK_DPD_DEF DAYS_EMPLOYED EXT_SOURCE_1 SK_ID_PREV_y inst_AMT_PAYMENT CODE_GENDER CNT_PAYMENT DAYS_BIRTH CNT_INSTALMENT_FUTURE ... NAME_EDUCATION_TYPE NFLAG_INSURED_ON_APPROVAL AMT_DOWN_PAYMENT SK_DPD REGION_RATING_CLIENT_W_CITY EXT_SOURCE_3 AMT_GOODS_PRICE_x AMT_ANNUITY_x cc_bal_AMT_CREDIT_LIMIT_ACTUAL NAME_INCOME_TYPE
110440 0.172563 0.0 365243 NaN 43.0 9708.5925 0 24.0 -22164 19.488372 ... 4 0.666667 0.0 0.0 3 0.454321 364500.0 16771.5 NaN 3

1 rows × 25 columns

EXT_SOURCE_2 SK_DPD_DEF DAYS_EMPLOYED EXT_SOURCE_1 SK_ID_PREV_y inst_AMT_PAYMENT CODE_GENDER CNT_PAYMENT DAYS_BIRTH CNT_INSTALMENT_FUTURE ... NAME_EDUCATION_TYPE NFLAG_INSURED_ON_APPROVAL AMT_DOWN_PAYMENT SK_DPD REGION_RATING_CLIENT_W_CITY EXT_SOURCE_3 AMT_GOODS_PRICE_x AMT_ANNUITY_x cc_bal_AMT_CREDIT_LIMIT_ACTUAL NAME_INCOME_TYPE
179491 0.694524 0.078947 -3844 NaN 38.0 6237.18439 0 9.5 -14921 4.710526 ... 4 1.0 1226.705625 0.078947 1 0.479449 540000.0 26631.0 NaN 7

1 rows × 25 columns

110440    0
Name: TARGET, dtype: int64
179491    0
Name: TARGET, dtype: int64
Length of the list: 5
Length of the list: 5
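A common way to turn SHAP values into a ranked feature list is to average each feature's absolute SHAP value over all rows and keep the top k (k = 25 here). This procedure is assumed, since the notebook's selection code is not shown. The sketch uses made-up SHAP values and a hypothetical function name; in practice the values would come from the `shap` library's explainer output.

```python
def top_k_by_mean_abs_shap(shap_rows, feature_names, k):
    """Rank features by mean |SHAP value| over all rows; return the top k names.

    shap_rows holds one list of per-feature SHAP values per observation,
    in the same order as feature_names."""
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(feature_names, key=importance.get, reverse=True)[:k]


# Made-up SHAP values for three features and two applicants
names = ["EXT_SOURCE_2", "DAYS_BIRTH", "CNT_CHILDREN"]
shap_rows = [[0.5, -0.2, 0.01], [-0.4, 0.1, 0.0]]
print(top_k_by_mean_abs_shap(shap_rows, names, 2))  # ['EXT_SOURCE_2', 'DAYS_BIRTH']
```

Taking absolute values matters: a feature that pushes some predictions strongly up and others strongly down is important even though its signed SHAP values may average out near zero.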

Model - Neural Net

For the neural network, we use a four-layer dense (fully connected) network with ReLU activations.

The training and validation losses converge closely, differing by at most about 0.003 at the final epoch. The preliminary test AUC for this run is about 73.6%, and the model took roughly one minute to train.

Train on 246008 samples, validate on 61503 samples
Epoch 1/5
246008/246008 [==============================] - 5s 20us/step - loss: 0.2930 - val_loss: 0.2600
Epoch 2/5
246008/246008 [==============================] - 4s 15us/step - loss: 0.2595 - val_loss: 0.2570
Epoch 3/5
246008/246008 [==============================] - 4s 15us/step - loss: 0.2564 - val_loss: 0.2553
Epoch 4/5
246008/246008 [==============================] - 4s 15us/step - loss: 0.2549 - val_loss: 0.2538
Epoch 5/5
246008/246008 [==============================] - 4s 15us/step - loss: 0.2543 - val_loss: 0.2533
Train on 246008 samples, validate on 61503 samples
Epoch 1/5
246008/246008 [==============================] - 5s 20us/step - loss: 0.2916 - val_loss: 0.2601
Epoch 2/5
246008/246008 [==============================] - 4s 15us/step - loss: 0.2589 - val_loss: 0.2565
Epoch 3/5
246008/246008 [==============================] - 4s 14us/step - loss: 0.2560 - val_loss: 0.2556
Epoch 4/5
246008/246008 [==============================] - 4s 15us/step - loss: 0.2547 - val_loss: 0.2538
Epoch 5/5
246008/246008 [==============================] - 4s 15us/step - loss: 0.2536 - val_loss: 0.2562
Model Time  :   1.05 Min
Mean out of fold Train AUC: 0.74530
Mean out of fold Test AUC: 0.73562
Full validation Train AUC: 0.74511
Full validation Test AUC: 0.73583
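A four-layer dense network can be sketched without any framework, which makes the architecture explicit. The layer widths (25, 64, 32, 16, 1), the He initialisation and the single sigmoid output are assumptions, not the notebook's configuration, and the sketch omits the embedding inputs. Assuming the logged loss is binary cross-entropy, the last lines also give a useful yardstick: a constant prediction at the 8.07% base rate scores about 0.28, so the validation loss of roughly 0.253 is a genuine but modest improvement.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, z) for z in v]


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_mlp(sizes, seed=0):
    """One (weights, bias) pair per dense layer, He-initialised for ReLU."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def predict_proba(layers, x):
    """ReLU on every hidden layer, sigmoid on the single output unit."""
    h = x
    for i, (weights, bias) in enumerate(layers):
        h = dense(h, weights, bias)
        if i < len(layers) - 1:
            h = relu(h)
    return sigmoid(h[0])


def binary_cross_entropy(y, p, eps=1e-7):
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


# Hypothetical widths: 25 inputs, three hidden layers, one output = 4 dense layers
net = build_mlp([25, 64, 32, 16, 1])
print(f"P(default) for a dummy applicant: {predict_proba(net, [0.1] * 25):.3f}")

# Loss of always predicting the base default rate (24,825 of 307,511 rows)
rate = 24825 / 307511
baseline = rate * binary_cross_entropy(1, rate) + (1 - rate) * binary_cross_entropy(0, rate)
print(f"constant-prediction loss: {baseline:.3f}")
```

The weights here are untrained; training would adjust them by gradient descent on the cross-entropy loss, which is what the Keras log above records epoch by epoch.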

Model - Lightgbm

Below we train the LightGBM model. It runs very quickly, taking only 0.18 minutes (about 11 seconds).

Its results are better than the neural network's: a test AUC of about 76.1%, roughly 2.6 percentage points above the neural network's 73.6%.

Training until validation scores don't improve for 10 rounds
[100]	valid's auc: 0.754034
[200]	valid's auc: 0.759953
[300]	valid's auc: 0.76142
Early stopping, best iteration is:
[291]	valid's auc: 0.761466
Model Time  :   0.18 Min
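The log follows LightGBM's early-stopping rule: training halts once the validation AUC has failed to improve for 10 consecutive rounds, and the best iteration is kept (iteration 291, AUC 0.761466). Below is a simplified pure-Python version of that rule, with a hypothetical function name and a toy score sequence.

```python
def early_stopping(valid_scores, patience=10):
    """Return (best_iteration, best_score, stopped_at), all 1-based.

    Training stops at the first iteration that is `patience` rounds past the
    best one without a strictly higher validation score."""
    best_iter, best_score = 0, float("-inf")
    stopped_at = len(valid_scores)
    for it, score in enumerate(valid_scores, start=1):
        if score > best_score:
            best_iter, best_score = it, score
        elif it - best_iter >= patience:
            stopped_at = it
            break
    return best_iter, best_score, stopped_at


# Toy AUC curve that peaks at round 5, checked with patience 3
scores = [0.70, 0.72, 0.75, 0.74, 0.76, 0.755, 0.75, 0.758, 0.759, 0.757]
print(early_stopping(scores, patience=3))  # (5, 0.76, 8)
```

With patience 10 and a best round of 291, this rule would stop at round 301, consistent with the log ending just after the [300] line.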

Metrics

Before moving on to the main metric visualizations, we print the metrics below. The first set is for the neural network on the train and test data; the second set is for the LightGBM model.

NN Metrics:
TRAIN MODEL:
Gini: 0.5071 | AUC: 0.7536 | KS: 0.3784
TEST MODEL:
Gini: 0.4877 | AUC: 0.7439 | KS: 0.3743
Light Metrics:
TRAIN MODEL:
Gini: 0.584 | AUC: 0.792 | KS: 0.4379
TEST MODEL:
Gini: 0.5229 | AUC: 0.7615 | KS: 0.3951
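The printed Gini and AUC values are tied together by the identity Gini = 2 × AUC - 1, which gives a quick consistency check on the output above:

```latex
\begin{aligned}
\text{Gini} &= 2\,\text{AUC} - 1 \\
\text{NN test:}\quad 2(0.7439) - 1 &= 0.4878 \quad (\text{printed } 0.4877) \\
\text{LightGBM test:}\quad 2(0.7615) - 1 &= 0.5230 \quad (\text{printed } 0.5229)
\end{aligned}
```

The one-unit differences in the fourth decimal come from the AUC being rounded before it is doubled.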

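Rank Order

The four tables below are, in order, the neural network's training and testing tables followed by LightGBM's training and testing tables; their KS peaks (37.71, 37.09, 43.57 and 39.27) track the KS values printed above. Applicants are sorted from lowest to highest predicted risk and split into 20 equal-sized bins of 5% each, so defaults should concentrate in the bottom rows. The KS column is GOOD_PCT_CUM minus BAD_PCT_CUM, and the row marked KS is where that gap peaks.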

TRAINING MODEL DATA:
RANK_PCT RANK_NUM BAD_NUM BAD_NUM_CUM BAD_PCT_CUM BAD_PCT_RANK GOOD_NUM GOOD_NUM_CUM GOOD_PCT_RANK GOOD_PCT_CUM CUM_GB_KS KS
1 5.0 12301 119 119 0.60 0.97 12182 12182 99.03 5.39 4.79
2 10.0 12300 187 306 1.54 1.52 12113 24295 98.48 10.74 9.20
3 15.0 12301 221 527 2.65 1.80 12080 36375 98.20 16.08 13.43
4 20.0 12300 278 805 4.05 2.26 12022 48397 97.74 21.40 17.35
5 25.0 12300 310 1115 5.61 2.52 11990 60387 97.48 26.70 21.09
6 30.0 12301 403 1518 7.64 3.28 11898 72285 96.72 31.96 24.32
7 35.0 12300 417 1935 9.74 3.39 11883 84168 96.61 37.22 27.48
8 40.0 12300 516 2451 12.34 4.20 11784 95952 95.80 42.43 30.09
9 45.0 12301 555 3006 15.14 4.51 11746 107698 95.49 47.62 32.48
10 50.0 12300 607 3613 18.19 4.93 11693 119391 95.07 52.79 34.60
11 55.0 12300 687 4300 21.65 5.59 11613 131004 94.41 57.93 36.28
12 60.0 12301 818 5118 25.77 6.65 11483 142487 93.35 63.01 37.24
13 65.0 12300 906 6024 30.33 7.37 11394 153881 92.63 68.04 37.71 KS
14 70.0 12300 1036 7060 35.55 8.42 11264 165145 91.58 73.03 37.48
15 75.0 12301 1224 8284 41.71 9.95 11077 176222 90.05 77.92 36.21
16 80.0 12300 1417 9701 48.85 11.52 10883 187105 88.48 82.74 33.89
17 85.0 12300 1630 11331 57.05 13.25 10670 197775 86.75 87.45 30.40
18 90.0 12301 2036 13367 67.31 16.55 10265 208040 83.45 91.99 24.68
19 95.0 12300 2546 15913 80.13 20.70 9754 217794 79.30 96.31 16.18
20 100.0 12301 3947 19860 100.00 32.09 8354 226148 67.91 100.00 0.00
TESTING MODEL DATA:
RANK_PCT RANK_NUM BAD_NUM BAD_NUM_CUM BAD_PCT_CUM BAD_PCT_RANK GOOD_NUM GOOD_NUM_CUM GOOD_PCT_RANK GOOD_PCT_CUM CUM_GB_KS KS
1 5.0 3076 49 49 0.99 1.59 3027 3027 98.41 5.35 4.36
2 10.0 3075 55 104 2.09 1.79 3020 6047 98.21 10.70 8.61
3 15.0 3075 56 160 3.22 1.82 3019 9066 98.18 16.04 12.82
4 20.0 3075 84 244 4.91 2.73 2991 12057 97.27 21.33 16.42
5 25.0 3075 81 325 6.55 2.63 2994 15051 97.37 26.62 20.07
6 30.0 3075 103 428 8.62 3.35 2972 18023 96.65 31.88 23.26
7 35.0 3075 105 533 10.74 3.41 2970 20993 96.59 37.13 26.39
8 40.0 3075 121 654 13.17 3.93 2954 23947 96.07 42.36 29.19
9 45.0 3075 146 800 16.11 4.75 2929 26876 95.25 47.54 31.43
10 50.0 3075 161 961 19.36 5.24 2914 29790 94.76 52.69 33.33
11 55.0 3076 167 1128 22.72 5.43 2909 32699 94.57 57.84 35.12
12 60.0 3075 198 1326 26.71 6.44 2877 35576 93.56 62.92 36.21
13 65.0 3075 208 1534 30.90 6.76 2867 38443 93.24 67.99 37.09 KS
14 70.0 3075 272 1806 36.37 8.85 2803 41246 91.15 72.95 36.58
15 75.0 3075 326 2132 42.94 10.60 2749 43995 89.40 77.81 34.87
16 80.0 3075 360 2492 50.19 11.71 2715 46710 88.29 82.62 32.43
17 85.0 3075 417 2909 58.59 13.56 2658 49368 86.44 87.32 28.73
18 90.0 3075 498 3407 68.62 16.20 2577 51945 83.80 91.88 23.26
19 95.0 3075 641 4048 81.53 20.85 2434 54379 79.15 96.18 14.65
20 100.0 3076 917 4965 100.00 29.81 2159 56538 70.19 100.00 0.00
TRAINING MODEL DATA:
RANK_PCT RANK_NUM BAD_NUM BAD_NUM_CUM BAD_PCT_CUM BAD_PCT_RANK GOOD_NUM GOOD_NUM_CUM GOOD_PCT_RANK GOOD_PCT_CUM CUM_GB_KS KS
1 5.0 12301 75 75 0.38 0.61 12226 12226 99.39 5.41 5.03
2 10.0 12300 95 170 0.86 0.77 12205 24431 99.23 10.80 9.94
3 15.0 12301 177 347 1.75 1.44 12124 36555 98.56 16.16 14.41
4 20.0 12300 207 554 2.79 1.68 12093 48648 98.32 21.51 18.72
5 25.0 12300 221 775 3.90 1.80 12079 60727 98.20 26.85 22.95
6 30.0 12301 286 1061 5.34 2.33 12015 72742 97.67 32.17 26.83
7 35.0 12300 347 1408 7.09 2.82 11953 84695 97.18 37.45 30.36
8 40.0 12300 361 1769 8.91 2.93 11939 96634 97.07 42.73 33.82
9 45.0 12301 463 2232 11.24 3.76 11838 108472 96.24 47.97 36.73
10 50.0 12300 513 2745 13.82 4.17 11787 120259 95.83 53.18 39.36
11 55.0 12300 604 3349 16.86 4.91 11696 131955 95.09 58.35 41.49
12 60.0 12301 707 4056 20.42 5.75 11594 143549 94.25 63.48 43.06
13 65.0 12300 900 4956 24.95 7.32 11400 154949 92.68 68.52 43.57 KS
14 70.0 12300 1008 5964 30.03 8.20 11292 166241 91.80 73.51 43.48
15 75.0 12301 1144 7108 35.79 9.30 11157 177398 90.70 78.44 42.65
16 80.0 12300 1357 8465 42.62 11.03 10943 188341 88.97 83.28 40.66
17 85.0 12300 1709 10174 51.23 13.89 10591 198932 86.11 87.97 36.74
18 90.0 12301 2132 12306 61.96 17.33 10169 209101 82.67 92.46 30.50
19 95.0 12300 2792 15098 76.02 22.70 9508 218609 77.30 96.67 20.65
20 100.0 12301 4762 19860 100.00 38.71 7539 226148 61.29 100.00 0.00
TESTING MODEL DATA:
RANK_PCT RANK_NUM BAD_NUM BAD_NUM_CUM BAD_PCT_CUM BAD_PCT_RANK GOOD_NUM GOOD_NUM_CUM GOOD_PCT_RANK GOOD_PCT_CUM CUM_GB_KS KS
1 5.0 3076 34 34 0.68 1.11 3042 3042 98.89 5.38 4.70
2 10.0 3075 56 90 1.81 1.82 3019 6061 98.18 10.72 8.91
3 15.0 3075 57 147 2.96 1.85 3018 9079 98.15 16.06 13.10
4 20.0 3075 64 211 4.25 2.08 3011 12090 97.92 21.38 17.13
5 25.0 3075 74 285 5.74 2.41 3001 15091 97.59 26.69 20.95
6 30.0 3075 83 368 7.41 2.70 2992 18083 97.30 31.98 24.57
7 35.0 3075 100 468 9.43 3.25 2975 21058 96.75 37.25 27.82
8 40.0 3075 116 584 11.76 3.77 2959 24017 96.23 42.48 30.72
9 45.0 3075 117 701 14.12 3.80 2958 26975 96.20 47.71 33.59
10 50.0 3075 163 864 17.40 5.30 2912 29887 94.70 52.86 35.46
11 55.0 3076 191 1055 21.25 6.21 2885 32772 93.79 57.96 36.71
12 60.0 3075 178 1233 24.83 5.79 2897 35669 94.21 63.09 38.26
13 65.0 3075 211 1444 29.08 6.86 2864 38533 93.14 68.15 39.07
14 70.0 3075 239 1683 33.90 7.77 2836 41369 92.23 73.17 39.27 KS
15 75.0 3075 297 1980 39.88 9.66 2778 44147 90.34 78.08 38.20
16 80.0 3075 358 2338 47.09 11.64 2717 46864 88.36 82.89 35.80
17 85.0 3075 410 2748 55.35 13.33 2665 49529 86.67 87.60 32.25
18 90.0 3075 516 3264 65.74 16.78 2559 52088 83.22 92.13 26.39
19 95.0 3075 669 3933 79.21 21.76 2406 54494 78.24 96.38 17.17
20 100.0 3076 1032 4965 100.00 33.55 2044 56538 66.45 100.00 0.00
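A table with this layout can be built by sorting applicants from lowest to highest score, cutting them into equal-count bins and accumulating defaults ("bad") and non-defaults ("good"). The sketch below reproduces a subset of the columns under that assumption. Its bin boundaries use integer division, so bin sizes can differ by one from the notebook's (which alternate between 12,300 and 12,301), and the function name is hypothetical. KS is GOOD_PCT_CUM minus BAD_PCT_CUM, as in the tables (for example 68.04 - 30.33 = 37.71 at the marked row of the first table).

```python
def rank_order_table(y_true, scores, n_bins=20):
    """Rank-order (gains) table with the lowest scores (least risky) first.

    Returns one dict per equal-count bin with a subset of the notebook's
    columns; all percentages are on a 0-100 scale."""
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    total_bad = sum(y for _, y in pairs)
    total_good = n - total_bad
    rows, cum_bad, cum_good = [], 0, 0
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        bad = sum(y for _, y in chunk)
        good = len(chunk) - bad
        cum_bad += bad
        cum_good += good
        bad_pct_cum = 100 * cum_bad / total_bad
        good_pct_cum = 100 * cum_good / total_good
        rows.append({
            "RANK_PCT": 100 * (b + 1) / n_bins,
            "RANK_NUM": len(chunk),
            "BAD_NUM": bad,
            "BAD_PCT_RANK": 100 * bad / len(chunk),
            "BAD_PCT_CUM": bad_pct_cum,
            "GOOD_PCT_CUM": good_pct_cum,
            "KS": good_pct_cum - bad_pct_cum,
        })
    return rows


# Toy example: 10 applicants, 5 bins of 2, score rising with row position
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
s = list(range(10))
table = rank_order_table(y, s, n_bins=5)
peak = max(table, key=lambda r: r["KS"])
print(peak["RANK_PCT"], peak["KS"])  # 20.0 40.0
```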

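Review Output

In the first table, 10_PER, 20_PER and 30_PER give the percentage of all defaults that fall in the riskiest 10%, 20% and 30% of applicants; despite their index labels, the BK_BENCH rows hold the LightGBM results. In the metric comparisons that follow, "Var" (or "Variance") is the train-minus-test difference, for example 79.20 - 76.15 = 3.05 AUC points for LightGBM, not a statistical variance.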

index 10_PER 20_PER 30_PER TRAIN_TEST MODEL
0 MODEL_TRAIN 32.7 51.2 64.4 Train NN-Model
1 MODEL_TEST 31.4 49.8 63.6 Test NN-Model
2 BK_BENCH_TRAIN 38.0 57.4 70.0 Train LGBM-Model
3 BK_BENCH_TEST 34.3 52.9 66.1 Test LGBM-Model
========================================
         METRIC COMPARISONS
- - - - - - - - - - - - - 
Auc NN Model Train        :    75.36%
Auc NN Model Test         :    74.39%
Auc NN Model Var          :    0.97%
Auc LGBM Model Train      :    79.20%
Auc LGBM Model Test       :    76.15%
Auc LGBM Model Var        :    3.05%
- - - - - - - - - - - - - 
Gini NN Model Train       :    50.71%
Gini NN Model Test        :    48.77%
Gini NN Model Var         :    1.94%
Gini LGBM Model Train     :    58.40%
Gini LGBM Model Test      :    52.29%
Gini LGBM Model Var       :    6.11%
- - - - - - - - - - - - - 
KS NN Model Train         :    37.84%
KS NN Model Test          :    37.43%
KS NN Model Var           :    0.41%
KS LGBM Model Train       :    43.79%
KS LGBM Model Test        :    39.51%
KS LGBM Model Var         :    4.28%
- - - - - - - - - - - - - 
========================================
MODEL TRAIN_TEST AUC GINI KS
0 NN-Model Train 75.36 50.71 37.84
1 NN-Model Test 74.39 48.77 37.43
2 NN-Model Variance 0.97 1.94 0.41
3 LGBM-Model Train 79.20 58.40 43.79
4 LGBM-Model Test 76.15 52.29 39.51
5 LGBM-Model Variance 3.05 6.11 4.28
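The 10_PER, 20_PER and 30_PER columns can be recovered from the rank-order tables, assuming they give the share of all defaults falling in the riskiest 10%, 20% and 30% of applicants (the last 2, 4 and 6 of the 20 bins). Using the BAD_NUM columns copied from those tables, the sketch below matches the LightGBM rows and the neural network's test row to one decimal. For the neural network's training data it gives 64.5 rather than 64.4 at 30%, presumably a rounding or binning difference in the notebook's own code. The function name is hypothetical.

```python
def capture_rates(bad_per_bin, top_pcts=(10, 20, 30)):
    """Percentage of all defaults falling in the riskiest top_pct% of applicants.

    bad_per_bin: defaults per equal-size bin, lowest risk first, like the
    BAD_NUM column of the rank-order tables."""
    n_bins = len(bad_per_bin)
    total = sum(bad_per_bin)
    rates = {}
    for pct in top_pcts:
        k = n_bins * pct // 100  # number of riskiest bins included
        rates[pct] = round(100 * sum(bad_per_bin[n_bins - k:]) / total, 1)
    return rates


# BAD_NUM columns copied from the rank-order tables
lgbm_train = [75, 95, 177, 207, 221, 286, 347, 361, 463, 513,
              604, 707, 900, 1008, 1144, 1357, 1709, 2132, 2792, 4762]
nn_test = [49, 55, 56, 84, 81, 103, 105, 121, 146, 161,
           167, 198, 208, 272, 326, 360, 417, 498, 641, 917]
print(capture_rates(lgbm_train))  # {10: 38.0, 20: 57.4, 30: 70.0}
print(capture_rates(nn_test))     # {10: 31.4, 20: 49.8, 30: 63.6}
```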

Plots and Visualizations

Shapley Feature Importance

0.35.0

The features below are the ones used in the second iteration of the models.

['EXT_SOURCE_2', 'inst_AMT_PAYMENT', 'SK_DPD_DEF', 'DAYS_EMPLOYED', 'SK_ID_PREV_y', 'REGION_RATING_CLIENT_W_CITY', 'CNT_INSTALMENT_FUTURE', 'EXT_SOURCE_1', 'DAYS_BIRTH', 'NAME_FAMILY_STATUS', 'AMT_DOWN_PAYMENT', 'NAME_EDUCATION_TYPE', 'EXT_SOURCE_3', 'DAYS_ID_PUBLISH', 'CODE_GENDER', 'SK_ID_PREV_x', 'SK_DPD', 'AMT_ANNUITY_x', 'AMT_CREDIT_x', 'CNT_PAYMENT', 'AMT_CREDIT_SUM_DEBT', 'AMT_GOODS_PRICE_x', 'NAME_INCOME_TYPE', 'NFLAG_INSURED_ON_APPROVAL', 'cc_bal_AMT_CREDIT_LIMIT_ACTUAL']
['EXT_SOURCE_2', 'inst_AMT_PAYMENT', 'SK_DPD_DEF']

Feature Importance

[Figure: feature importance, x-axis labelled 'Shap Importance']