Implementation of Xception and EfficientNetB3 for COVID-19 Detection on Chest X-Ray Image via Transfer Learning

COVID-19 is a highly contagious infectious disease caused by the SARS-CoV-2 virus that can cause respiratory issues. X-ray imaging can serve as an alternative means of detecting COVID-19 by offering insights into the condition of the lungs. Rapid and automated analysis of medical images and patterns can be achieved through deep learning techniques. In this study, we propose methods for the automatic classification of COVID-19 from chest X-ray images using CNNs with transfer learning, namely the Xception and EfficientNetB3 architectures, as well as an ensemble of both architectures working in parallel. Additionally, we use Grad-CAM to visualize the regions of the X-ray image that are most important for the classification decision. Classification is carried out for four classes: COVID-19, normal, bacterial pneumonia, and viral pneumonia. The proposed classifier models achieve an overall accuracy of 94.44% for the Xception classifier, 95.28% for the EfficientNetB3 classifier, and 94.44% for the parallel classifier. These accuracies are higher than those of the comparison classifiers. Overall, the proposed classifiers can be recommended as tools to assist radiologists and clinical practitioners in diagnosing and following up on COVID-19 cases.


Introduction
Coronavirus disease 2019 (COVID-19) cases were first reported in Wuhan, China, in December 2019, and on March 11, 2020, the World Health Organization (WHO) declared it a pandemic [1]. With the increasing number of COVID-19 infections, there is a need for immediate and less invasive diagnostic tools. Commonly used tests for detection include antigen, immunoenzymatic serological, and reverse-transcription polymerase chain reaction (RT-PCR) molecular tests [2]. Another alternative for the diagnosis and validation of COVID-19 is the use of non-invasive radiological imaging, such as computed tomography (CT) and chest X-ray (CXR) [3].
CXR is used because it is fast, cost-effective, and can detect pulmonary and alveolar interstitial opacities in patients with COVID-19 symptoms [1,2]. Portable CXR can also provide easy, timely access in remote areas where access to healthcare is limited [4]. However, some lung abnormalities cannot be detected using CXR [2]. To overcome this limitation, deep learning (DL) is used for medical image analysis; it can automatically analyze, identify, and classify patterns in medical images. One of the most widely used DL models for classifying and detecting medical images is the convolutional neural network (CNN), which has been used in recent years to automatically detect various diseases and lesions in the body [3].
Several studies have been conducted to classify COVID-19 using CXR images. In one study [5], a CNN model called CoroNet, based on the Xception transfer learning architecture, achieved an overall accuracy of 89.6% for four classes: COVID-19, normal, bacterial pneumonia, and viral pneumonia. Another study [6] used a combination of two CNN architectures, Xception and ResNet50V2, and achieved an accuracy of 91.4% for the same four classes. In a third study [7], the EfficientNetB4 transfer learning architecture was used, and an overall accuracy of 96.70% was achieved for three classes: COVID-19, normal, and viral pneumonia.
In this study, we use a deep learning approach based on transfer learning CNN models, namely the Xception and EfficientNetB3 architectures, and an ensemble of the two architectures arranged in parallel, for the classification of COVID-19 using CXR images. Xception and EfficientNetB3 were chosen because they achieved the best accuracy among the candidate CNN architectures we evaluated. The classification of COVID-19 is carried out for four classes: COVID-19, normal, bacterial pneumonia, and viral pneumonia.

Dataset
The COVID-19 dataset comes from the public data "Curated Dataset for COVID-19 Posterior-Anterior Chest Radiography Images (X-Rays)" Version 3 [8], with 4000 CXR images in postero-anterior view. Each class contains the same number of images, so the data is balanced. The 4000 CXR images are divided in a 90:10 ratio into training-validation data (3600 CXR images) and test data (400 CXR images). A summary of the data used is presented in TABLE I.

Data Splitting
The training-validation data is split in several ways: ratios of 70:30, 80:20, and 90:10, and 5-fold cross-validation. The 70:30, 80:20, and 90:10 splits were performed by setting the validation_split parameter in the data augmentation settings of the ImageDataGenerator function from the Keras library. The 5-fold cross-validation was performed using the StratifiedKFold function from the sklearn library, which automatically divides the data into five folds.
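To make the split sizes concrete, the following plain-Python sketch computes the training and validation counts for each ratio and builds class-balanced folds. It illustrates the idea behind stratified splitting only; it is not the sklearn or Keras implementation, and the label names, the helper functions `holdout_sizes` and `stratified_folds`, and the figure of 900 images per class (3600 balanced images over four classes) are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical label names for the four classes.
CLASSES = ["covid19", "normal", "bacterial_pneumonia", "viral_pneumonia"]


def holdout_sizes(n_samples, train_pct):
    """Return (train, validation) sizes for a train_pct:(100 - train_pct) split."""
    n_train = n_samples * train_pct // 100
    return n_train, n_samples - n_train


def stratified_folds(labels, n_folds=5):
    """Assign sample indices to folds so that every fold keeps the class balance."""
    per_class = defaultdict(list)
    for index, label in enumerate(labels):
        per_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    # Deal the indices of each class round-robin across the folds.
    for indices in per_class.values():
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


# Assumed: 3600 training-validation images, 900 per class.
labels = [name for name in CLASSES for _ in range(900)]

for pct in (70, 80, 90):
    print(pct, holdout_sizes(len(labels), pct))

folds = stratified_folds(labels, n_folds=5)
fold_sizes = [len(fold) for fold in folds]
first_fold_balance = Counter(labels[i] for i in folds[0])
print(fold_sizes, dict(first_fold_balance))
```

Under these assumptions the 90:10 split leaves 360 validation images, and each of the five folds holds 720 images with 180 per class.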

Pre-processing
Pre-processing is the transformation of raw data before it is fed into the model. The images are resized to 224 × 224 × 3 pixels to ensure consistent dimensions across all samples. Data augmentation is applied to reduce overfitting and increase the accuracy of the proposed model. This technique introduces synthetic variations to the dataset, increasing its diversity and reducing the risk of the model memorizing specific patterns. Data augmentation operations include rescale, rotation_range, height_shift_range, width_shift_range, shear_range, zoom_range, horizontal_flip, vertical_flip, and fill_mode. The data is then shuffled to help the model generalize and to reduce overfitting [9,10].
For the training process, this study used 60 epochs and a batch size of 16, which were determined after experimenting with several combinations of epochs and batch sizes. To prevent overfitting and ensure effective training, an early stopping callback is employed [11]. Categorical cross-entropy is used as the loss function, in accordance with the multi-class classification carried out in the study [9]. The optimizer is Adam with a learning rate of 0.0001 [12].
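The augmentation operations listed above are keyword arguments of the Keras ImageDataGenerator. The fragment below sketches how such a configuration might be written down. Only the parameter names, the 224 × 224 image size, and the batch size of 16 come from the text; every numeric value and flag setting is an illustrative placeholder, since the settings actually used are not reported here.

```python
# Keyword arguments for Keras' ImageDataGenerator.
# Parameter names follow the text; ALL values are illustrative placeholders,
# not the settings used in the study.
augmentation_kwargs = {
    "rescale": 1.0 / 255,        # map pixel values to [0, 1]
    "rotation_range": 10,        # degrees
    "height_shift_range": 0.1,   # fraction of image height
    "width_shift_range": 0.1,    # fraction of image width
    "shear_range": 0.1,
    "zoom_range": 0.1,
    "horizontal_flip": True,
    "vertical_flip": True,
    "fill_mode": "nearest",      # how pixels exposed by a transform are filled
    "validation_split": 0.1,     # 90:10 training-validation split
}

TARGET_SIZE = (224, 224)  # images are resized to 224 x 224 x 3
BATCH_SIZE = 16

# With TensorFlow installed, the dictionary would be unpacked as
#   tf.keras.preprocessing.image.ImageDataGenerator(**augmentation_kwargs)
# and TARGET_SIZE / BATCH_SIZE passed to flow_from_directory(...).
```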
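As a small worked example of the loss named above, categorical cross-entropy for one sample is the negative log of the probability assigned to the true class. The sketch below uses plain Python rather than Keras; the class order, the example probabilities, and the function name are assumptions chosen for illustration.

```python
import math


def categorical_cross_entropy(one_hot_target, predicted_probs, eps=1e-12):
    """Loss for one sample: -sum over classes of t_c * log(p_c)."""
    return -sum(
        t * math.log(max(p, eps))  # eps guards against log(0)
        for t, p in zip(one_hot_target, predicted_probs)
    )


# Assumed class order: COVID-19, normal, bacterial pneumonia, viral pneumonia.
target = [1, 0, 0, 0]                  # the true class is COVID-19
confident = [0.90, 0.05, 0.03, 0.02]   # hypothetical softmax output
unsure = [0.40, 0.20, 0.20, 0.20]      # hypothetical softmax output

loss_confident = categorical_cross_entropy(target, confident)
loss_unsure = categorical_cross_entropy(target, unsure)
print(round(loss_confident, 4), round(loss_unsure, 4))
```

The loss shrinks as the probability on the correct class approaches 1, which is the quantity the optimizer drives down during training.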
By following these pre-processing steps and training configurations, the model is better equipped to handle the classification task, leading to improved accuracy and generalization capabilities.

Architectural Model Selection
Experiments were conducted to determine the best CNN architecture for the proposed research model. The experiments were carried out in Kaggle kernels [13] with the proposed layer modification and 60 epochs on the CXR dataset used for this study. The data was divided in an 80:20 ratio into training and validation data. Among all tested architectures, Xception and EfficientNetB3 obtained the highest average validation accuracy, at 94.31%. Based on this experiment, we decided to use CNN transfer learning with the Xception and EfficientNetB3 architectures and the ensemble transfer learning of the two architectures arranged in parallel.

Model Architecture and Development
Three classifier models are developed based on Xception, EfficientNetB3, and an ensemble of the two architectures arranged in parallel. The base models are pre-trained on ImageNet [14] for Xception and with noisy-student weights [15] for EfficientNetB3. Xception, or Extreme Inception, is an architecture built from depthwise separable convolution layers with residual connections, with 36 convolution layers organized into 14 modules. The residual connections are "skip connections" that allow the gradient to flow through the network directly, avoiding the vanishing gradient problem [16]. EfficientNet is a CNN architecture developed using a scaling method that uniformly scales the width, depth, and resolution dimensions using a compound coefficient. The compound coefficient method is based on the idea of balancing the dimensions of width, depth, and resolution with constant-ratio scaling [17].
To enhance the performance of our models, we introduced several modifications to the base architectures, taking inspiration from the work of G. Marques et al. [7]. Their research showed the effectiveness of adding global average pooling, dense layers, and dropout layers after the transfer learning model. Building upon their findings, we tailored our models by incorporating these modifications alongside additional enhancements. Specifically, we integrated global average pooling, a fully connected (dense) layer with the rectified linear unit (ReLU) activation function, a dropout layer, batch normalization, and an output (dense) layer with the softmax activation function for multi-class classification. The model architecture, illustrating these modifications, is depicted in Fig. 1.
Global average pooling was chosen because it summarizes spatial information without introducing additional parameters, which can help prevent overfitting [18]. A dense layer with 256 nodes and the ReLU activation function was selected as the hidden layer after experimenting with different numbers of nodes; 256 nodes gave better accuracy than the other variations. The ReLU activation function is commonly used in CNNs because it has a lower computational load and is faster than sigmoid and tanh [9]. Dropout and batch normalization are also used for regularization to avoid overfitting. A dropout rate of 0.5 is used as it provides the highest regularization rate [19]. The output layer uses the softmax activation function, with one node for each class to be classified. This layer design was chosen after comparing several different layer arrangements and selecting the one with the highest accuracy.
These modifications combine established techniques with our own adaptations and optimizations. Through extensive experimentation, we tuned the layer configuration to achieve high accuracy in classifying COVID-19, viral pneumonia, bacterial pneumonia, and normal cases. The resulting layer design distinguishes our models from other state-of-the-art models.
The proposed classifier models were designed and trained on Kaggle kernels [13] using an NVIDIA Tesla P100 GPU, 13 GB of RAM, and the Python programming language.
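For reference, the compound scaling rule of EfficientNet [17] can be written as follows. The symbols are those commonly used for this method and do not appear elsewhere in this paper: \phi is the compound coefficient, and \alpha, \beta, \gamma are constants found by a small grid search on the baseline network.

```latex
\begin{aligned}
\text{depth: } d &= \alpha^{\phi}, \qquad
\text{width: } w = \beta^{\phi}, \qquad
\text{resolution: } r = \gamma^{\phi}, \\
\text{subject to } \; & \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,
\qquad \alpha \ge 1,\; \beta \ge 1,\; \gamma \ge 1 .
\end{aligned}
```

Raising \phi by one therefore roughly doubles the computational cost, while the three dimensions keep a fixed ratio to one another.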
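The layer stack described above can be sketched in Keras roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `build_classifier` is hypothetical, the backbone is created with `weights=None` so that the example runs without downloading anything (the study uses pre-trained weights), and the compile settings simply restate the Adam learning rate of 0.0001 and the categorical cross-entropy loss reported for training.

```python
import tensorflow as tf

NUM_CLASSES = 4               # COVID-19, normal, bacterial and viral pneumonia
INPUT_SHAPE = (224, 224, 3)   # image size used in pre-processing


def build_classifier(backbone_fn, weights=None):
    """Attach the described classification head to a Keras Applications backbone."""
    # weights=None avoids a download here; pass pre-trained weights for transfer learning.
    backbone = backbone_fn(include_top=False, weights=weights, input_shape=INPUT_SHAPE)

    inputs = tf.keras.Input(shape=INPUT_SHAPE)
    x = backbone(inputs)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)        # no extra parameters
    x = tf.keras.layers.Dense(256, activation="relu")(x)   # hidden layer, 256 nodes
    x = tf.keras.layers.Dropout(0.5)(x)                    # dropout rate 0.5
    x = tf.keras.layers.BatchNormalization()(x)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


xception_model = build_classifier(tf.keras.applications.Xception)
print(xception_model.output_shape)
```

The same function could be called with `tf.keras.applications.EfficientNetB3` to obtain the second classifier. How the two branches are merged in the parallel classifier is not specified in the text, so it is not sketched here.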

Evaluation Metrics
The model is evaluated by calculating the accuracy, precision, recall, and F1-score on validation and test data. The evaluation results were compared using a confusion matrix and a classification report. Gradient-weighted Class Activation Mapping (Grad-CAM) [20] is also used to provide a color visualization of the lung areas infected by COVID-19 and pneumonia using test data.
The values of accuracy, precision, recall, and F1-score are calculated using the following equations:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \text{Precision} = \frac{TP}{TP + FP} \]
\[ \text{Recall} = \frac{TP}{TP + FN} \]
\[ \text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]

The true positive (TP) and true negative (TN) represent the correct classification results, while the false positive (FP) and false negative (FN) represent the incorrect classification results. TP is the number of samples of a class that are classified correctly. FP is the number of samples from other classes that are incorrectly classified as that class, and FN is the number of samples of the class that are classified as another class. TN is the number of samples that do not belong to the class and are not classified as belonging to it [21,22].

Results and Discussion
The COVID-19 classification experiments were carried out using three proposed models: the Xception, EfficientNetB3, and parallel classifiers. Different data splitting techniques, including 70:30, 80:20, and 90:10 ratios for training and validation data, as well as 5-fold cross-validation, were used in the experiments. The evaluation and analysis focused on the positive COVID-19 class, as the study aimed to distinguish positive cases of COVID-19 from the non-COVID-19 classes, including normal, bacterial pneumonia, and viral pneumonia.

Proposed Xception Classifier
The validation results of the COVID-19 classification on the proposed Xception classifier are presented in TABLE II, which details the precision, recall, F1-score, and overall accuracy for all classes. Based on TABLE II, the best overall accuracy for all classes is obtained with the 90:10 split, at 94.44%. The precision, recall, and F1-score for the COVID-19 class reached 100% in the 90:10 split and in fold 1. For the other data splits, the COVID-19 class still produced good scores, ranging from 97.81% to 100% for precision, recall, and F1-score. Values below 100% are due to FP and FN in the model validation results.
The model also performed well in identifying non-COVID-19 classes, although not as well as COVID-19. In the 90:10 split, the normal class achieved a recall of 100%, a precision of 95.74%, and an F1-score of 97.83%. Similarly, the bacterial pneumonia class achieved a precision of 96.20%, a recall of 84.44%, and an F1-score of 89.94%, and the viral pneumonia class achieved a precision of 86.60%, a recall of 93.33%, and an F1-score of 89.84% in the same split. Overall, the proposed Xception classifier effectively identifies COVID-19 cases, and the 90:10 split gives the highest accuracy together with good precision, recall, and F1-score values in all classes.
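The per-class figures reported for the Xception classifier in the 90:10 split can be reproduced from TP, FP, and FN counts with a few lines of Python. The counts below are not taken from the paper: they are inferred from the reported percentages under the assumption that the validation set holds 90 images per class (10% of 3600 balanced images), and the function name is hypothetical.

```python
def per_class_metrics(tp, fp, fn):
    """Return (precision, recall, F1-score) for one class from its counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# (TP, FP, FN) per class. ASSUMPTION: inferred from the reported percentages,
# with 90 validation images per class; not values stated in the paper.
counts = {
    "COVID-19": (90, 0, 0),
    "normal": (90, 4, 0),
    "bacterial pneumonia": (76, 3, 14),
    "viral pneumonia": (84, 13, 6),
}

# Percentages rounded to two decimals, as in the reported results.
report = {
    name: tuple(round(100 * value, 2) for value in per_class_metrics(*c))
    for name, c in counts.items()
}

total_images = sum(tp + fn for tp, _, fn in counts.values())
total_correct = sum(tp for tp, _, _ in counts.values())
accuracy = round(100 * total_correct / total_images, 2)

for name, (precision, recall, f1) in report.items():
    print(f"{name}: precision={precision} recall={recall} f1={f1}")
print("overall accuracy:", accuracy)
```

With these counts, 340 of 360 validation images are classified correctly, which matches the reported overall accuracy of 94.44%.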

Proposed EfficientNetB3 Classifier
The validation results of the COVID-19 classification on the proposed EfficientNetB3 classifier are presented in TABLE III, which details the precision, recall, F1-score, and overall accuracy for all classes. Based on TABLE III, the best overall accuracy for all classes is obtained with the 90:10 split, at 95.28%. The precision, recall, and F1-score for the COVID-19 class reached 100% in the 90:10 split. For the other data splits, the COVID-19 class still produced good scores, ranging from 97.78% to 100% for precision, recall, and F1-score. Values below 100% are due to FP and FN in the model validation results.
The model also performed well in identifying non-COVID-19 classes, although not as well as COVID-19. In the 90:10 split, the normal class achieved 100% for precision, recall, and F1-score. The bacterial pneumonia class achieved a precision of 91.95%, a recall of 88.89%, and an F1-score of 90.40%, and the viral pneumonia class achieved a precision of 89.25%, a recall of 92.22%, and an F1-score of 90.71% in the same split. Overall, the proposed EfficientNetB3 classifier effectively identifies COVID-19 cases, and the 90:10 split gives the highest accuracy together with good precision, recall, and F1-score values in all classes.

Proposed Parallel Classifier
The validation results of the COVID-19 classification on the proposed parallel classifier are presented in TABLE IV, which details the precision, recall, F1-score, and overall accuracy for all classes. Based on TABLE IV, the best overall accuracy for all classes is obtained with the 90:10 split, at 94.44%. The precision, recall, and F1-score for the COVID-19 class reached 100% in the 90:10 split and in fold 4. For the other data splits, the COVID-19 class still produced good scores, ranging from 98.33% to 100% for precision, recall, and F1-score. Values below 100% are due to FP and FN in the model validation results.
The model also performed well in identifying non-COVID-19 classes, although not as well as COVID-19. In the 90:10 split, the normal class achieved a recall of 100%, a precision of 98.90%, and an F1-score of 99.45%. The bacterial pneumonia class achieved a precision of 90.70%, a recall of 86.67%, and an F1-score of 88.64%, and the viral pneumonia class achieved a precision of 88.17%, a recall of 91.11%, and an F1-score of 89.62% in the same split. Overall, the proposed parallel classifier effectively identifies COVID-19 cases, and the 90:10 split gives the highest accuracy together with good precision, recall, and F1-score values in all classes.

Testing in Test Data
Based on TABLE VI, the proposed classifier models have good accuracy on the test data; the Xception classifier has an accuracy of 93.50% for all classes. Tests on unit data are also carried out using the proposed EfficientNetB3 classifier, with four test images for each class. The evaluation then uses the Grad-CAM visualization method to give a visual explanation of the model's decision when detecting COVID-19 on CXR images, so that the doctor can decide whether to trust the model. The visualization also identifies the image features that lead the model to assign an image to a particular class. The resulting visualization is a heatmap whose color intensity depends on the importance of the features in the image.
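For reference, the Grad-CAM heatmap [20] for a class c is computed from the feature maps of a convolutional layer as follows. The notation is the one commonly used for this method and is not defined elsewhere in this paper: A^k is the k-th feature map of the chosen layer, y^c is the score for class c before the softmax, and Z is the number of spatial positions in the feature map.

```latex
\alpha_k^{c} = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^{c}}{\partial A^{k}_{ij}},
\qquad
L^{c}_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left( \sum_{k} \alpha_k^{c} A^{k} \right)
```

The weights \alpha_k^c measure how strongly each feature map influences the class score, and the ReLU keeps only the regions that contribute positively to the prediction, which is why high-intensity areas of the heatmap mark the features that drive the classification.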

Conclusion
In summary, this study developed three classifiers, namely Xception, EfficientNetB3, and parallel transfer learning models, for automatic COVID-19 classification from chest X-ray images. These proposed classifiers achieved higher accuracy than the state-of-the-art models they were compared with. With a data splitting ratio of 90:10, the Xception classifier achieved an overall accuracy of 94.44%, the EfficientNetB3 classifier achieved 95.28%, and the parallel classifier achieved 94.44%. The COVID-19 class reached 100% precision, recall, and F1-score. The proposed classifiers can serve as valuable tools to assist radiologists and clinical practitioners in diagnosing and following up on COVID-19 cases.
In future work, this research can be continued to improve accuracy in the classification of COVID-19, building upon the findings of this study. Additionally, further investigations can be conducted to develop methods for classifying the severity level of COVID-19. This expanded approach will enable more comprehensive support in disease management and provide valuable insights for healthcare professionals.

Figure 1. Overview of the proposed model architectures

Comparison with Other Classifiers
Comparisons were made to evaluate the performance of the proposed classifier model against state-of-the-art classifiers. They were carried out using the same dataset, number of classes, epochs, batch size, and split ratio of 90:10 as used for the proposed classifier. The results of the classifier comparison are presented in TABLE V. Based on TABLE V, the proposed classifier outperformed the other four classifiers with an accuracy of 94.44% for the Xception and parallel classifiers and 95.28% for the EfficientNetB3 classifier. The proposed classifier's precision, recall, and F1-score values were also higher than those of all comparison classifiers, with a value of 100% for the COVID-19 class. In addition to classifying the COVID-19 class effectively, the proposed classifier also performed well in classifying non-COVID-19 classes. The high accuracy of the proposed classifier, especially EfficientNetB3, was due to parameter settings and layer modifications that outperformed other classifiers. Overall, the proposed classifier model outperformed the comparison classifiers in accuracy and in precision, recall, and F1-score for the COVID-19 class. The proposed EfficientNetB3 classifier had the highest F1-score for all classes compared to other classifiers and had the best recall and precision values for almost all classes.

Table 5. Evaluation Comparison of the Proposed Classifiers with Other Classifiers for COVID-19 Classification

On the test data, the Xception classifier has a precision of 100%, a recall of 98.00%, and an F1-score of 98.99% for the COVID-19 positive class. The EfficientNetB3 classifier has an accuracy of 95.50% for all classes, with a precision of 100.00%, a recall of 99.00%, and an F1-score of 99.50% for the COVID-19 positive class. Furthermore, the parallel classifier has an accuracy of 94.25% for all classes, with the same precision, recall, and F1-score as the EfficientNetB3 classifier for the COVID-19 positive class.

Fig. 2 shows the results of the Grad-CAM visualization using the proposed Xception classifier in classifying COVID-19 positive classes. The image shows a heatmap from the CXR COVID-19 image, highlighting an important area of the

Figure 2. Grad-CAM shows the part of the input image that triggers a positive prediction of COVID-19

Table 2. The Results of the COVID-19 Classification Evaluation Parameters on the Xception Classifier per Class

Table 3. The Results of the COVID-19 Classification Evaluation Parameters on the EfficientNetB3 Classifier per Class

Table 4. Results of COVID-19 Classification Evaluation on the Parallel Classifier per Class

Table 6. Evaluation Comparison using Test Data on the Best Model of Each Proposed Classifier per Class