Different measures, namely gini, accuracy, AUC (area under the curve), specificity and sensitivity, are used to evaluate the performance of the model, as described in Section 4.1.
One earlier method trained an SVM with 400 features on 20-mer epitopes and achieved an accuracy of 71.09%. BCPred used an SVM with a string kernel [11] to predict linear epitopes and scored an AUC of 0.758. BEST [12] used a dataset of 20-mer epitopes to train an SVM model for prediction and achieved AUC values of 0.81 and 0.85. The SVM model of [13] predicted antigenic epitopes using tri-peptide similarity and amino acid propensity, and achieved an AUC of 0.702. Huang [14] used a random forest model to predict linear B-cell epitopes and scored an accuracy of 78.31%. Lian Yao [15] developed a sequence-based linear B-cell epitope predictor that used a deep maxout network with dropout training. A graphics processing unit was used to minimise the training time of the classifier. It achieved an accuracy of 68.33% with an AUC of 0.743. For linear B-cell epitope prediction, Weike Shen [16] proposed the APCpred method, which uses amino acid anchoring pair composition (APC). Its SVM model, trained on 20-mer epitopes, achieved an accuracy of 68.43%.
Biologists identify B-cell epitopes in order to develop peptide-based vaccines, epitope-based antibodies and diagnostic tools. Without computational support, B-cell epitopes are identified through wet-lab experiments in which every peptide has to be tested individually. This makes the task tedious in terms of effort, cost and time. To ease the biologist's task, an accurate statistical model is required that can predict whether a peptide is an epitope or a non-epitope. Machine learning techniques are therefore used to generate predictions that reduce human effort, time, cost and the number of wet-lab experiments.
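To make the evaluation measures named at the start of this section concrete, the following is a minimal sketch of how they might be computed for a binary epitope/non-epitope classifier. It is an illustration only. The function names and the toy labels and scores are hypothetical, and gini is assumed to follow the common convention gini = 2 × AUC − 1, which may differ from the definition given in Section 4.1.

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for labels coded 1 = epitope, 0 = non-epitope."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True positive rate: fraction of real epitopes that are predicted as epitopes."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True negative rate: fraction of non-epitopes that are predicted as non-epitopes."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n^2), fine for a sketch.
    """
    pos: List[float] = [s for t, s in zip(y_true, scores) if t == 1]
    neg: List[float] = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Assumed convention: gini = 2 * AUC - 1."""
    return 2.0 * auc(y_true, scores) - 1.0


# Hypothetical toy data: true labels and model scores for six peptides.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold 0.5 is an assumption
```

On this toy data, accuracy, sensitivity and specificity all come out at 2/3, AUC at 8/9 and gini at 7/9. Accuracy, sensitivity and specificity depend on the chosen threshold, whereas AUC and gini are computed from the raw scores.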
Machine learning is beneficial because it enables the computer to learn the hidden patterns within a dataset and to produce predictions for unseen data without human intervention. With its help, only the samples selected by the models need to be taken to the wet lab for further analysis, for example in experiments or in the development of peptide-based vaccines, epitope-based antibodies and diagnostic tools. In the present study, a large number of peptides are given to the machine learning models, which predict whether each peptide is an epitope or a non-epitope. Only the peptides predicted to be epitopes are used for further analysis, rather than all the peptides. This eases the biologist's job by reducing the time, cost and effort needed to identify B-cell epitopes. Motivated by the performance of machine learning models and by the need for a reliable model that can predict antigenic epitopes and reduce the expense of experimental epitope testing, a hybrid method based on the stacked generalisation ensemble technique is proposed. The models are trained on physicochemical properties of amino acids and are then used to classify sequential B-cell epitopes, as described in Section 3.
The literature survey revealed several shortcomings of existing B-cell epitope prediction methods: the feature selection phase [9, 10, 11, 13], the fixed length of amino acid sequences [9, 10, 12, 13], small datasets and basic models (random forest, SVM, neural network). The feature selection phase is essential because it reduces the complexity of the dataset and enhances the performance of the model. A model trained on epitopes of a fixed length can only predict epitopes of that length, whereas a flexible model that can predict epitopes of any length is now required. The effectiveness of a model also depends on the size of the training dataset.
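As an illustration of how physicochemical properties can yield a model input that does not depend on peptide length, the sketch below maps a peptide of any length to a fixed-length feature vector. It is not the feature set of the proposed method, which is described in Section 3. The choice of the Kyte-Doolittle hydropathy scale, the three summary statistics and the name `peptide_features` are assumptions made purely for the example.

```python
from statistics import mean
from typing import List

# Kyte-Doolittle hydropathy index per amino acid (one-letter codes).
# Using this particular scale is an assumption for illustration only.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def peptide_features(peptide: str) -> List[float]:
    """Summarise a peptide of any length as [mean, min, max] hydropathy.

    Hypothetical helper: the output always has three entries, so peptides
    of different lengths can be fed to the same classifier.
    """
    if not peptide:
        raise ValueError("peptide must contain at least one residue")
    try:
        values = [KYTE_DOOLITTLE[residue] for residue in peptide.upper()]
    except KeyError as exc:
        raise ValueError(f"unknown amino acid code: {exc.args[0]!r}") from None
    return [mean(values), min(values), max(values)]


# Peptides of different lengths produce vectors of the same size.
short_vec = peptide_features("AK")
long_vec = peptide_features("ACDEFGHIKLMNPQRSTVWY")
```

Because both vectors have the same length, a single trained model can score peptides of any length. In practice several properties would be combined and then passed through a feature selection step.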
The datasets used in existing methods [9, 10, 11, 12, 13, 14, 16] contain 700, 2479, 701, 4925, 2479, 727, 727, 1573 antigenic epitopes, respectively. In order to overcome the above-stated flaws, the contributions of the proposed ensemble model are stated below: The proposed ensemble model is a combination of six models: blackboost [17], regularised random forest [18], SVM [19, 20], random forest [21, 22], GBM (generalised boosted regression modelling) [23] and avNNet [24, 25]. The proposed ensemble model is explained in Section 3.2. It differs from existing sequential B-cell epitope prediction techniques because those techniques are based on a single model (most commonly RF, SVM or NN), which may produce false predictions.
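The general idea of stacked generalisation can be sketched as follows: each base model produces out-of-fold scores on the training data, and a meta-learner is then trained on those scores. This is a minimal, self-contained illustration and not the ensemble of Section 3.2. The two base learners (nearest centroid and k-nearest neighbours), the logistic meta-learner, the class names and the synthetic data are all stand-ins chosen for brevity; the six models listed above are not implemented here.

```python
import math
import random
from typing import Callable, List, Sequence


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NearestCentroid:
    """Stand-in base learner: score rises as x gets closer to the class-1 centroid."""

    def fit(self, X, y):
        self.c = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.c[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def score(self, x) -> float:
        d0, d1 = _dist(x, self.c[0]), _dist(x, self.c[1])
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)


class KNearest:
    """Stand-in base learner: fraction of the k nearest training points in class 1."""

    def __init__(self, k: int = 3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def score(self, x) -> float:
        nearest = sorted(zip(self.X, self.y), key=lambda p: _dist(x, p[0]))[: self.k]
        return sum(t for _, t in nearest) / len(nearest)


class Logistic:
    """Meta-learner (an assumption): logistic regression fitted by plain SGD."""

    def fit(self, X, y, epochs: int = 500, lr: float = 0.5):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        for _ in range(epochs):
            for x, t in zip(X, y):
                err = self.score(x) - t
                self.w = [w - lr * err * xi for w, xi in zip(self.w, x)]
                self.b -= lr * err
        return self

    def score(self, x) -> float:
        z = self.b + sum(w * xi for w, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


class Stack:
    """Stacked generalisation: meta-learner trained on out-of-fold base scores."""

    def __init__(self, factories: List[Callable], folds: int = 5):
        self.factories, self.folds = factories, folds

    def fit(self, X, y):
        n = len(X)
        meta_X = [[0.0] * len(self.factories) for _ in range(n)]
        for j, make in enumerate(self.factories):
            for fold in range(self.folds):
                train = [i for i in range(n) if i % self.folds != fold]
                model = make().fit([X[i] for i in train], [y[i] for i in train])
                for i in range(fold, n, self.folds):  # held-out rows only
                    meta_X[i][j] = model.score(X[i])
        self.meta = Logistic().fit(meta_X, y)
        self.base = [make().fit(X, y) for make in self.factories]  # refit on all data
        return self

    def score(self, x) -> float:
        return self.meta.score([m.score(x) for m in self.base])

    def predict(self, x) -> int:
        return 1 if self.score(x) >= 0.5 else 0


# Synthetic two-feature data (hypothetical): class 0 near (0, 0), class 1 near (3, 3).
rng = random.Random(0)
X, y = [], []
for i in range(40):
    label = i % 2
    X.append([rng.gauss(3.0 * label, 0.5), rng.gauss(3.0 * label, 0.5)])
    y.append(label)

stack = Stack([NearestCentroid, lambda: KNearest(3)]).fit(X, y)
```

Training the meta-learner on out-of-fold scores, rather than on scores from models that have already seen the same rows, keeps it from simply trusting whichever base model overfits most. Once fitted, `stack.predict(x)` returns 1 (epitope) or 0 (non-epitope) for a new feature vector `x`.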