Leveraging SMOTE in A Two-Layer Model for Prediction of Protein-Protein Interactions

Conference Paper


Abstract


  • Research into the mechanisms of infectious diseases between hosts and
    pathogens remains a hot topic. It takes stock of interaction data between
    hosts and pathogens, including proteins and genomes, to facilitate the
    discovery and prediction of underlying mechanisms. However, incomplete
    protein-protein interaction data impede advances in this exploration and
    call for wet-lab experiments to examine and verify latent interactions.
    Although numerous studies have tried to leverage computational models,
    especially machine learning models, the performance of these models has
    not been good enough to produce high-fidelity candidate interactions,
    owing to the nature of protein-protein interaction data. In this paper,
    we propose a two-layer model for the prediction of host-pathogen
    protein-protein interactions that tackles the challenges associated with
    feature representation algorithms and imbalanced data. The two-layer
    model consists of two essential modules: XGBoost, to reduce the imbalance
    ratio of the data, and SVM, to improve performance. SMOTE is incorporated
    as a key component of our model to alleviate the bias caused by the
    imbalance ratio. In this study, we carefully collected protein
    interaction data from public databases and built a dataset following a
    protocol consistent with the literature. A variety of models, including
    traditional models, models from the major literature and our model, are
    evaluated on the datasets. Results demonstrate that our model
    significantly improves performance compared with other state-of-the-art
    models.
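
The abstract names SMOTE as the component that counters class imbalance but does not give parameters or say exactly where it sits relative to the XGBoost and SVM layers. The following is a minimal sketch of the general SMOTE idea only, in plain Python: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbours. The function name `smote`, the value of `k`, and the toy data are illustrative assumptions, not the authors' implementation or settings.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE-style oversampling (hypothetical helper).

    Creates n_synthetic points, each interpolated between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D feature vectors (assumed data): 3 interacting pairs (minority)
# against 9 non-interacting pairs (majority).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
majority = [[5.0 + 0.1 * i, 5.0 - 0.1 * i] for i in range(9)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

In a pipeline like the one the abstract outlines, the balanced set would then be passed to a classifier. Oversampling is normally applied to the training split only, so that synthetic points never leak into evaluation.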

Authors


  •   Chen, Huaming (external author)
  •   Wang, Lei
  •   Chi, Chi-Hung (external author)
  •   Shen, Jun

Publication Date


  • 2019

Citation


  • Chen, H., Wang, L., Chi, C. & Shen, J. (2019). Leveraging SMOTE in A Two-Layer Model for Prediction of Protein-Protein Interactions. 2019 Seventh International Conference on Advanced Cloud and Big Data (CBD) (pp. 133-138). United States: IEEE.

Scopus Eid


  • 2-s2.0-85076896764

Ro Full-text Url


  • https://ro.uow.edu.au/cgi/viewcontent.cgi?article=4273&context=eispapers1

Ro Metadata Url


  • http://ro.uow.edu.au/eispapers1/3253

Start Page


  • 133

End Page


  • 138

Place Of Publication


  • United States
