
Mining Chinese social media UGC: a big‑data framework for analyzing Douban movie reviews

Journal Article



Abstract


  • Analysis of online user-generated content is receiving attention for its wide applications from both academic researchers and industry stakeholders. In this pilot study, we address common Big Data problems of time constraints and memory costs involved with using standard single-machine hardware and software. A novel Big Data processing framework is proposed to investigate a niche subset of user-generated popular culture content on Douban, a well-known Chinese-language online social network. Huge data samples are harvested via an asynchronous scraping crawler. We also discuss how to manipulate heterogeneous features from raw samples to facilitate analysis of various film details, review comments, and user profiles on Douban with specific regard to a wave of South Korean films (2003–2014), which have increased in popularity among Chinese film fans. In addition, an improved Apriori algorithm based on MapReduce is proposed for content-mining functions. An exploratory simulation of results demonstrates the flexibility and applicability of the proposed framework for extracting relevant information from complex social media data, knowledge which can in turn be extended beyond this niche dataset and used to inform producers and distributors of films, television shows, and other digital media content.
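
The abstract mentions an Apriori algorithm built on MapReduce but gives no detail of the authors' improvements. As a rough illustration only, the sketch below shows the textbook Apriori procedure with its support counting split into a map step and a reduce step, run in a single Python process. It is not the paper's improved algorithm, and the function names (`map_phase`, `reduce_phase`, `apriori`) are hypothetical choices made for this example.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, candidates):
    """Map step: emit (itemset, 1) for every candidate found in one transaction."""
    items = set(transaction)
    return [(c, 1) for c in candidates if c <= items]


def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each itemset."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return counts


def apriori(transactions, min_support):
    """Return {frozenset: support_count} for every itemset meeting min_support."""
    # Level 1 candidates: every distinct item as a one-element itemset.
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while candidates:
        # In a real MapReduce job the map calls would run in parallel
        # across partitions; here they run sequentially for illustration.
        emitted = [p for t in transactions for p in map_phase(t, candidates)]
        counts = reduce_phase(emitted)
        level = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates, then prune any
        # candidate with an infrequent (k-1)-subset (the Apriori property).
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent
```

Calling `apriori(transactions, min_support)` on a list of tag lists (for example, genre or keyword tags attached to each review) returns each frequent itemset with its support count. Distributing the map step across machines is what a MapReduce deployment would add.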

Publication Date


  • 2016

Citation


  • Yang, J. & Yecies, B. "Mining Chinese social media UGC: a big‑data framework for analyzing Douban movie reviews." Journal of Big Data 3.3 (2016): 1-23.

Scopus Eid


  • 2-s2.0-85013937930

Ro Full-text Url


  • http://ro.uow.edu.au/cgi/viewcontent.cgi?article=3749&context=lhapapers

Ro Metadata Url


  • http://ro.uow.edu.au/lhapapers/2738

Number Of Pages


  • 22

Start Page


  • 1

End Page


  • 23

Volume


  • 3

Issue


  • 3

Place Of Publication


  • USA
