Prediction of Box Office for Bollywood Movies Using State-of-the-Art SentiDraw Lexicon for Twitter Analysis
DOI: https://doi.org/10.17010/ijom/2021/v51/i5-7/161644
Keywords: Sentiment Lexicon, Box Office Prediction, SentiDraw Method, Movie Reviews, Bollywood, Twitter
Paper Submission Date: February 17, 2020; Paper Sent Back for Revision: October 17; Paper Acceptance Date: November 12; Paper Published Online: June 25, 2021
Abstract
Film is a high-risk industry. Accurate prediction of box-office revenues can reduce this market risk and inform investment decisions about promoting a movie shortly before or right after its release. Studies have shown that chatter on social media platforms such as Twitter, along with certain movie-related factors, can be useful in predicting the success of movies. The sentiment of tweets about a movie gives important information about consumers' reactions, and the polarity of these sentiments has been shown to affect the prediction of box-office revenues. This paper presented a novel Bollywood domain-specific sentiment lexicon that delivered state-of-the-art performance for polarity determination of reviews. The SentiDraw lexicon was built from movie reviews scraped from IMDB; the sentiment orientation of each word was calculated from the probability distribution of that word across reviews with different star ratings. The results showed that the SentiDraw lexicon delivered superior performance compared to other lexicon-based methods. This contributed significantly to enhancing the accuracy of box-office prediction for movies using textual data from Twitter. In fact, this study demonstrated an extremely parsimonious regression model that used only budget, hype factor, tweet volume, and polarity of tweets for a robust prediction of box-office revenues even before the release of a movie.
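The idea of scoring a word by how its occurrences are distributed across star ratings can be illustrated with a minimal Python sketch. The abstract does not give the SentiDraw formula, so the scoring rule below is an assumption made for illustration only: it takes the expected star rating under P(stars | word) and rescales it to the range [-1, 1]. The names `build_lexicon` and `polarity`, the `min_count` threshold, and the toy reviews are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def build_lexicon(reviews, min_count=2, max_stars=10):
    """Build a word -> score map from (text, stars) pairs.

    Assumption (not the paper's formula): a word's score is its expected
    star rating under P(stars | word), rescaled to [-1, 1].
    """
    # word -> Counter mapping each star rating to the number of
    # reviews with that rating that contain the word
    counts = defaultdict(Counter)
    for text, stars in reviews:
        for word in set(text.lower().split()):
            counts[word][stars] += 1

    midpoint = (1 + max_stars) / 2
    half_range = (max_stars - 1) / 2
    lexicon = {}
    for word, by_stars in counts.items():
        total = sum(by_stars.values())
        if total < min_count:
            continue  # too rare to estimate a distribution
        # Expected rating under the word's distribution over star ratings
        expected = sum(s * n / total for s, n in by_stars.items())
        lexicon[word] = (expected - midpoint) / half_range
    return lexicon


def polarity(text, lexicon):
    """Average the scores of known words; 0.0 if no word is in the lexicon."""
    scores = [lexicon[w] for w in text.lower().split() if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy reviews on a 1-10 star scale (not data from the study)
toy_reviews = [
    ("brilliant film superb acting", 10),
    ("brilliant story", 9),
    ("boring film weak story", 2),
    ("boring and weak", 1),
]
lexicon = build_lexicon(toy_reviews)
print(polarity("brilliant film", lexicon))
print(polarity("boring weak story", lexicon))
```

In this sketch, a word that appears only in highly rated reviews receives a score near +1, a word that appears only in poorly rated reviews receives a score near -1, and a word spread evenly across ratings scores near 0. A tweet-level polarity computed this way could then serve as one input, alongside budget, hype factor, and tweet volume, to a regression model of the kind the abstract describes.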