Prediction of Box Office for Bollywood Movies Using State-of-the-Art SentiDraw Lexicon for Twitter Analysis

Authors

  •   Shashank Shekhar Sharma Research Scholar, Indian Institute of Foreign Trade, IIFT Bhawan, B-21, NRPC Colony, Block B, Qutab Institutional Area, New Delhi - 110 016
  •   Gautam Dutta Professor, Indian Institute of Foreign Trade, IIFT Bhawan, B-21, NRPC Colony, Block B, Qutab Institutional Area, New Delhi - 110 016

DOI:

https://doi.org/10.17010/ijom/2021/v51/i5-7/161644

Keywords:

Sentiment Lexicon, Box Office Prediction, SentiDraw Method, Movie Reviews, Bollywood, Twitter.

Paper Submission Date: February 17, 2020; Paper Sent Back for Revision: October 17; Paper Acceptance Date: November 12; Paper Published Online: June 25, 2021.

Abstract

Films are a high-risk industry. Accurate prediction of box-office revenue can reduce this market risk and inform investment decisions about promoting a film close to, or right after, its release. Studies have shown that chatter on social media platforms such as Twitter, along with certain movie-related factors, can be useful in predicting the success of movies. The sentiment of tweets about a movie gives important information about consumers' reactions, and the polarity of these sentiments has been shown to affect the prediction of box-office revenues. This paper presented a novel Bollywood domain-specific sentiment lexicon that delivered state-of-the-art performance for polarity determination of reviews. The SentiDraw lexicon was built on movie reviews scraped from IMDB; the sentiment orientation of each word was calculated from the probability distribution of that word across reviews with different star ratings. The results showed that the SentiDraw lexicon delivered superior performance compared to any other lexicon-based method. This contributed significantly to enhancing the accuracy of box-office prediction for movies using textual data from Twitter. In fact, this study demonstrated an extremely parsimonious regression model that used only budget, hype factor, tweet volume, and polarity of tweets for a robust prediction of box-office revenues even before the release of a movie.
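
The idea of deriving a word's sentiment orientation from its distribution across star ratings can be illustrated with a minimal Python sketch. This is not the SentiDraw formula from the paper, which the abstract does not state. It assumes a 1 to 10 star scale, estimates P(rating | word) from review counts, and linearly maps the expected rating to a score in [-1, 1]. The function names `build_lexicon` and `polarity`, the `min_count` threshold, and the scoring rule are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def build_lexicon(reviews, min_count=2):
    """Build a toy rating-based sentiment lexicon.

    reviews: iterable of (text, star_rating) pairs, ratings assumed in 1..10.
    Returns a dict mapping word -> score in [-1, 1].
    NOTE: the scoring rule is an illustrative assumption, not the paper's.
    """
    # For each word, count how many reviews at each star rating contain it.
    word_rating_counts = defaultdict(Counter)
    for text, stars in reviews:
        for word in set(text.lower().split()):
            word_rating_counts[word][stars] += 1

    lexicon = {}
    for word, counts in word_rating_counts.items():
        total = sum(counts.values())
        if total < min_count:
            continue  # skip words seen in too few reviews
        # Expected star rating under the empirical P(rating | word).
        mean_rating = sum(stars * n for stars, n in counts.items()) / total
        # Map the 1..10 scale linearly onto [-1, 1] (midpoint 5.5).
        lexicon[word] = (mean_rating - 5.5) / 4.5
    return lexicon


def polarity(text, lexicon):
    """Average the lexicon scores of the known words in a text."""
    scores = [lexicon[w] for w in text.lower().split() if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0
```

In a pipeline like the one the abstract describes, a per-tweet score of this kind could be aggregated per movie and used as one regressor alongside budget, hype factor, and tweet volume. How the authors perform that aggregation is not specified here.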

Author Biographies

Shashank Shekhar Sharma, Research Scholar, Indian Institute of Foreign Trade, IIFT Bhawan, B-21, NRPC Colony, Block B, Qutab Institutional Area, New Delhi - 110 016

ORCID iD : 0000-0002-2931-2193

Gautam Dutta, Professor, Indian Institute of Foreign Trade, IIFT Bhawan, B-21, NRPC Colony, Block B, Qutab Institutional Area, New Delhi - 110 016

ORCID iD : 0000-0003-1500-7929

Published

2021-07-31

How to Cite

Sharma, S. S., & Dutta, G. (2021). Prediction of Box Office for Bollywood Movies Using State-of-the-Art SentiDraw Lexicon for Twitter Analysis. Indian Journal of Marketing, 51(5-7), 9–31. https://doi.org/10.17010/ijom/2021/v51/i5-7/161644
