Who among us hasn’t fallen victim to the addictive power of a binge-worthy Netflix show? For my final project at Metis, I chose to explore elements in popular shows that might lead you to start “binge watching” on Netflix.
The popularity of Netflix and other online streaming models has been rapidly increasing in recent years. But with competition from HBO, Amazon, Hulu, and more, quality content is becoming increasingly important for Netflix to retain subscribers.
One of the keys to Netflix’s success is the idea of “binge watching”, which has become an important feature in Netflix analytics. The binge-watching model pioneered by Netflix has introduced new strategies for programming and writing shows. The trajectory of a show’s first season is critical for attracting viewers, but because episodes can be watched back-to-back without waiting a full week between broadcasts, the pilot episode carries much less weight than in traditional TV viewing. Even so, learning how to engage viewers early in a series can help Netflix optimize content and increase subscriptions.
The inspiration for my project came from a recent Netflix report that identified, for many popular shows, the episode that “hooked” viewers, meaning that 70% or more of the viewers who watched that episode went on to complete the full first season.
One interesting finding from this report is that the “hooked points” were not correlated with the type of show (comedy vs. drama) or the number of episodes. Netflix representatives confirmed that, based on their analysis, the “hooked point” was determined by the content itself.
This led me to the goal of my project, which was to use Natural Language Processing techniques and feature engineering methods to find elements in these TV episode scripts that “hook” viewers and contribute to the binge-worthiness of a series.
- Collect TV scripts using Beautiful Soup
- Train a word2vec model to create word embeddings from the text of each episode’s script
- Divide the script into equal-sized parts and find the section with the largest cosine distance from the rest (I chose this method since I suspected that a shift in theme or tone would point to the climax of the episode that hooked viewers)
- Iteratively narrow in on the “hook” section of the script by incorporating an 8-parameter sentiment analysis tool and feature engineering to find important sections of this subset (e.g. length of dialogue, punctuation used, number of characters, scene cuts)
- Use tf-idf weighting and NMF topic modeling to compare themes between shows
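To make the "largest cosine distance" step concrete, here is a minimal sketch of the idea rather than the code used in the project. It makes a few loud assumptions: plain word counts stand in for the word2vec embeddings, the toy script is invented for illustration, and the function names (`split_into_sections`, `cosine_distance`, `most_divergent_section`) are hypothetical.

```python
import math
from collections import Counter


def split_into_sections(tokens, n_sections):
    """Split a token list into n_sections contiguous, roughly equal parts."""
    bounds = [i * len(tokens) // n_sections for i in range(n_sections + 1)]
    return [tokens[bounds[i]:bounds[i + 1]] for i in range(n_sections)]


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty section as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def most_divergent_section(tokens, n_sections=4):
    """Find the section whose vocabulary differs most from the rest of the script."""
    sections = split_into_sections(tokens, n_sections)
    # Assumption: word counts replace the averaged word2vec vectors here.
    vectors = [Counter(section) for section in sections]
    distances = []
    for i, vec in enumerate(vectors):
        rest = Counter()
        for j, other in enumerate(vectors):
            if j != i:
                rest.update(other)
        distances.append(cosine_distance(vec, rest))
    best = max(range(len(sections)), key=lambda i: distances[i])
    return best, distances


# Invented toy "script": the third quarter shifts to a different theme.
toy_script = (
    "office pitch client " * 4
    + "sermon church town " * 2
    + "office pitch client " * 2
)
tokens = toy_script.split()
hook_index, distances = most_divergent_section(tokens, n_sections=4)
print(hook_index, [round(d, 3) for d in distances])
```

With real embeddings, each section's vector would instead be something like the mean of its word vectors, but the comparison of one section against everything else works the same way.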
My analysis led to 3 central themes that occurred in multiple “hook” episodes: Supporting Character Development, Joining Forces, and Demonstration of Leadership.
The Supporting Character Development theme can easily be seen in the “hook” episode of Mad Men (episode 6). Here, the character of Peggy, Don Draper’s secretary, starts to come to life as she gains respect from the men in her office after demonstrating marketing savvy during a product trial. The t-SNE plot shows a 2-D representation of the word clusters in this section of the text, and we can clearly see how the content and overall tone in this part diverges from the core of the episode shown in red.
The second theme, Joining Forces, occurred in many of these episodes when 2 characters in separate story lines come together to work toward a common goal. In the “hook” episode of Breaking Bad, for example, Jesse and Walt realize early on that they need to work together to get out of many of the messy situations associated with their new venture. Here, the t-SNE plot also shows how this theme varies from the word clusters in the remainder of the episode.
Finally, Demonstration of Leadership proved to be a common theme in Netflix “hook” episodes. In this scene from House of Cards, Frank delivers a powerful sermon at a church in his hometown and shows his ability to gather a community around him during times of need. This trait in a protagonist is a prime example of something that engages viewers early on in the first season of a show.
This project was a fun way to explore a new phenomenon in video streaming and leverage data science to figure out what appeals to viewers in popular TV shows. I really enjoyed using these quantitative methods to analyze the presumably qualitative process that goes into TV writing. Along the way, I got to make use of new NLP tools and apply word2vec as a neural network on the script text. Now, back to my research…more binge watching!