Every time I watch a Kurzgesagt – In a Nutshell video, I find myself wondering whether it left me feeling optimistic or pessimistic. Perhaps it's because of all the stereotypical jokes and memes suggesting that Kurzgesagt videos induce existential crises.
This question prompted me to seek an answer, albeit with a naive approach: I downloaded the transcripts of the 20 most recent Kurzgesagt videos and fed them into VADER (Valence Aware Dictionary and sEntiment Reasoner), a lexicon- and rule-based sentiment analysis tool. I used the implementation of VADER that ships with NLTK (Natural Language Toolkit).
TLDR
Here are the results:
- Out of the 20 videos reviewed, 12 had an overall positive tone and 8 had an overall negative tone, judged by the sign of VADER's compound score.
- The average sentiment scores of the 20 reviewed videos were:
  - Neutral: 76.55%
  - Positive: 12.85%
  - Negative: 10.6%
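These summary figures are simple arithmetic over the 20 rows of the results table. The Python sketch below recomputes them from the rounded per-video percentages, so the averages are means of the rounded table values rather than of VADER's raw scores:

```python
# Rounded per-video percentages, copied row by row from the results table
neutral = [84, 81, 84, 78, 74, 84, 83, 74, 72, 81, 83, 72, 80, 61, 69, 79, 84, 74, 67, 67]
positive = [10, 12, 11, 18, 10, 10, 12, 13, 12, 12, 10, 15, 10, 12, 16, 14, 12, 11, 26, 11]
negative = [6, 7, 5, 4, 16, 6, 5, 13, 16, 7, 7, 13, 10, 27, 15, 7, 5, 14, 7, 22]
# Overall label per row, in the same order: P = 🟢 Positive, N = 🔴 Negative
labels = "PPPPNPPNNPPPNNNPPNPN"


def average(values):
    """Arithmetic mean of a list of percentages."""
    return sum(values) / len(values)


print(f"Positive videos: {labels.count('P')}, negative videos: {labels.count('N')}")
for name, column in (("Neutral", neutral), ("Positive", positive), ("Negative", negative)):
    print(f"{name}: {average(column):.2f}%")
```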
Implementation
Strictly speaking, this naive approach does not measure how "optimistic" or "pessimistic" a video is; deciding that would first require a personal definition of those terms. VADER only estimates how positive, negative, or neutral the wording is. That said, feeding each entire transcript into VADER still gives interesting results.
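VADER's scoring is lexicon- and rule-based. To make that concrete, here is a deliberately tiny Python sketch of the general idea, not VADER's actual code: each word gets a valence from a lexicon, one rule flips and dampens a word directly preceded by a negation, and the summed valence is squashed into a compound score between -1 and 1. The lexicon entries, the tokenizer, the one-word negation window, and the `toy_polarity_scores` name are hypothetical; the -0.74 negation scalar and the normalization constant alpha = 15 mirror constants in VADER's reference implementation, which also applies many more rules (intensifiers, capitalization, punctuation, "but" clauses).

```python
import math
import re

# Hypothetical toy valences for illustration; these are NOT VADER's real ratings.
TOY_LEXICON = {"good": 1.9, "love": 3.2, "happy": 2.7,
               "bad": -2.5, "death": -2.9, "horror": -2.6}
NEGATIONS = {"not", "never", "no"}
NEGATION_SCALAR = -0.74  # flips and dampens a negated word (constant borrowed from VADER)
ALPHA = 15               # normalization constant for the compound score (as in VADER)


def toy_polarity_scores(text: str) -> dict:
    """Score text with a tiny lexicon plus one negation rule (simplified sketch)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    valences = []
    for i, token in enumerate(tokens):
        valence = TOY_LEXICON.get(token, 0.0)
        # Rule: a negation word right before a sentiment word flips and dampens it
        if valence and i > 0 and tokens[i - 1] in NEGATIONS:
            valence *= NEGATION_SCALAR
        valences.append(valence)

    pos = sum(v for v in valences if v > 0)
    neg = sum(-v for v in valences if v < 0)
    neu = sum(1 for v in valences if v == 0)  # words outside the lexicon count as neutral
    total = pos + neg + neu or 1.0            # avoid dividing by zero on empty input

    raw = sum(valences)
    compound = raw / math.sqrt(raw * raw + ALPHA)  # squash into the open interval (-1, 1)
    return {"neg": round(neg / total, 3), "neu": round(neu / total, 3),
            "pos": round(pos / total, 3), "compound": round(compound, 4)}


print(toy_polarity_scores("The horror of death is not good"))
print(toy_polarity_scores("I love this happy planet"))
```

In this toy, every word missing from the lexicon counts toward the neutral share, which is one plausible reason the full transcripts score around 76% neutral on average.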
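First, I downloaded the `vader_lexicon` resource that NLTK's VADER implementation needs:

```python
import nltk
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer

# One-time download of the lexicon used by SentimentIntensityAnalyzer
nltk.download('vader_lexicon')
```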
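Then each transcript was fed into the `polarity_scores` method. The `float_to_percentage` helper and the `transcript_files` listing below are assumed implementations:

```python
import json
import os


def float_to_percentage(value):
    # Assumed helper: formats a 0..1 proportion as a whole percentage, e.g. 0.84 -> "84%"
    return f"{round(value * 100)}%"


# Assumes each transcript is a JSON list of caption strings stored in ./transcripts
transcript_files = sorted(f for f in os.listdir("transcripts") if f.endswith(".json"))

sia = SentimentIntensityAnalyzer()
video_data = []
for transcript_file in transcript_files:
    file_path = os.path.join("transcripts", transcript_file)
    with open(file_path, "r", encoding="utf8") as json_file:
        data = json.load(json_file)
    # Score the whole transcript as a single text
    transcript = " ".join(data)
    sentiment_scores = sia.polarity_scores(transcript)
    video_data.append([
        transcript_file,
        float_to_percentage(sentiment_scores["neu"]),
        float_to_percentage(sentiment_scores["pos"]),
        float_to_percentage(sentiment_scores["neg"]),
        # Binary overall label from the sign of the normalized compound score
        "🔴 Negative" if sentiment_scores["compound"] < 0 else "🟢 Positive",
    ])
    print(f"Sentiment scores of '{transcript_file}': {sentiment_scores}")

# In a Jupyter notebook, this expression renders the results table
pd.DataFrame(video_data, columns=["Transcript", "⚪ Neutral", "🟢 Positive", "🔴 Negative", "Compound"])
```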
And here are the results:
Transcript | ⚪ Neutral | 🟢 Positive | 🔴 Negative | Compound |
---|---|---|---|---|
black-hole-star-the-star-that-shouldn't-exist.json | 84% | 10% | 6% | 🟢 Positive |
change-your-life-one-tiny-step-at-a-time.json | 81% | 12% | 7% | 🟢 Positive |
how-to-terraform-mars-with-lasers.json | 84% | 11% | 5% | 🟢 Positive |
how-we-make-money-on-youtube-with-20m-subs.json | 78% | 18% | 4% | 🟢 Positive |
is-civilization-on-the-brink-of-collapse.json | 74% | 10% | 16% | 🔴 Negative |
lets-travel-to-the-most-extreme-place-in-the-universe.json | 84% | 10% | 6% | 🟢 Positive |
the-black-hole-that-kills-galaxies-quasars.json | 83% | 12% | 5% | 🟢 Positive |
the-deadliest-virus-on-earth.json | 74% | 13% | 13% | 🔴 Negative |
the-horror-of-the-slaver-ant.json | 72% | 12% | 16% | 🔴 Negative |
the-last-human-a-glimpse-into-the-far-future.json | 81% | 12% | 7% | 🟢 Positive |
the-most-complex-language-in-the-world.json | 83% | 10% | 7% | 🟢 Positive |
the-most-dangerous-weapon-in-not-nuclear.json | 72% | 15% | 13% | 🟢 Positive |
the-most-extreme-explosion-in-the-universe.json | 80% | 10% | 10% | 🔴 Negative |
the-reason-why-cancer-is-so-hard-to-beat.json | 61% | 12% | 27% | 🔴 Negative |
what-actually-happens-when-you-are-sick.json | 69% | 16% | 15% | 🔴 Negative |
what-happens-if-a-supervolcano-blows-up.json | 79% | 14% | 7% | 🟢 Positive |
why-aliens-might-already-be-on-their-way-to-us.json | 84% | 12% | 5% | 🟢 Positive |
why-don't-we-shoot-nuclear-waste-into-space.json | 74% | 11% | 14% | 🔴 Negative |
why-you-are-lonely-and-how-to-make-friends.json | 67% | 26% | 7% | 🟢 Positive |
your-body-killed-cancer-5-minutes-ago.json | 67% | 11% | 22% | 🔴 Negative |
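The Compound column is a binary sign test: any compound score below 0 is labelled negative and everything else positive. The VADER project's README instead suggests a three-way split at ±0.05, which leaves a neutral band around zero. The sketch below contrasts the two rules; the function names are hypothetical and the scores in the loop are made-up inputs, not values behind the table.

```python
def classify_by_sign(compound: float) -> str:
    """The rule used in this post: any negative compound score counts as negative."""
    return "Negative" if compound < 0 else "Positive"


def classify(compound: float, threshold: float = 0.05) -> str:
    """Three-way labelling with a neutral band of +/- threshold around zero."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# Illustrative compound scores (made up, not taken from the results table)
for score in (0.62, 0.01, -0.03, -0.9):
    print(f"{score:+.2f}: sign rule -> {classify_by_sign(score)}, ±0.05 rule -> {classify(score)}")
```

With the three-way rule, transcripts whose compound score sits very close to zero would be reported as neutral instead of being forced into one of the two camps.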
Source Code
Source code is available at github.com/avestura/kurzgesagt-sentiment-analysis