Visualizing Trump's acceptance speech for the 2024 Republican presidential nomination using text mining techniques
On July 19, 2024 (local time), Donald Trump delivered his speech accepting the Republican presidential nomination. In this post, I visualize that speech using text mining techniques in R, presenting the source code and explaining it step by step.
You can apply the text mining techniques in this post to any English text: word frequency analysis, word clouds, sentiment analysis, bigram analysis, and more. There is no need to hunt for an online text mining service.
Data visualization is an important technique for visually representing complex data to aid understanding. Especially with textual data, visualization can uncover hidden patterns and insights. By analyzing Trump's speeches and visualizing word frequencies and more, you can identify the main themes and mood of his speeches.
R Source Code for Text Mining Techniques
Install and load the required packages
First, install and load the required R packages. This code will automatically install and load the required packages for you.
# Install and load the required packages
required_packages <- c("tidytext", "dplyr", "ggplot2", "stringr", "igraph", "ggraph", "readr", "textdata", "wordcloud2", "htmlwidgets", "tidyr", "showtext")
# Check for and install any missing packages
new_packages <- required_packages[!(required_packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)
# Load the packages
lapply(required_packages, require, character.only = TRUE)
showtext_auto()
This code installs and loads all the necessary packages. The showtext_auto() function registers fonts automatically so that text in ggplot2 graphs renders correctly.
Read text files
Next, read the text file and tokenize it into words. The text file was created from the speech transcript published on a New York Times webpage.
# Read the text file (adapt file_path to your own folder)
file_path <- "C:\\R\\Trump_2024_RNC_Acceptance_Speech.txt"
text <- read_file(file_path)
# Tokenize text to words
words <- tibble(text = text) %>%
unnest_tokens(word, text)
# Remove unspecified terms
data(stop_words)
words <- words %>%
anti_join(stop_words)
This code reads the text file, splits it into words, and removes stop words. Stop words such as "the," "and," and "of" carry little meaning on their own, and excluding them improves the accuracy of the analysis.
1. Word frequency analysis
Word frequency analysis shows the most frequently used words in the speech.
# Word Frequency Analysis
word_freq <- words %>%
count(word, sort = TRUE)
ggplot(word_freq[1:20,], aes(x = reorder(word, n), y = n, label = n)) +
geom_col() +
geom_text(hjust = -0.1, color = "black") +
coord_flip() +
labs(title = "Top 20 Words in Trump's 2024 RNC Acceptance Speech", x = "Word", y = "Frequency")
This code plots the top 20 words and their frequencies as a horizontal bar graph, with the count added as a label on each bar.
This bar graph shows the top 20 most frequent words in Donald Trump's 2024 RNC acceptance speech. The most prominent words are "people," "country," and "we're," indicating a focus on collective identity and nation. The high occurrence of words like "America," "world," and "administration" suggests a theme of government and global issues.
2. Create a word cloud
Use word clouds to visually represent the frequency of words.
# Create a word cloud
set.seed(1234) # Set seed for reproducibility
wordcloud_data <- word_freq %>%
filter(n > 1) %>% # Only include words that appear at least twice
head(100) # Use only the top 100 words
# Create the word cloud
wc <- wordcloud2(data = wordcloud_data,
size = 1,
color = "random-dark",
backgroundColor = "white",
rotateRatio = 0.3,
shape = "circle")
# Save as an HTML file
saveWidget(wc, "wordcloud_trump_2024.html", selfcontained = TRUE)
This code builds a word cloud from the top 100 words. The wordcloud2 package offers a range of options for customizing size, color, rotation, and shape.
This word cloud is a visualization of the most used words in the speech, with the size of the words indicating their frequency. Key words like "country," "people," and "we're" stand out. A wide variety of words are included, such as "love," "beautiful," and "America," showing that patriotism, praise, and positive themes were emphasized in the speech.
3. Sentiment analysis
Analyze the sentiment of a speech to determine the frequency of positive or negative words.
# Sentiment Analysis
sentiments <- get_sentiments("bing")
sentiment_analysis <- words %>%
inner_join(sentiments) %>%
count(sentiment) %>%
pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
mutate(sentiment = positive - negative)
print(sentiment_analysis)
This code counts the frequency of positive and negative words in a speech, and uses the difference to assess the overall sentiment.

This table summarizes the sentiment analysis of the speech. The speech contains 290 negative words and 329 positive words, which gives it a net positive sentiment score of 39. This means that it has a slightly more positive tone overall.
4. Bigram analysis
Analyze word pairs (bigrams) to see which word pairs are frequently used together.
# Bigram Analysis
bigrams <- tibble(text = text) %>%
unnest_tokens(bigram, text, token = "ngrams", n = 2)
bigram_counts <- bigrams %>%
separate(bigram, c("word1", "word2"), sep = " ") %>%
filter(!word1 %in% stop_words$word) %>%
filter(!word2 %in% stop_words$word) %>%
count(word1, word2, sort = TRUE)
# Top 15 Bigrams Visualization
bigram_graph <- bigram_counts %>%
head(15) %>%
graph_from_data_frame()
ggraph(bigram_graph, layout = "fr") +
geom_edge_link(aes(edge_alpha = n), show.legend = FALSE) +
geom_node_point(color = "lightblue", size = 5) +
geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
theme_void() +
labs(title = "Top 15 Bigrams in Trump's 2024 RNC Acceptance Speech")
This code draws the top 15 word pairs as a network graph, showing which words are frequently used together in the speech.
This graph shows the top 15 most used bigrams (two-word phrases) in speeches. Phrases like "they're coming," "border patrol," and "illegal aliens" indicate concern about immigration issues. Other important bigrams like "incredible people" and "worst administration" reveal themes of praise and criticism.
5. Analyze TF-IDF
Use TF-IDF analysis to identify the words that are distinctive to each sentence.
# Split the text into sentences
sentences <- tibble(text = text) %>%
unnest_tokens(sentence, text, token = "sentences") %>%
mutate(sentence_id = row_number())
tfidf <- sentences %>%
unnest_tokens(word, sentence) %>%
count(sentence_id, word, sort = TRUE) %>%
bind_tf_idf(word, sentence_id, n)
# Top TF-IDF Words Visualization
tfidf %>%
arrange(desc(tf_idf)) %>%
mutate(word = factor(word, levels = rev(unique(word)))) %>%
group_by(sentence_id) %>%
top_n(5) %>%
ungroup() %>%
ggplot(aes(word, tf_idf, fill = sentence_id)) +
geom_col(show.legend = FALSE) +
labs(x = NULL, y = "tf-idf", title = "Top TF-IDF Words by Sentence in Trump's Speech") +
facet_wrap(~sentence_id, ncol = 2, scales = "free") +
coord_flip()
This code visualizes the words with the highest TF-IDF scores in each sentence, highlighting the terms that characterize it.
This plot shows the top TF-IDF (term frequency-inverse document frequency) words by sentence, highlighting unique and important terms used in different parts of the speech. The importance of each word is represented by its TF-IDF score, with words like "economy," "jobs," and "taxes" reflecting important topics in a particular sentence.
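To make the metric concrete, here is a toy illustration of what bind_tf_idf() computes (the two-"document" data frame below is invented for illustration and is not part of the speech data): tf is a word's share of its document, and idf is the natural log of the number of documents divided by the number of documents containing the word.

```r
library(dplyr)
library(tidytext)

# Toy corpus: two "documents" standing in for sentences
toy <- tibble(doc = c(1, 1, 1, 2, 2),
              word = c("jobs", "jobs", "economy", "economy", "border"))

toy %>%
  count(doc, word) %>%
  bind_tf_idf(word, doc, n)
# "economy" appears in both documents, so idf = ln(2/2) = 0 and its tf-idf is 0;
# "jobs" and "border" each appear in only one document, so idf = ln(2/1)
```

Words that appear everywhere get a tf-idf of zero, which is why the plot above surfaces sentence-specific terms rather than the globally frequent ones from the word frequency analysis.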
6. Analyze word frequency by topic
Analyze word frequency by topic to see the main topics covered in the speech.
# Thematic word frequency analysis
topics <- c("economy", "immigration", "foreign_policy", "social_issues")
topic_words <- list(
economy = c("economy", "jobs", "taxes", "inflation", "business"),
immigration = c("border", "immigration", "illegal", "wall"),
foreign_policy = c("war", "peace", "military", "allies", "enemies"),
social_issues = c("education", "healthcare", "crime", "family", "values")
)
topic_frequency <- words %>%
mutate(topic = case_when(
word %in% topic_words$economy ~ "Economy",
word %in% topic_words$immigration ~ "Immigration",
word %in% topic_words$foreign_policy ~ "Foreign Policy",
word %in% topic_words$social_issues ~ "Social Issues",
TRUE ~ "Other"
)) %>%
count(topic) %>%
filter(topic != "Other")
ggplot(topic_frequency, aes(x = reorder(topic, n), y = n, fill = topic)) +
geom_col() +
coord_flip() +
labs(title = "Frequency of Topics in Trump's 2024 RNC Acceptance Speech",
x = "Topic", y = "Frequency") +
theme_minimal()
This code plots the word frequency for each of the main topics covered in the speech, giving a sense of each topic's relative weight.

This bar graph shows the frequency of topics in Trump's speeches, categorized by economy, immigration, foreign policy, and social issues. The economy is the most discussed topic, followed by immigration, foreign policy, and social issues. This categorization reveals the main focus areas of the speeches, showing an emphasis on economic issues and immigration.
Common mistakes and how to fix them
The importance of data preprocessing
The most important aspect of text data analysis is preprocessing. Removing stop words, normalizing text, and correcting spelling greatly improve the accuracy of the analysis. Neglecting these steps can skew your results.
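As a minimal sketch of the normalization step: note that unnest_tokens() already lowercases and strips punctuation by default, so an explicit cleaning pass matters most when you tokenize some other way. The clean_text helper below is hypothetical, not part of the code above.

```r
library(stringr)

# Hypothetical helper: normalize raw text before tokenizing
clean_text <- function(x) {
  x %>%
    str_to_lower() %>%                  # normalize case
    str_replace_all("[0-9]+", " ") %>%  # drop standalone numbers
    str_squish()                        # collapse repeated whitespace
}

clean_text("The  ECONOMY grew 3 percent")
# "the economy grew percent"
```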
Choose the right visualization tool
It's important to choose a visualization tool that fits the purpose of your analysis. For example, a bar graph works well for word frequencies, while a pie chart or a diverging bar chart may suit sentiment analysis better. We recommend using a variety of visualization tools to analyze your data from multiple angles.
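For example, the positive and negative counts from the sentiment analysis above (329 positive, 290 negative) could be shown as a simple comparison chart. This is just one sketch of the idea; the data frame is built by hand from the counts reported earlier.

```r
library(ggplot2)

# Counts taken from the sentiment analysis result above
sentiment_counts <- data.frame(sentiment = c("negative", "positive"),
                               n = c(290, 329))

ggplot(sentiment_counts, aes(x = sentiment, y = n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = n), vjust = -0.3) +
  labs(title = "Positive vs. Negative Words in the Speech",
       x = NULL, y = "Count") +
  theme_minimal()
```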
Full source code and results
Here is all of the code covered above, collected in one place.
# Install and load the required packages
required_packages <- c("tidytext", "dplyr", "ggplot2", "stringr", "igraph", "ggraph", "readr", "textdata", "wordcloud2", "htmlwidgets", "tidyr", "showtext")
new_packages <- required_packages[!(required_packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)
lapply(required_packages, require, character.only = TRUE)
showtext_auto()
# Read the text file
file_path <- "C:\\R\\Project\\secondlife-r\\Trump_2024_RNC_Acceptance_Speech.txt"
text <- read_file(file_path)
# Tokenizing text to words and removing stopwords
words <- tibble(text = text) %>%
unnest_tokens(word, text) %>%
anti_join(stop_words)
# Word Frequency Analysis
word_freq <- words %>%
count(word, sort = TRUE)
ggplot(word_freq[1:20,], aes(x = reorder(word, n), y = n, label = n)) +
geom_col() +
geom_text(hjust = -0.1, color = "black") +
coord_flip() +
labs(title = "Top 20 Words in Trump's 2024 RNC Acceptance Speech", x = "Word", y = "Frequency")
# Create a word cloud
set.seed(1234)
wordcloud_data <- word_freq %>%
filter(n > 1) %>%
head(100)
wc <- wordcloud2(data = wordcloud_data, size = 1, color = "random-dark", backgroundColor = "white", rotateRatio = 0.3, shape = "circle")
saveWidget(wc, "wordcloud_trump_2024.html", selfcontained = TRUE)
# Sentiment Analysis
sentiments <- get_sentiments("bing")
sentiment_analysis <- words %>%
inner_join(sentiments) %>%
count(sentiment) %>%
pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
mutate(sentiment = positive - negative)
print(sentiment_analysis)
# Bigram Analysis
bigrams <- tibble(text = text) %>%
unnest_tokens(bigram, text, token = "ngrams", n = 2)
bigram_counts <- bigrams %>%
separate(bigram, c("word1", "word2"), sep = " ") %>%
filter(!word1 %in% stop_words$word) %>%
filter(!word2 %in% stop_words$word) %>%
count(word1, word2, sort = TRUE)
bigram_graph <- bigram_counts %>%
head(15) %>%
graph_from_data_frame()
ggraph(bigram_graph, layout = "fr") +
geom_edge_link(aes(edge_alpha = n), show.legend = FALSE) +
geom_node_point(color = "lightblue", size = 5) +
geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
theme_void() +
labs(title = "Top 15 Bigrams in Trump's 2024 RNC Acceptance Speech")
# TF-IDF Analysis
sentences <- tibble(text = text) %>%
unnest_tokens(sentence, text, token = "sentences") %>%
mutate(sentence_id = row_number())
tfidf <- sentences %>%
unnest_tokens(word, sentence) %>%
count(sentence_id, word, sort = TRUE) %>%
bind_tf_idf(word, sentence_id, n)
tfidf %>%
arrange(desc(tf_idf)) %>%
mutate(word = factor(word, levels = rev(unique(word)))) %>%
group_by(sentence_id) %>%
top_n(5) %>%
ungroup() %>%
ggplot(aes(word, tf_idf, fill = sentence_id)) +
geom_col(show.legend = FALSE) +
labs(x = NULL, y = "tf-idf", title = "Top TF-IDF Words by Sentence in Trump's Speech") +
facet_wrap(~sentence_id, ncol = 2, scales = "free") +
coord_flip()
# Thematic word frequency analysis
topics <- c("economy", "immigration", "foreign_policy", "social_issues")
topic_words <- list(
economy = c("economy", "jobs", "taxes", "inflation", "business"),
immigration = c("border", "immigration", "illegal", "wall"),
foreign_policy = c("war", "peace", "military", "allies", "enemies"),
social_issues = c("education", "healthcare", "crime", "family", "values")
)
topic_frequency <- words %>%
mutate(topic = case_when(
word %in% topic_words$economy ~ "Economy",
word %in% topic_words$immigration ~ "Immigration",
word %in% topic_words$foreign_policy ~ "Foreign Policy",
word %in% topic_words$social_issues ~ "Social Issues",
TRUE ~ "Other"
)) %>%
count(topic) %>%
filter(topic != "Other")
ggplot(topic_frequency, aes(x = reorder(topic, n), y = n, fill = topic)) +
geom_col() +
coord_flip() +
labs(title = "Frequency of Topics in Trump's 2024 RNC Acceptance Speech", x = "Topic", y = "Frequency") +
theme_minimal()
When you run the code above, you'll get a breakdown of Trump's speech in various aspects, each of which helps you understand the overall content of the speech by visualizing key words, sentiment, bigrams, TF-IDF, thematic frequency, and more.
Conclusion
Overall, Donald Trump's 2024 RNC acceptance speech has a strong emphasis on themes of national identity and patriotism, evidenced by the high frequency of words like "people," "country," and "America." The speech maintains a net positive sentiment while addressing issues such as immigration and criticizing the administration. The focus on certain topics, particularly the economy and immigration, is in line with Trump's main political priorities. The detailed word usage and key phrases identified in the TF-IDF analysis and the bigram visualization further highlight these themes and provide a comprehensive understanding of the content and focus of the speech.
In this post, we covered how to use R to analyze and visualize text data. Using Trump's speeches as an example, we explored various text mining techniques, which we hope will help you gain meaningful insights from your text data.
**RNC: Short for "Republican National Committee," meaning the national committee of the Republican Party of the United States. The RNC is the official organization of the Republican Party that plans the party's strategy, supports election campaigns, and organizes the party's major events. This includes the Republican National Convention, which nominates presidential candidates.