Tracking the changing narratives of George W. Bush and Hillary Clinton?

Perception of public figures changes over time, and mistakes are forgotten. George W. Bush was an unpopular president by the time he left office in January 2009; his approval ratings had fallen just below 20% at one point. Some 10 years later, he is seen through a less disapproving lens, and many are wondering if it is time to rehabilitate his reputation.

Headline from Texas Monthly

Why George W. Bush and Hillary Clinton? Many public figures go through this process, so why focus on these two? The more technical answer is that so many articles have been written about the two of them that there is a lot of data to analyze. The non-technical answer is that both have divided public opinion so sharply at various points in their careers that the analysis was almost guaranteed to show something interesting. They are also similar in that both are politicians and belong to political families. They obviously differ in gender, political affiliation, and whether they held the presidency. These similarities and differences let me contrast their narratives in addition to tracking how each individual narrative changed.

Article Data: I scraped the NY Times API for article abstracts and other details. I gathered data from 2000 to March 2021. In total, there were about 57,000 article abstracts.
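As a rough illustration of this collection step, here is a minimal sketch of requesting one page of results from the NYT Article Search API and keeping only the abstract, headline, and publication date. It is not the code used for this project. The endpoint, the query parameters (`q`, `begin_date`, `end_date`, `page`, `api-key`), and the response fields reflect my understanding of the public API and should be checked against its documentation; the helper names and the `"YOUR_KEY"` placeholder are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the NYT Article Search API (verify against the official docs).
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_url(query, begin_date, end_date, page, api_key):
    """Build one request URL; dates are YYYYMMDD strings."""
    params = {
        "q": query,
        "begin_date": begin_date,
        "end_date": end_date,
        "page": page,
        "api-key": api_key,
    }
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def extract_records(payload):
    """Keep only the fields of interest from one decoded JSON response."""
    docs = payload.get("response", {}).get("docs", [])
    return [
        {
            "abstract": doc.get("abstract", ""),
            "headline": doc.get("headline", {}).get("main", ""),
            "pub_date": doc.get("pub_date", ""),
        }
        for doc in docs
    ]


def fetch_page(query, begin_date, end_date, page, api_key):
    """Download and parse a single page of results (needs network and a valid key)."""
    url = build_url(query, begin_date, end_date, page, api_key)
    with urllib.request.urlopen(url) as resp:
        return extract_records(json.load(resp))


# Hypothetical usage; "YOUR_KEY" is a placeholder, so no request is sent here.
example_url = build_url("George W. Bush", "20000101", "20210331", 0, "YOUR_KEY")
```

A full collection run would loop over pages and date ranges and pause between requests to stay within the API's rate limits.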

The bulk of the articles about George W. Bush were written during his presidency. I scraped a total of 40,821 articles for Bush.
There are two peaks for Hillary Clinton: her first presidential run, in the 2008 Democratic primary, and her 2016 presidential campaign. I scraped a total of 16,444 articles for Clinton.

Topic Modeling: To track the changing narratives, I first used topic modeling to identify the main topics that appear in each corpus. I tried both Latent Semantic Analysis (LSA) and Non-Negative Matrix Factorization (NMF), each with both count vectorization and TF-IDF. For both corpora, NMF with count vectorization produced the best topics.

Bush Topics:

NMF with count vectorization produced the four topics listed below, each shown with its top 20 words.

Iraq: iraq, war, government, security, military, troops, forces, hussein, saddam, world, weapons, baghdad, country, help, time, leaders, enemy, support, al, one.

Elections: republican, democrat, presidential, campaign, john, senate, democratic, gov, party, election, gore, bill, political, court, vote, vice, mccain, voters, florida, national.

Foreign Policy: administration, officials, israel, nuclear, north, military, palestinian, government, nations, security, korea, national, federal, washington, weapons, intelligence, attacks, policy, two, international.

Fiscal Policy: tax, billion, cuts, budget, spending, percent, congress, plan, federal, cut, bill, security, kerry, social, money, taxes, health, senate, million, next.

The average topic weight for each year
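The yearly averages behind a chart like this can be computed by grouping each article's topic weights by publication year and taking the mean. The sketch below is one way to do that, under the assumption that each article has a year and a row of topic weights (for example, a row of the document-topic matrix from NMF). The function name and the toy numbers are hypothetical.

```python
from collections import defaultdict


def average_topic_weight_by_year(years, doc_topic):
    """Average each topic's weight over all articles published in the same year."""
    sums = {}
    counts = defaultdict(int)
    for year, weights in zip(years, doc_topic):
        if year not in sums:
            sums[year] = [0.0] * len(weights)
        for k, weight in enumerate(weights):
            sums[year][k] += weight
        counts[year] += 1
    # One list of per-topic averages for each year, in chronological order.
    return {
        year: [total / counts[year] for total in sums[year]]
        for year in sorted(sums)
    }


# Toy input: three articles, two topics (made-up weights).
years = [2003, 2003, 2004]
doc_topic = [[0.8, 0.2], [0.4, 0.6], [0.1, 0.9]]
yearly = average_topic_weight_by_year(years, doc_topic)
```

Plotting one line per topic from `yearly` then shows how each topic's share rises and falls over time.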

In the chart above, we can see that the Iraq war was a frequent topic from 2003 to 2005. Headlines during this time often critiqued Bush’s handling of the Iraq war.

Sample headline from 2003
Sample headline from 2004

By 2020 and 2021, the elections topic dominates the others, so fewer articles discuss Bush’s mistakes in Iraq. In fact, the elections topic dominates because Bush is now being described as a unifying force after Biden’s election.

Sample headline from 2021

We can see that once Bush left public office, discussion of the Iraq war was no longer at the forefront, and all other topics appeared more frequently than the Iraq war. In Bush’s case, time really did heal all “wounds.”

Clinton Topics:

NMF with count vectorization produced the three topics listed below, each shown with its top 20 words.

Elections: democrat, republican, candidate, campaign, party, race, senate, presidential, york, vote, political, nomination, giuliani, primary, money, lazio, state, mayor, committee, election.

Foreign Policy: iraq, bush, american, state, federal, officials, president, government, administration, security, war, former, nations, plan, york, political, senate, republican, public, israel.

Work with Obama: senator, obama, president, campaign, former, presidential, washington, race, mccain, bill, primary, secretary, supporters, state, nomination, sanders, vote, made, night, pennsylvania.

For Clinton, the three main topics are elections, foreign policy, and work with Obama (including elections). In the chart above, we see that in 2000 the elections topic dominates the other topics. This is due to Hillary Clinton’s run for Senate. Some headlines from this period highlight the personal questions she had to answer about her marriage and her struggle to appear “fun” and “spontaneous” on the campaign trail.

Sample headline from 2000
Sample headline from 2000

For the foreign policy topic, we can see an interesting spike in articles during 2004–2005. This at first may seem off because Clinton was not Secretary of State at the time, but this spike is due to her frequent criticism of Bush and the Iraq war.

Sample headline from 2006

Lastly, around 2007 the work with Obama topic begins to dominate. This is due to her help with Obama’s election after her own Democratic primary loss and her appointment as Secretary of State in the Obama Administration.

Sample headline from 2008

Finally, we see a spike in the elections topic around her run for the presidency in 2016.

Sample headline from 2016

For Hillary Clinton, we see that her at times messy personal life is still referenced in her run for Senate in 2000, but as she accomplishes more, the headlines move away from her personal life.

Words also influence narrative: Looking only at topics is a limited way to determine narrative, because word choice also has an impact. I trained a word2vec model on the combined corpus to see whether I could pick up semantic similarity in the words used in these articles. For the terms “Hillary Clinton,” “George Bush,” “Hillary,” and “George,” I projected the word embeddings into two dimensions so I could plot the 25 most semantically similar words for each term.
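For readers unfamiliar with how "most semantically similar" is usually measured, the sketch below ranks words by cosine similarity between their embedding vectors, which is the measure word2vec tooling typically uses. It is a plain-Python illustration, not the model from this analysis: the three-dimensional vectors and the words are made up, and real word2vec embeddings usually have around a hundred or more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(term, embeddings, top_n=25):
    """Return the top_n (word, similarity) pairs closest to `term`."""
    target = embeddings[term]
    scored = [
        (word, cosine_similarity(target, vector))
        for word, vector in embeddings.items()
        if word != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy embeddings (made-up numbers, not trained vectors).
toy_embeddings = {
    "senator": [0.9, 0.1, 0.0],
    "governor": [0.8, 0.2, 0.1],
    "budget": [0.1, 0.9, 0.2],
    "tax": [0.0, 0.8, 0.3],
}
neighbors = most_similar("senator", toy_embeddings, top_n=2)
```

With trained embeddings, the resulting neighbor lists can then be reduced to two dimensions (for example with PCA or t-SNE, which of the two is an open choice here) for plotting.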

In the word2vec model, “George Bush” is semantically similar to other presidents like “Jimmy Carter” and “Bill Clinton.” “Hillary Clinton” is similar to other politicians like “Mitt Romney” and “Bernie Sanders.” “George” is similar to other first names like “Bob” and “Ron.” But “Hillary” is mostly similar to words like “pretty,” “wild,” and “sister.” A gendered use of language is starting to appear. Since George is a very common name, it makes sense that the model did not tie adjectives or other words associated with George Bush to that name. To check whether that explains why “George” is similar to names but “Hillary” is not, I can use an uncommon male first name, which also gives a clearer test of whether the language is gendered. The obvious pick is “Barack.” I repeated the same process, focusing on just the terms “Barack” and “Hillary.”

We can see the same words, “sister” and “pretty,” appear as similar to “Hillary.” For “Barack,” words associated with Barack Obama appear more often than first names: “Barack” is similar to “maverick,” “cool,” “freshman,” “newest,” and “beating.” This is more evidence of both the obvious and the subtle gendered language used in these articles. Barack Obama is the “cool,” “newest” “freshman,” compared to the “presumably” and “forget” Hillary Clinton. (Lemmatization was used to clean the words, so different tenses were condensed into one form.)

Conclusion: Both narratives changed over time. For Bush, retreating from the public eye gave the media time to forget about his presidential blunders. For Clinton, moving forward and generating new headlines about her accomplishments allowed the media to move away from her personal life. Considering their political differences, surprisingly only one major disagreement appeared in the topics: the Iraq war. Narrative differences also appeared in the words being used. Hillary Clinton was subject to more gendered language, which seems to have persisted throughout the last two decades.