Bojack-Tidytext-Analysis

Character Interactions

Counting how often a character's name is mentioned tells us how much that character is a focus of conversation. A more informative measure is how frequently each character speaks, and to whom. This requires substantial manual transcription, and I have so far completed only a limited amount. It is enough, however, for a proof of concept that can be developed further.

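library(tidyverse)   # read_csv(), mutate(), select(), gather(), %>%
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

# Load the annotated transcript and drop the columns not used here
relations <- read_csv("BojackAnnotated.csv") %>%
  mutate(Speaker = as.factor(Speaker), Listeners = as.factor(Listeners)) %>%
  select(-Timecode.in, -Timecode.out, -Others, -`Conversation Number`, -`Cut-Away`)

# Preview the first rows
head(relations) %>% kable("html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))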
| Line | Text | season_num | episode_num | Speaker | Listeners |
|---|---|---|---|---|---|
| 1 | Horsin’ Around is filmed before a live studio audience. | 1 | 1 | HA-H | NA |
| 2 | Mondays. | 1 | 1 | HA-S | HA-H |
| 3 | Well, good morning to you too. | 1 | 1 | HA-H | HA-S |
| 4 | Oh, hey. | 1 | 1 | HA-S | HA-H |
| 5 | Where? I’d love hay. | 1 | 1 | HA-H | HA-S |
| 6 | In 1987, the situation comedy Horsin’ Around | 1 | 1 | CR | BH |

The next step is to create an adjacency list, counting the interactions between each speaker and each listener.

# Flag, for each line, which major character (if any) is the listener.
# `%in%` already returns TRUE/FALSE, so no ifelse() wrapper is needed.
MajorListeners <- relations %>%
  mutate(`Bojack Horseman`     = Listeners %in% "BH",
         `Diane Nguyen`        = Listeners %in% "DN",
         `Mister Peanutbutter` = Listeners %in% "MP",
         `Todd Chavez`         = Listeners %in% "TC",
         `Princess Carolyn`    = Listeners %in% "PC",
         Extra                 = !Listeners %in% c("BH", "DN", "MP", "TC", "PC")) %>%
  select(-episode_num, -season_num, -Line)

This gives us a data frame of speakers and listeners, with the speaker kept as a factor and each major character as a logical column. The main limitation is that when a line is addressed to several listeners, they are not recorded individually; the line is counted under Extra instead. This still needs to be corrected.
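One possible way to handle lines with several listeners is to split them into one row per listener before counting. The sketch below runs on toy data only. The `;` separator and the column name `Listener` are assumptions for illustration, not the actual format of BojackAnnotated.csv.

```r
library(dplyr)
library(tidyr)

# Toy data. The ";" separator between listener codes is an assumption,
# not the format used in BojackAnnotated.csv.
toy <- tibble(
  Speaker   = c("BH", "BH", "TC"),
  Listeners = c("DN;TC", "PC", "BH;HA-S")
)

major <- c("BH", "DN", "MP", "TC", "PC")

# One row per (line, listener) pair, so each listener is counted individually;
# any listener outside the five major characters is grouped as "Extra".
per_listener <- toy %>%
  separate_rows(Listeners, sep = ";") %>%
  mutate(Listener = if_else(Listeners %in% major, Listeners, "Extra"))

# Adjacency list of speaker-listener counts
counts <- per_listener %>% count(Speaker, Listener, name = "Count")
print(counts)
```

With this layout a line addressed to both Diane and Todd contributes one count to each of them, rather than a single count to Extra.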

The next step is to sum the interactions between each character:

# Note: this filter assumes Speaker holds full character names; if it holds
# codes such as "BH", they need to be recoded to names first.
SummaryList <- MajorListeners %>%
  filter(Speaker %in% c("Bojack Horseman", "Diane Nguyen", "Mister Peanutbutter",
                        "Todd Chavez", "Princess Carolyn")) %>%
  group_by(Speaker) %>%
  summarise(`Bojack Horseman`     = sum(`Bojack Horseman`),
            `Diane Nguyen`        = sum(`Diane Nguyen`),
            `Mister Peanutbutter` = sum(`Mister Peanutbutter`),
            `Todd Chavez`         = sum(`Todd Chavez`),
            `Princess Carolyn`    = sum(`Princess Carolyn`)) %>%
  gather(Listener, Count, -Speaker)  # wide to long: one row per speaker-listener pair

SummaryList %>% kable("html") %>% kable_styling(bootstrap_options = c("striped", "hover"))
| Speaker | Listener | Count |
|---|---|---|
| Bojack Horseman | Bojack Horseman | 0 |
| Diane Nguyen | Bojack Horseman | 169 |
| Mister Peanutbutter | Bojack Horseman | 67 |
| Princess Carolyn | Bojack Horseman | 113 |
| Todd Chavez | Bojack Horseman | 223 |
| Bojack Horseman | Diane Nguyen | 168 |
| Diane Nguyen | Diane Nguyen | 0 |
| Mister Peanutbutter | Diane Nguyen | 7 |
| Princess Carolyn | Diane Nguyen | 0 |
| Todd Chavez | Diane Nguyen | 3 |
| Bojack Horseman | Mister Peanutbutter | 36 |
| Diane Nguyen | Mister Peanutbutter | 4 |
| Mister Peanutbutter | Mister Peanutbutter | 0 |
| Princess Carolyn | Mister Peanutbutter | 0 |
| Todd Chavez | Mister Peanutbutter | 0 |
| Bojack Horseman | Todd Chavez | 210 |
| Diane Nguyen | Todd Chavez | 0 |
| Mister Peanutbutter | Todd Chavez | 3 |
| Princess Carolyn | Todd Chavez | 6 |
| Todd Chavez | Todd Chavez | 0 |
| Bojack Horseman | Princess Carolyn | 109 |
| Diane Nguyen | Princess Carolyn | 0 |
| Mister Peanutbutter | Princess Carolyn | 0 |
| Princess Carolyn | Princess Carolyn | 0 |
| Todd Chavez | Princess Carolyn | 2 |

Finally, we draw the chord diagram:

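library(circlize)  # chordDiagram(), circos.clear()

# One colour per character sector
chordColour <- c("#a8e6cf",
                 "#dcedc1",
                 "#ffd3b6",
                 "#ff8b94",
                 "#e6e6fa")

# directional = 1: links run from the first column (Speaker) to the second (Listener)
chordDiagram(SummaryList, grid.col = chordColour, directional = 1)

# Reset the circular layout so later plots start clean
circos.clear()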

This plot shows the proportion of dialogue spoken to and by each character. The anti-clockwise side of a character's segment (in their own colour) shows how many lines of dialogue the character speaks, and to whom they are addressed. The multicoloured clockwise side shows the dialogue addressed to them by others, divided by speaker. We can see that in the first four episodes it is Bojack’s relationships that dominate the show, which is understandable as he is the titular character.
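Because the colours are matched to sectors by position, it can help to name the colour vector so that each character's colour is explicit. The sketch below is one way to do that. The pairing of colours with characters is an assumption (the original does not state it), and `edges` holds only the Bojack-centred counts copied from the summary table.

```r
characters <- c("Bojack Horseman", "Diane Nguyen", "Mister Peanutbutter",
                "Todd Chavez", "Princess Carolyn")

# Assumed pairing of colours and characters, for illustration only
chordColour <- setNames(
  c("#a8e6cf", "#dcedc1", "#ffd3b6", "#ff8b94", "#e6e6fa"),
  characters
)

# Subset of the adjacency list: lines spoken by Bojack and to Bojack
edges <- data.frame(
  Speaker  = c(rep("Bojack Horseman", 4), characters[-1]),
  Listener = c(characters[-1], rep("Bojack Horseman", 4)),
  Count    = c(168, 36, 210, 109, 169, 67, 223, 113)
)

# Draw only if circlize is installed
if (requireNamespace("circlize", quietly = TRUE)) {
  circlize::chordDiagram(edges, grid.col = chordColour, directional = 1)
  circlize::circos.clear()
}
```

Naming the vector means the colours stay attached to the same characters even if the order of the sectors changes.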

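Bojack and Todd have the most interactions with each other in this period (210 lines from Bojack to Todd and 223 from Todd to Bojack), and most of the relationships are roughly balanced between speaking and listening. The clearest exception is Mister Peanutbutter, who addresses Bojack 67 times but is addressed by him only 36 times.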
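The balance between speaking and listening can be checked directly from the summary table. The sketch below copies the Bojack-centred counts from that table and computes, for each character, the ratio of lines they address to Bojack to lines Bojack addresses to them. The column names are illustrative.

```r
# Counts copied from the summary table (lines spoken by and to Bojack)
pairs <- data.frame(
  Character  = c("Diane Nguyen", "Mister Peanutbutter",
                 "Princess Carolyn", "Todd Chavez"),
  FromBojack = c(168, 36, 109, 210),
  ToBojack   = c(169, 67, 113, 223)
)

# A ratio near 1 means the exchange is balanced
pairs$Ratio <- round(pairs$ToBojack / pairs$FromBojack, 2)
print(pairs)
```

A ratio close to 1 indicates a balanced exchange; values well above 1 mean the character talks to Bojack more than he talks to them.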

It would be interesting to compare the chord diagrams of early episodes with those of later seasons, to see whether the ensemble cast takes on a greater proportion of the interactions.

I could transcribe a randomly selected later episode out of sequence to compare these dynamics, and will provide further results when this is complete.