In 2017-2018, I had the great pleasure of supervising then final-year undergraduate student Kelsea Stewart for her dissertation, which investigated swearing and sound symbolism. That project is for another blog and another paper and another day, but as part of it we collected a bunch of data that I am failing miserably to write up. Part of the reason is that life got in the way: I got a fantastic new job, my pedagogical research took off, and for the good of my career I focused on that.

The bigger reason, though, is that I am not quite sure what to do with the data we have, what paper I want to write, or even if I should write a paper. The primary purpose of the data we collected was stimuli generation - we asked people to write down as many swear words as they could in their native language. But, given that swearing studies tend to prove quite popular, we added in some demographic questions as well as a measure of self-reported swearing frequency. The result is that we have a lot of interesting data, but I am still struggling to figure out the story I want to tell. I know that the story is exploratory and descriptive, and that inferential tests shouldn’t be the main feature. That of course means it’s very different to the theory-driven, hypothesis-testing approach I would normally take, and is probably part of the reason I am struggling to form a narrative. Perhaps there does not need to be a narrative. Maybe there is no spoon. Or maybe there’s just not enough here for a paper at all.

So here we are on a Sunday morning in November 2019, where my semester of 9am lectures means my body-clock refuses to allow me a lie-in. I will attempt to ignore the need for a narrative and present A Bunch Of Interesting Graphs, and hope that any resulting discussion on Twitter will help guide my thoughts. A lot of the coding for the below was done by Judith de Mevergnies, who is working on the project as a student intern and is testament to the amazing data skills of our @UofGPsychology students.

Participants

In total, there were 409 participants who provided swear words in 36 different languages. For the purpose of this analysis, we’re only going to focus on the English swear words from the relevant 262 participants (mean age = 25.92, SD = 8.45, min = 15, max = 64). The sample consisted of 148 men, 104 women, and 8 non-binary people. Because of the small number of non-binary people, they aren’t included in all the analyses in this document.

Swear words

It should go without saying that from this point onwards you should expect a lot of very strong language.

We kept the instructions purposefully simple: participants were asked to provide as many swear words as they could and were told that they could be as creative as they liked. In addition to some basic wrangling (e.g., converting to lower case, removing hyphens etc.), Judith also coded each swear word into categories based on Jay (2009):

Sexual (e.g., fuck, dick, cunt)
Scatological (e.g., shit, piss)
Animal names (e.g., ass, cow)
Ancestral (e.g., bastard)
Blasphemous (e.g., damn, hell)
Slur (e.g., faggot, dyke)
Slang (e.g., bitch-ass, fuck-nugget)
Deviation from social norms (e.g., skank, fatty)

Furthermore, slurs were then categorised into racial and homophobic/transphobic slurs.
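For anyone curious what the basic wrangling mentioned above might look like, here is a minimal sketch in Python. It is not the code we actually ran: the function name `normalise` is made up for illustration, and I am assuming that hyphens are deleted outright rather than replaced with a space.

```python
import re


def normalise(word: str) -> str:
    """Tidy a single free-text response (illustrative helper, not the project's code)."""
    word = word.strip().lower()        # drop surrounding whitespace, convert to lower case
    word = word.replace("-", "")       # assumption: hyphens are removed, not turned into spaces
    return re.sub(r"\s+", " ", word)   # collapse any internal runs of whitespace


# Example usage on two made-up responses
print(normalise("  Fuck-Nugget "))   # fucknugget
print(normalise("Bloody   Hell"))    # bloody hell
```

Normalising like this before counting means that variants such as "Fuck-nugget" and "fucknugget" are treated as the same word.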

Top swear words

Previous research on swearing has found that we are generally quite unoriginal with our swearing and that a relatively small number of words account for the majority of our swearing. Our results reflect this, with fuck, shit, and cunt accounting for ~45% of all words provided.
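The "top words as a share of everything" figure above is simple to compute. The sketch below shows the idea in Python on a made-up list of responses; the function name `top_share` and the toy data are purely illustrative and are not our dataset.

```python
from collections import Counter


def top_share(words, n=3):
    """Return the n most common words and the share of all responses they account for."""
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    top = counts.most_common(n)
    share = sum(count for _, count in top) / total
    return top, share


# Made-up responses, pooled across hypothetical participants
toy = (["fuck"] * 5 + ["shit"] * 3 + ["cunt"] * 2
       + ["damn", "hell", "piss", "bastard", "cow",
          "arse", "dick", "twat", "wank", "bollocks"])

top, share = top_share(toy, n=3)
print(top)     # [('fuck', 5), ('shit', 3), ('cunt', 2)]
print(share)   # 0.5
```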

And because no-one does research on swearing without having a slightly infantile sense of humour, here’s the data in a word cloud because I find it amusing.

The type of swear words we use is also fairly constrained with a whopping 44% of all swear words provided having a sexual connotation.

Slurs made up 6.78% of all words provided, with 3.78% homophobic & transphobic slurs and 3% racial slurs. When broken down by binary gender, 9.57% of all words men provided were slurs compared to 4.41% of the words women provided. Whilst there was little difference between men and women in the proportion of homophobic and transphobic slurs provided, women were less likely to provide racial slurs, χ²(1, N = 315) = 5.74, p = 0.017.
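As a sanity check on the chi-square result above, the p-value for a test with one degree of freedom can be recovered from the statistic alone. The sketch below does that in plain Python, and also shows how a Pearson chi-square would be computed from a 2x2 table of counts. The function names are mine, the example counts are invented, and I am not assuming anything about whether a continuity correction was applied in the original analysis.

```python
import math


def chi2_p_df1(statistic: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / denom


# The statistic reported above
print(round(chi2_p_df1(5.74), 3))   # 0.017

# Invented counts, only to show the calculation
stat = chi2_2x2(10, 20, 30, 40)
print(round(stat, 4), round(chi2_p_df1(stat), 3))
```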

The morphemes we combine with swear words are also relatively constrained: the top 5 are functional affixes, whereas the bottom 5 are a pleasing list of amusing roots.

Swearing fluency

I’m going to refer to the number of different swear words participants provided as swearing fluency, although given that there was no time limit and it was done online with the consequent lack of experimental control, I am fully aware that it’s a very, very rough proxy.

Overall, participants provided an average of 18.68 (SD = 11.02) swear words and there was no gender difference in the number of words provided (this was the case whether or not non-binary people were included), F(2, 257) = 1.25, p = 0.288, ηp² = 0.01.

There was, however, a significant difference in the number of words provided by education level, F(3, 257) = 4.07, p = 0.008, ηp² = 0.05. This is supportive of the general theory of Jay and Jay (2015) that swearing fluency is strongly related to general word fluency, and it also suggests that as a measure of swearing fluency, our data isn’t all that bad. Post-hoc tests revealed that it was the difference between those in high school and postgrad that was significant. One possible confound with this result is that, looking at the plots, I believe the wording of the question has led some people to interpret college in the American sense of the word (i.e., undergraduate), rather than the British interpretation (highers/A-levels).

There was a weak positive correlation between age and fluency (r(260) = 0.19, p = 0.002), which is concordant with a fairly robust finding that word fluency increases with age (one might expect swearing usage to decline with age, but usage is not the same as fluency).

And to see whether the stereotype of my adopted country being more sweary than my mother country is true, I compared the number of swear words provided by Scottish and English participants and I am sad to say that there was no difference (t(49.45) = -0.01, p = 0.992, 95% CI of the difference = [-6.35, 6.28]).

Although I’ve now lived in Glasgow for a year and fluency is definitely not the same as usage, ya wee cunt.
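A side note on the Scotland versus England comparison: fractional degrees of freedom like 49.45 are what a Welch (unequal variances) t-test typically reports. Assuming that is the kind of test involved, the sketch below shows how the statistic and the Welch-Satterthwaite degrees of freedom are calculated. The function name and the two toy samples are invented for illustration and have nothing to do with our data.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each group needs at least two observations")
    vx = variance(x) / nx   # squared standard error of the first mean
    vy = variance(y) / ny   # squared standard error of the second mean
    if vx + vy == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Invented word counts for two small groups
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 2), round(df, 2))   # -1.9 5.88
```

Because the degrees of freedom depend on the two group variances, they are rarely a whole number, which is why the value reported above is not simply the sample size minus two.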

Self-reported swearing frequency by context

In addition to asking participants to write down as many swears as they could, we also asked them to provide self-reported swearing frequency in a range of contexts - alone, with friends, with family, and at work. It looks much like you would expect, with swearing with friends and alone higher than with family and at work.

Swearing frequency in all contexts showed medium to strong positive correlations with each other. Swearing fluency (the number of words provided) was also positively, albeit weakly, correlated with all measures of verbal swearing frequency, although not with written swearing frequency. Finally, age was positively correlated with swearing with family and at work, which is likely to reflect the role that social status and hierarchies play in swearing behaviour (you’re more likely to swear at work if you’re higher up and you’re more likely to be higher up when you’re older).

	Age	Words	Alone	Family	Friends	Work
age
number_words	0.19**
alone	0.06	0.22***
family	0.25***	0.22***	0.38***
friends	0.02	0.14*	0.41***	0.39***
work	0.36***	0.20**	0.30***	0.33***	0.51***
writing	-0.10	0.01	0.32***	0.24***	0.57***	0.35***

Ordinal regression

Ordinal regression found no main effect of gender nor an interaction between gender and context on ratings of swearing frequency, only a main effect of context.

term	df	statistic	p.value
gender_coded	1	2.08	0.15
context	4	207.87	0.00
gender_coded:context	4	5.62	0.23

And that’s it! Writing this up has helped - there’s a huge number of ways the data could be chopped up and whilst I’m happy to clearly report these analyses as exploratory to the max, I’m loath to chop it up every single way I could. I feel like the above are probably the core set. But I’m also just not sure that this data is anything more than an interesting blog post or perhaps, at best, an open data resource that other people could use. Please feel free to send thoughts via Twitter or e-mail (emily.nordmann@glasgow.ac.uk)!

Sweary Sunday