Programme and speakers
Johannes Persson, Dean of the Joint Faculties of Humanities and Theology
09.20–10.00
Nele Põldvere, Victoria Johansson and Carita Paradis, Lund University
The new London–Lund Corpus (LLC–2): design, compilation, access
10.00–10.30
Bas Aarts, University College London
Research in spoken English: the London–Lund experience
10.30–11.00
Fika
11.00–11.30
Gunnel Tottie, University of Zurich
Corpus linguistics now and then: the case of negation of indefinites
11.30–12.00
Karin Aijmer, University of Gothenburg
'They're like proper crazy like' – New uses of intensifiers in spoken English
12.00–13.30
Lunch break
13.30–14.00
Robbie Love, University of Leeds
Building and analysing a national corpus of informal spoken English: the Spoken BNC2014
14.00–14.30
Susan Reichelt, University of Greifswald
Combining apparent and real time approaches to language change: Recent developments of kind of and sort of in spoken British English using BNClab
14.30–15.00
Jonathan Culpeper, Lancaster University
On 'spokenness': From Early Modern to Present-day English
15.00–15.30
Fika
15.30–16.30
Herbert Clark, Stanford University (Keynote speaker)
On the use and misuse of language corpora
Professor Emeritus Jan Svartvik
Nele Põldvere, Victoria Johansson and Carita Paradis, Lund University

The new London–Lund Corpus (LLC–2): design, compilation, access
This talk reports on the compilation of the new London–Lund Corpus (LLC–2) – a corpus of contemporary spoken British English, collected 2014–2019. The size and design of LLC–2 are the same as those of the world's first corpus of spoken language, namely the London–Lund Corpus (LLC–1), with spoken data mainly from the 1960s. Besides providing a corpus of contemporary speech, LLC–2 gives researchers the opportunity to make principled diachronic comparisons of speech over the past 50 years and to detect change in communicative behaviour among speakers. The compilation of LLC–2 has included a number of stages: data collection, transcription of the recordings, markup and annotation, and finally making the corpus accessible to the research community. The talk describes and critically examines the methodological decisions made at each stage. For example, it was important to strike a balance between LLC–2 as a representative collection of contemporary spoken English and its comparability to LLC–1. Therefore, both corpora contain the same speech situations (dialogue, mainly everyday face-to-face conversation, as well as monologue), but the specific recordings added to LLC–2 also reflect the technological advances of the last few decades, particularly with respect to speech situations such as telephone calls (e.g., Skype) and broadcast discussions and interviews (e.g., podcasts). Moreover, the transcriptions in LLC–2 are orthographic and time-aligned with the corresponding sound files, a novel feature that makes it possible, among other things, to investigate prosody and dialogue management among speakers with great precision. The corpus, as well as metadata about the transcriptions and the speakers, will be released to the public in late 2019 from the Lund University Humanities Lab's corpus server.
The release will fill an unfortunate gap in the availability of spoken corpora for linguistic analysis. The benefits of spoken corpora in general and of LLC–2 in particular will be demonstrated in the talk through examples of case studies based on the corpus (e.g., Põldvere & Paradis, 2019a, 2019b). The case studies illustrate how LLC–2 can contribute to our understanding of meaning-making and discursive practices in real communication and provide a window into the cognitive and social processes of dialogic interaction, from both a contemporary and a back-in-time perspective.
References
Põldvere, N., & Paradis, C. (2019a). 'What and then a little robot brings it to you?' The reactive what-x construction in spoken dialogue. English Language and Linguistics. Advance online publication. doi:10.1017/S1360674319000091
Põldvere, N., & Paradis, C. (2019b). Motivations and mechanisms for the development of the reactive what-x construction in spoken dialogue. Journal of Pragmatics, 143, 65–84.
Bas Aarts, University College London

Research in spoken English: the London–Lund experience
In this paper I will go back to the beginning by tracing the history of the collaboration between the Survey of English Usage (which celebrates its 60th anniversary this year) and the University of Lund. I will briefly present the corpus exploration tools that we developed, and how they can be used to carry out research on both written and spoken English. I will then present the results of some recent research in the Survey of English Usage on spoken English, specifically work on the progressive construction, modal verbs and the perfect construction.
Gunnel Tottie, University of Zurich

Corpus linguistics now and then: the case of negation of indefinites
I will illustrate the progress (and the woes) of corpus linguistics with examples from my own work on two problems in the syntax of negation in English. I first studied the variation between NO-negation and NOT-negation, as in (1) and (2), beginning with the Brown Corpus in the 1970s and adding the London–Lund Corpus in the 1980s (Tottie 1983, 1991).
(1) NO-negation: I have no dog/money/friends.
(2) NOT-negation with a or any: I don't have a dog/any problem/money/friends.
The second problem was impossible to address because of the paucity of material in the seventies and eighties: the variation between the indefinite article a/an and any as indefinite determiners of count nouns in sentences with NOT-negation, as in (3):
(3) It isn't a/any problem/There isn't a/any problem/I don't have/see a/any problem.
With the advent of mega-corpora like the Corpus of Contemporary American English (COCA), comprising 577 million words, it was tempting to try to study the variation between a/an- and any-negation and to find the factors conditioning their use. This is what I am currently working on – not without complications, due both to the sheer size and to the makeup of the corpus.
References
Tottie, Gunnel. 1983. Much about Not and Nothing. A Study of Analytic and Synthetic Negation in Contemporary American English (Publications of the Royal Society of Letters at Lund 1983–1984:1). Lund: Kungl. Humanistiska Vetenskapssamfundet.
Tottie, Gunnel. 1991. Negation in English Speech and Writing. San Diego, New York, London: Academic Press.
Karin Aijmer, University of Gothenburg

'They're like proper crazy like' – New uses of intensifiers in spoken English
New intensifiers emerge and become fashionable, but can then lose their popularity and be replaced by other, more striking intensifiers. When Paradis (2000) revisited degree modifiers of adjectives in the 1990s, she found, for example, that the intensifiers in the London–Lund Corpus occurred with different frequencies in the COLT corpus (the Bergen Corpus of London Teenage Language) that she used for comparison. Moreover, she found some examples of 'new' intensifiers such as well (well weird) and enough (enough bad). The aim of my presentation is to discuss changes in the intensification system which have taken place since then. The focus will be on some intensifiers (e.g. well, all, proper, pretty) which seem to have become more frequent or to have emerged recently in spoken British English. The material for this study is taken from the Spoken British National Corpus 2014 (Love et al. 2017). Ongoing changes can be observed by comparing the distribution and uses of the same intensifiers in the old BNC (BNC1994). The research questions are:
– What are the mechanisms responsible for the changes?
– What role do sociolinguistic factors such as the age and gender of the speakers play in explaining the changes?
References
Love, R., Dembry, C., Hardie, A., Brezina, V., and McEnery, T. 2017. The Spoken BNC2014 – designing and building a spoken corpus of everyday conversations. International Journal of Corpus Linguistics 22 (3): 319–344.
Paradis, C. 2000. It's well weird. Degree modifiers of adjectives revisited: The nineties. In Kirk, J. (ed.). Corpus galore. Analysis and techniques in describing English. Amsterdam: Rodopi. 147–160.
Robbie Love, University of Leeds

Building and analysing a national corpus of informal spoken English: the Spoken BNC2014
The Spoken BNC2014 (Love et al. 2017, Love forth.) is an important component of the new British National Corpus 2014 (http://cass.lancs.ac.uk/bnc2014/), a large dataset representing current British English usage across different situations, which is being compiled by Lancaster University in collaboration with Cambridge University Press. It is the successor to the spoken component of the original British National Corpus (Crowdy 1995) and was released publicly via Lancaster University's CQPweb server (Hardie 2012) in September 2017. In terms of corpus construction, I pay attention to other contemporary spoken corpus projects such as the London–Lund Corpus 2 (Paradis et al. 2015–), the spoken component of CorCenCC (Knight et al. 2016) and FOLK (Schmidt 2016) and consider the role of representativeness in corpus design. I argue that representativeness is an ideal but that it is inevitable – due to practical constraints – that there will be some differences between the original design of a large 'national' corpus and the finished product, and that it is important to be honest, critical and realistic about representativeness. I then demonstrate the research potential of the Spoken BNC2014 with examples from recent research into adverbs (Goodman & Love 2019).
References
Crowdy, S. (1995). The BNC spoken corpus. In G. Leech, G. Myers, & J. Thomas (Eds.), Spoken English on Computer: Transcription, Mark-Up and Annotation (pp. 224–234). Harlow: Longman.
Goodman, O., & Love, R. (2019). 1000 hours of conversations: what does it mean for ELT? 53rd Annual IATEFL Conference & Exhibition. Liverpool, UK. April 2019.
Hardie, A. (2012). CQPweb – combining power, flexibility and usability in a corpus analysis tool. International Journal of Corpus Linguistics 17(3), 380–409.
Knight, D., Neale, S., Watkins, G., Spasic, I., Morris, S., & Fitzpatrick, T. (2016, June). Crowdsourcing corpus construction: contextualizing plans for CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes – The National Corpus of Contemporary Welsh). Paper presented at the IVACS 2016 conference, Bath Spa University, UK.
Love, R. (forth). Overcoming Challenges in Corpus Construction: The Spoken British National Corpus 2014. New York: Routledge.
Love, R., Dembry, C., Hardie, A., Brezina, V., & McEnery, T. (2017). The Spoken BNC2014: Designing and building a spoken corpus of everyday conversations. International Journal of Corpus Linguistics, 22(3), 319–344.
Paradis, C., Põldvere, N., Johansson, V., & O'Hare, P. (2015–). The London–Lund Corpus 2 of spoken British English (LLC–2). Available at: http://www.sol.lu.se/en/research/forskningsprojekt/906/ (last accessed June 2019).
Schmidt, T. (2016). Good practices in the compilation of FOLK, the Research and Teaching Corpus of Spoken German. International Journal of Corpus Linguistics, 21(3), 396–418.
Susan Reichelt, University of Greifswald

Combining apparent and real time approaches to language change: Recent developments of kind of and sort of in spoken British English using BNClab
This study reports on ongoing changes in the use of the hedges sort of and kind of in spoken British English over the past twenty years. Following known sociolinguistic patterns of change in progress (cf. Bailey, 2008; Pichler et al., 2018), special focus will be put on three categories of time: age, date of birth, and date of corpus compilation. The data used in this study stem from two subsets of the original BNC from 1994 and the newly compiled BNC2014. Both sets were, where possible, balanced across the social categories of age, gender, location, and social class. Feature tokens were extracted using the online platform BNClab, which includes a concordance viewer alongside first data evaluations, visualizations, and teaching materials. The design of the subsets, highlighted further in this presentation, allows for a combined approach to change, using apparent time and real time trend analyses. The features under investigation, the hedges sort of and kind of, are often treated as having "basically the same meaning" (Mauranen, 2004: 179; see also Aijmer, 1984: 118), yet show distributional differences across different varieties of English. In British English, sort of is often found to be the more dominant variant (cf. Aijmer, 1984; Biber et al., 1999; Gries & David, 2007; Kay, 1984; Mauranen, 2004). Feature use within the two BNC datasets suggests that the variants are currently undergoing change. Kind of is increasingly encroaching on sort of – a change that becomes observable through the inclusion of the three categories of time mentioned above. The talk thus highlights the need for, and usefulness of, corpora that allow the researcher to combine apparent and real time approaches in order to gain a full picture of ongoing linguistic change.
References
Aijmer, K. (1984). 'Sort of' and 'Kind of' in English conversation. Studia Linguistica 38: 118–128.
Bailey, G. 2008. Real and Apparent Time. In: Chambers, J.K. et al. eds. The Handbook of Language Variation and Change. Blackwell Publishing Ltd, pp. 312–332.
Biber D, Johansson S, Leech G, et al. (1999). Longman Grammar of Spoken and Written English, London: Pearson Education Limited.
Gries S and David C. (2007). This is kind of / sort of interesting: variation in hedging in English. In: Päivi Pahta, Irma Taavitsainen, Terttu Nevalainen & Jukka Tyrkkö (eds) Studies in Variation, Contacts and Change in English: Towards Multimedia in Corpus Studies. Helsinki: Research Unit for Variation, Contacts and Change in English (VARIENG).
Kay P. (1984). The Kind of/Sort of Construction. Tenth Annual Meeting of the Berkeley Linguistics Society. 157–171.
Mauranen, A., (2004). They're a little bit different. Observations on hedges in academic talk. In Aijmer, K. & Stenström, A., (eds.). Discourse patterns in spoken and written corpora, pp. 173–98.
Pichler, H, Wagner SE, Hesson A. 2018. Old-age language variation and change: Confronting variationist ageism. Lang Linguist Compass. 12:e12281.
Jonathan Culpeper, Lancaster University

On 'spokenness': From Early Modern to Present-day English
This paper reflects on 'spokenness' in English from the early modern period to today. I begin by (a) making some general remarks on spokenness in the history of English, and (b) introducing a descriptive approach to 'writtenness' and 'spokenness', one revolving around three categories, namely, the degree to which a text is speech-like, speech-based or speech-purposed. This approach was part of the corpus-based work on spoken interaction in historical English writing that I conducted over 20 years with Merja Kytö (e.g. Culpeper and Kytö 2010). I discuss some of the problems we encountered and some of our findings, in particular those relating to what we termed 'pragmatic noise' (essentially, primary interjections, the noises – ooh's and aah's – that carry pragmatic meanings). I identify the five pragmatic noise items that occurred most frequently in all our speech-related genres but hardly occurred in our non-speech-related genres, and also briefly account for their development. In addition, I discuss at some length the case of the genre of play-texts, a complex hybrid spoken-written genre. Using corpus-based methods, I show how it has changed over the centuries, and relate some of those changes to changes in context.
References
Culpeper, Jonathan and Merja Kytö (2010) Early Modern English Dialogues: Spoken Interaction in Writing. Cambridge: Cambridge University Press.
Herbert Clark, Stanford University

