How can corpora help editors and proofreaders?

How often have you needed another word for a common term or phrase to avoid repetition? You can turn to a thesaurus, but there is a much more comprehensive source of inspiration accessible online. Ana Frankenberg-Garcia explains.

To make texts accurate and readable, we are required to evaluate other people’s words and wordings. However, people express themselves in different ways, and it is not always straightforward to tell whether documents need to be changed or how they can be improved. This is especially true when the subject matter, terminology or style of the text at hand is not entirely familiar. Dictionaries, glossaries, style guides and online searches can help, but not always. That is when we turn to more experienced colleagues. But what if they too don’t know the answer? What if they give us conflicting responses? What if it is late at night and we have an early morning deadline? Don’t worry: a corpus can help, and often more than any other source you have used before.

What is a corpus?

A corpus is a collection of authentic, machine-readable texts sampled to be representative of the language or language variety we wish to focus on. For example, a corpus consisting of a large number of business letters written by business people going about their normal routine can help us observe how words are objectively used in business correspondence.

How can corpora help?

Imagine you are not sure whether a business email should end in I look forward to hearing from you or I am looking forward to hearing from you. A corpus such as Professor Yasumasa Someya’s free Business Letter Corpus, with one million words of UK and US business letters, will do the trick. Compare the search results for looking forward and look forward.

First, you can see that look forward, with 997 occurrences, is more conventional in business letters than looking forward, with only 161 hits. Note that this is just in UK and US business letters, not the entire internet, so you know exactly where your results are coming from. Next, you can see that corpus software aligns the expression searched in the centre of your screen, which means you just need to scroll down to inspect every single occurrence of it. Reading ‘vertically’ makes finding out how words are used in context much faster and easier than reading linearly, as we normally do. And indeed, if you observe the context of how these wordings are employed, you will notice that looking forward tends to occur in more informal circumstances (eg fun night, great show, long chat), whereas look forward is used more formally (eg favourable reply, challenging career, future opportunity).
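For readers curious about what corpus software is doing behind the scenes, the counting and centre-aligned display described above can be approximated in a few lines of Python. This is only a minimal sketch: the three-sentence sample stands in for a real corpus, and the function name kwic and its width parameter are illustrative choices, not part of any corpus tool.

```python
import re

def kwic(text, phrase, width=30):
    """Return keyword-in-context lines with the phrase aligned in one column."""
    lines = []
    # Whole-word, case-insensitive match of the search phrase.
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so every hit starts at the same column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Toy stand-in for a corpus (assumption: a real one would hold millions of words).
sample = (
    "I look forward to hearing from you. "
    "We are looking forward to a fun night. "
    "We look forward to your favourable reply."
)

for phrase in ("look forward", "looking forward"):
    hits = kwic(sample, phrase)
    print(f"{phrase}: {len(hits)} hit(s)")
    for line in hits:
        print(line)
```

Because every hit is padded to the same width, the search phrase lines up in a single column, which is what makes ‘vertical’ reading possible.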

Another thing that corpus software does is help you to find out, in seconds, how words are used together.

Imagine you draw a blank and can’t think of a verb to go with opinion. If you run a search for opinion in the enTenTen corpus (with 38 billion words of current English), you will not only be able to scroll down results like the ones shown above, where you can spot verbs like give, sway and form, but you can also carry out a further search step where the software automatically counts, ranks and sorts all the words that occur within, say, four words to the left of opinion. This will generate a list of words frequently co-occurring with opinion, which you can scroll down to spot verbs like express, voice and share (see right).

Or, even better, you can sort this list to zoom in on just the verbs that occur in the context of opinion (see far right). There is no space here for more examples, but there are countless other ways in which corpora can help editors and proofreaders.
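The counting step described above, in which the software tallies the words that occur up to four words to the left of opinion, can also be sketched in Python. Again, this is a simplified illustration under stated assumptions: the sample text is invented, the function name left_collocates is hypothetical, and the tokenisation is deliberately crude. Narrowing the list down to verbs would additionally need part-of-speech tagging, which is not attempted here.

```python
import re
from collections import Counter

def left_collocates(text, node, window=4):
    """Count the words found up to `window` positions to the left of `node`."""
    # Crude tokenisation: lower-case runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Slice out the window immediately before this occurrence.
            counts.update(tokens[max(0, i - window):i])
    return counts

# Invented sample text (assumption: a real corpus would be vastly larger).
sample = (
    "She was happy to express her opinion. "
    "Many readers voice an opinion online. "
    "He did not express an opinion at all."
)

counts = left_collocates(sample, "opinion")
# Rank the co-occurring words from most to least frequent.
for word, freq in counts.most_common(5):
    print(word, freq)
```

Raw counts like these tend to push very common words such as an towards the top of the list, which is one reason it helps to be able to filter the results, for example down to verbs only.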

How can editors and proofreaders access corpora?

Until a few years ago corpora were only accessible to researchers, but nowadays anyone with access to the internet can consult one. A good place to start is the no-frills, free, online SkELL (Sketch Engine for Language Learning) corpus. The British National Corpus and the Corpus of Contemporary American English can also be accessed free of charge. If you want more English corpora, and corpora in many different languages, the incredibly powerful Sketch Engine tool used by big dictionary makers is available for a modest subscription fee.

Anyone who works professionally with language can benefit from corpora. Corpora are, after all, where lexicographers and linguists get the raw material they need to compile dictionaries and other language resources in the first place. Although corpora don’t provide us with black-and-white answers, they do give us access to how words are used in the real world, in ways that allow us to draw our own conclusions. Even when it is late at night and we have an early morning deadline!


This article originally appeared in the March/April 2018 issue of Editing Matters. CIEP members can access the Editing Matters archive.


About Ana Frankenberg-Garcia

Ana Frankenberg-Garcia is the programme leader of the MA in Translation, University of Surrey. Her research focuses on applied uses of corpora in translation, lexicography and language learning.

About the CIEP

The Chartered Institute of Editing and Proofreading (CIEP) is a non-profit body promoting excellence in English language editing. We set and demonstrate editorial standards, and we are a community, training hub and support network for editorial professionals – the people who work to make text accurate, clear and fit for purpose.

Photo credit: letters by Brett Jordan on Unsplash.

Posted by Abi Saffrey, CIEP blog coordinator.

The views expressed here do not necessarily reflect those of the CIEP.

 

 
