Bodong Chen

Crisscross Landscapes

Notes: Meara - 2001 - P_Lex: A Simple and Effective Way of Describing the Lexical Characteristics of Short L2 Texts.



Citekey: @meara2001p

Meara P and Bell H (2001). “P_Lex: A Simple and Effective Way of Describing the Lexical Characteristics of Short L2 Texts.” Prospect, 16(3), pp. 5-19.


This article introduces P_Lex, a novel way to assess productive vocabulary. Compared with the Lexical Frequency Profile (LFP), it handles short texts better and produces output that is easier to interpret.


ABSTRACT This paper describes an alternative approach to assessing the lexical complexity of short texts produced by second language learners of English. This methodology bears a passing resemblance to the Lexical Frequency Profile (LFP) suggested by Laufer and Nation (1995), but the approach it takes is mathematically more sophisticated, and the data it produces is easier to work with. We argue that P_Lex produces data which is broadly comparable with the data produced by LFP. However, P_Lex works much better with shorter texts than LFP does, and this makes it a better tool for evaluating texts produced by low-level learners. (p. 5)

The most common approach to this problem has been to work with measures of lexical richness, or lexical diversity. Indices of this sort have been widely used in studies of lexico-statistics (eg Herdan 1960) and L1 development (eg Miller and Klee 1995), and they have begun to appear with increasing frequency in work on L2 speakers (eg Arnaud 1984, Broeder et al 1988, Malvern and Richards 1997). (p. 5)

Most of this work uses measures that compare the number of Lexical Types in a text with the number of Lexical Tokens in the same text. A number of measures of this sort exist (see Table 1), but there is no clear agreement about which is the best variant to use in the context of L2 learners. (pp. 5-6)

The main practical difficulty with measures based on Types and Tokens is that they are sensitive to the length of the text being assessed (longer texts typically have lower Type/Token Ratios than shorter ones do) (p. 6)
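To make the length effect concrete, here is a minimal sketch of a plain Type/Token Ratio. The function name and the toy text are my own illustration, not material from the paper; the toy text simply repeats one sentence, so no new types appear as it grows.

```python
def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by number of running words (tokens)."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)


# Toy text: one six-word sentence repeated four times (illustrative only).
sentence = "the cat sat on the mat".split()
text = sentence * 4

ttr_short = type_token_ratio(text[:6])  # 5 types over 6 tokens
ttr_long = type_token_ratio(text)       # still 5 types, but over 24 tokens
```

The ratio falls as the text gets longer even though the vocabulary on display is unchanged, which is the sensitivity the authors describe.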

The measures listed in Table 1 are all examples of what we might call Intrinsic Measures of Lexical Variety. In these measures, variety is assessed solely in terms of the words that appear in the text itself. (p. 6)

This suggests that there might be a case for developing some Extrinsic Measures of Lexical Richness for use with L2 learners. These measures would not be limited to the number of Types or Tokens appearing in an L2 text: they would supplement this information with additional information about the sorts of words being used, and the sorts of lexical choices that are being made in a particular text. (p. 7)

An example of a measure of this sort is to be found in Laufer and Nation’s Lexical Frequency Profile (Laufer and Nation 1995). (p. 7)

In our experience, LFP has poor measurement characteristics and does not discriminate well between texts, because it relies very heavily on a simple count of the number of Category Three and Category Four words in the text. The number of these words in a ‘typical’ text is usually very small, and this severely limits the way LFP works. (p. 8)

More importantly, a serious practical problem with LFP is that it requires relatively long texts for stable measures to emerge. Laufer and Nation claim that in their data ‘profiles over 200 words were found to be stable, while those done on less than 200 words were not’ (1995: 314). (p. 9)

P_Lex (p. 9)

P_Lex is based on the idea that it might be possible to make a virtue out of the fact that ‘difficult’ words occur only infrequently in texts. P_Lex looks at the distribution of difficult words in a text, and returns a simple index that tells us how likely the occurrence of these words is. The underlying assumption here is that people with big vocabularies are more likely to use infrequent words than people with smaller vocabularies are, and that we can use the index we derive from the texts as a pointer to vocabulary size. (p. 9)
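A rough sketch of what "looking at the distribution of difficult words" could involve: cut the text into equal-sized segments, count the difficult words in each, and tabulate how many segments contain 0, 1, 2, ... of them. The segment length of 10, the function names, and the tiny easy-word set are all assumptions of mine for illustration; these notes do not record the paper's exact settings.

```python
from collections import Counter


def difficult_counts_per_segment(tokens, easy_words, segment_length=10):
    """Count tokens not in easy_words within each full segment of the text.

    segment_length=10 is an assumed value; a trailing partial segment is ignored.
    """
    counts = []
    n_full_segments = len(tokens) // segment_length
    for i in range(n_full_segments):
        segment = tokens[i * segment_length:(i + 1) * segment_length]
        counts.append(sum(1 for word in segment if word.lower() not in easy_words))
    return counts


def count_distribution(counts):
    """Map 'difficult words per segment' to 'number of segments with that count'."""
    return dict(sorted(Counter(counts).items()))


# Hypothetical stand-in for a high-frequency word list.
EASY = {"we", "went", "to", "the", "and", "saw", "a", "big",
        "in", "it", "was", "very", "old"}

tokens = ("we went to the museum and saw a big dinosaur "
          "in the gallery it was very old and we saw").split()

counts = difficult_counts_per_segment(tokens, EASY)
distribution = count_distribution(counts)
```

Here the first segment holds two difficult words ("museum", "dinosaur") and the second holds one ("gallery").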

Not surprisingly, it turns out that the distributions we get for this type of analysis are strongly skewed to the left: most texts contain few difficult words, and texts that contain a very high proportion of such words are themselves quite unusual. Distributions that are strongly skewed to the left are often (p. 10)

Poisson distributions (p. 11)

The mathematics of fitting curves to data is fairly complex, so we have summarised this process in detail in Appendix A. For readers who don’t want to get that involved, it is enough to know that there is a procedure which makes it possible to turn data like that in Table 3 into a single figure, conventionally known as lambda. (p. 11)
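As a hedged illustration of collapsing a frequency table into one figure: the maximum-likelihood estimate of a Poisson rate is simply the mean count per segment. Whether P_Lex uses exactly this estimator is not stated in these notes, and the table below is invented, not the paper's Table 3.

```python
def estimate_lambda(distribution):
    """Estimate a Poisson rate as the mean number of difficult words per segment.

    distribution maps 'difficult words in a segment' -> 'number of such segments'.
    This is the standard maximum-likelihood estimate; it is an assumption that
    P_Lex fits lambda the same way.
    """
    total_segments = sum(distribution.values())
    if total_segments == 0:
        raise ValueError("distribution contains no segments")
    total_difficult = sum(k * n for k, n in distribution.items())
    return total_difficult / total_segments


# Invented example table: 30 segments in total.
table = {0: 12, 1: 10, 2: 5, 3: 2, 4: 1}
lam = estimate_lambda(table)
```

With this made-up table there are 30 difficult words across 30 segments, so the estimate comes out at 1.0.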

Lambda values typically range from 0 to about 4.5, with higher figures corresponding to a higher proportion of infrequent words. Lambda values have good measurement characteristics, and this allows them to be added and averaged straightforwardly. (p. 11)

More importantly, however, lambda scores are much less sensitive to text length than the LFP scores are, and, critically, the P_Lex methodology gives lambda scores that are reasonably stable with very short texts. (p. 11)

An evaluation of P_Lex This section illustrates the way P_Lex works with a large set of texts produced by L2 learners of English. (p. 11)

Our basic question was whether the P_Lex methodology is reliably stable across administrations (p. 12)

We also examined whether the P_Lex measure was able to distinguish reliably between groups of learners at different levels of proficiency. (p. 13)

However, it is clear from Figure 2 that P_Lex is essentially stable from about 120 words, and that texts of this length clearly discriminate between the proficiency levels illustrated. (pp. 13-14)

The data in this figure suggest that it might be possible to get reliable P_Lex data from texts that were considerably shorter than the texts analysed in the main experiment. (p. 14)

Discussion (p. 15)

The data reported above suggest that the P_Lex methodology is basically a reliable one, which produces data very similar to the data produced by LFP. However, P_Lex has the advantage that it seems to work with much shorter texts than the recommended minimum test length for LFP, and this makes it a more useful tool for analysing the output of L2 learners, particularly lower-level learners. (p. 15)

The question of validity is much more awkward to deal with. There are two basic problems here. The first problem is that there are no other tests of productive vocabulary with which we can compare these data. (p. 15)

Our approach here has been to use the so-called Productive Version of the Levels Test as a comparison point, but there are a number of reasons for viewing the Levels Test as a poor instrument (p. 15)

The second problem concerns our selection of ‘difficult’ words. In the work reported here, we have defined ‘difficult’ in terms of frequency, a practice that is largely unquestioned in this field. The version of P_Lex used here used Nation’s (1984) word lists as a way of discriminating between ‘easy’ words and ‘hard’ words. We arbitrarily assigned words in Nation’s 1000 word list to the former category, along with proper nouns, numerals and geographical derivatives, while any other words were assigned to the latter category. We think, however, that there might be a case for exploring alternative ways of characterising vocabulary. (p. 15)
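The easy/hard split described here can be sketched as a simple lookup. All of the word sets below are tiny hypothetical stand-ins (not Nation's actual list), and treating a numeral as "a string of digits" is my simplification.

```python
# Hypothetical stand-ins; a real run would load Nation's 1000 word list instead.
FIRST_1000 = {"the", "he", "went", "to", "in", "and", "bought", "old", "books"}
PROPER_NOUNS = {"london", "peter"}
GEO_DERIVATIVES = {"english", "french"}


def is_numeral(word):
    """Simplified numeral check: digit strings only (an assumption)."""
    return word.isdigit()


def is_easy(word):
    """Easy = in the first-1000 list, a proper noun, a numeral, or a geographical derivative."""
    w = word.lower()
    return (w in FIRST_1000 or w in PROPER_NOUNS
            or w in GEO_DERIVATIVES or is_numeral(w))


def classify(tokens):
    """Label every token 'easy' or 'hard'; anything not caught above is 'hard'."""
    return ["easy" if is_easy(t) else "hard" for t in tokens]


labels = classify("Peter bought 3 antique French books in London".split())
```

In this toy sentence only "antique" ends up in the hard category. Swapping in a different definition of "difficult", as the authors suggest exploring, would only mean changing `is_easy`.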

Specifically, we think that ‘difficult’ vocabulary is not entirely to be defined in terms of frequency: words are unusual in particular contexts for particular groups of L1 speakers, and it may not be possible to draw up a list of ‘difficult’ (p. 15)

Other ‘difficult’ words would then actually indicate a lack of appropriate vocabulary. (p. 16)

Appendix A: How the lambda scores are calculated

The advantage of fitting Poisson curves to our data is that these curves are conveniently described by a compact formula:

$$P_N = \frac{\lambda^N e^{-\lambda}}{N!}$$

The critical value in this formula is the variable λ (lambda), which defines the overall shape of the curve. If we know the value of lambda, then we know what the curve will look like, and this means that we can use lambda as a shorthand for describing data like the sample presented in Table 3.
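A small worked example of the Poisson formula, taking λ = 1 as an arbitrary illustrative value (not a figure from the paper) and reading N as the number of difficult words in a segment, which is my interpretation of the notation:

```latex
\[
\begin{aligned}
P_0 &= \frac{1^0 e^{-1}}{0!} = e^{-1} \approx 0.368, \\
P_1 &= \frac{1^1 e^{-1}}{1!} = e^{-1} \approx 0.368, \\
P_2 &= \frac{1^2 e^{-1}}{2!} = \frac{e^{-1}}{2} \approx 0.184, \\
P_3 &= \frac{1^3 e^{-1}}{3!} = \frac{e^{-1}}{6} \approx 0.061.
\end{aligned}
\]
```

So with λ = 1 roughly 37% of segments would be expected to contain no difficult words, and the probabilities fall away quickly for higher counts. A larger λ shifts the weight towards segments with more difficult words.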

Inevitably, these curves are not exact fits, and P_Lex reports an Error Figure, which shows how well the data are described by the best-fitting Poisson curve.
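The notes do not say how the Error Figure is computed. As one plausible sketch, the code below sums the squared differences between the observed proportion of segments at each count and the Poisson probability for that count. This particular error definition, the function names, and the example table are all my assumptions.

```python
import math


def poisson_pmf(n, lam):
    """Poisson probability of observing exactly n events at rate lam."""
    return lam ** n * math.exp(-lam) / math.factorial(n)


def fit_error(distribution, lam):
    """Sum of squared gaps between observed proportions and Poisson probabilities.

    distribution maps 'difficult words in a segment' -> 'number of such segments'.
    This error measure is an assumed stand-in, not necessarily P_Lex's own.
    """
    total = sum(distribution.values())
    if total == 0:
        raise ValueError("distribution contains no segments")
    max_n = max(distribution)
    return sum(
        (distribution.get(n, 0) / total - poisson_pmf(n, lam)) ** 2
        for n in range(max_n + 1)
    )


# Invented table with a mean of 1.0 difficult words per segment.
table = {0: 12, 1: 10, 2: 5, 3: 2, 4: 1}
error_good = fit_error(table, 1.0)  # lambda equal to the table's mean
error_bad = fit_error(table, 3.0)   # deliberately poor choice of lambda
```

A lambda close to the table's mean gives a much smaller error than a badly chosen one, which is the kind of signal an error figure is meant to provide.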