
Table of contents

Personality
  Summary
  Introduction
  Personality defined
  Structure of Personality: the Five Factor Model (FFM)
   Five basic personality factors
   A human universal?
   One factor, several underlying facets
   Are there only Five factors?
   Don’t mistake an inference for an observation
  How to assess personality
   Personality as the common perspective of others
   Why use self appraisal then?
   Honest responding
   Other methods of appraisal
  Personality testing: some psychometric issues
   Observable data, latent traits
   Modern test models: Item Response Theory (IRT)
  Predictive power and utility
   Chain of inference and predictive utility
   Enhancing validity: subject matter experts
  Some current issues
   Intercultural exchangeability
   Importance of situations: broad versus small predictors
   Personality and mental ability
  Conclusion
  References
  Questions for discussion
  About the author: Nico Smid

Personality

Author: Nico Smid

Reviewed by Cheryl Foxcroft, January 2011

When citing this reading, please reference it as follows: Smid, N. Personality. In M. Born, C.D. Foxcroft & R. Butter (Eds.), Online Readings in Testing and Assessment, International Test Commission, http://www.intestcom.org/Publications/ORTA.php


 

Summary

Personality is described in this reading from a psychometric perspective. It is defined here as referring to the publicly observable regularities in behaviours that are both lasting and typical for individuals and by which we tell individuals apart. The Five Factor Model of personality (FFM) is put forward as the integrative framework for personality assessment. Personality questionnaires are described as the primary assessment tools. Methodical problems like differences between self appraisals and appraisals by others as well as ways to counter dishonest responding are discussed. The growing importance of modern test theory in the form of Item Response Theory (IRT) is stressed. Finally, the predictive power of personality for real life decisions is discussed as well as the limits put on this by culture, situations and self-monitoring ability.

 

Introduction

Sometimes, when giving a lecture or a workshop on personality, I start by asking the audience to give a definition of ‘Personality’. After a lot of hesitations and “uh, uh….”, quite a number of different descriptions follow. Often these are vague descriptions like “character” or “what you are like”, which are quite different from each other. This generally amounts to a dramatic demonstration of the ease with which people use an everyday concept without ever checking whether they have a clear idea of it or whether they agree with others regarding its content. This of course then creates a fruitful basis for a willingness on the part of the audience to discuss the concept in depth.

With this anecdotal experience in mind I will start by defining what will be covered in this reading on personality and what will not.

First and foremost, I will restrict myself to ‘normal personality’. Of course, a lot of personality research and application is done in so-called clinical contexts, that is to say, in work with individuals who have serious problems in adapting to their environment, or who are seen by others as being a serious problem, with the aim of understanding their characteristic feelings and behaviours and changing them for the better.

Normal personality, however, concerns itself with the normal variations in behaviour we see between individuals who have no serious adaptive problems. The word ‘variations’ in the last sentence adds a second restriction to what will be covered. I will focus on ‘individual differences’. That is to say, I will restrict myself to a so-called psychometric perspective on personality. After all, this is a contribution to a website of the International Test Commission (ITC). So, personality testing is the primary focus. Testing should be understood here, however, in a broad sense. It refers to any procedure that tries to assess individual differences in observable behaviour in a reliable way in order to predict a relevant external criterion.

The following aspects will be covered:

-        Definition of personality

-        Structure of personality

-        Assessing personality: models and specific issues

-        Practical use in prediction

-        Current issues in measurement and application.

 

Personality defined

In discussions following the above mentioned question to define ‘personality’, consensus is quickly reached that the prime focus needs to be on ‘observable behaviour’. Of course, people have feelings; they have intentions and are more or less motivated to do things. But you cannot observe such feelings, intentions or motivations directly. You have to infer them either from believing direct self-reports like “I feel…”, “I want to …”, or from observing things people do, their behaviours.

Therefore, the domain of personality-relevant information is restricted here to publicly observable behaviour.

A second restriction concerns what aspects of behaviour are attended to. An important aspect of the personality concept is that it refers to regularities in behaviour which are typical and lasting for an individual. That is to say, behaviours that set one individual apart as different from others in the eyes of external observers, not only now but also tomorrow and the day after. And not only that, but also behaviours which are important to take into account in our daily interactions with each other: so-called socially-relevant behaviours, which other people find worthwhile to attend to and to react upon.

Already more than half a century ago, the American psychologist Raymond Cattell referred to the foregoing as the ‘sedimentation’ hypothesis: each typical behavioural aspect of an individual that is socially-relevant has its ‘sediment’ in everyday language in the form of a personality descriptive adjective like ‘talkative’, ‘orderly’, and so on. Thus, the domain of observable personality data should coincide with the total set of personality descriptive adjectives in everyday language.

The question then arises whether it is possible to define this set of adjectives reliably. The answer is yes. It is straightforward to do, and it has therefore been done extensively, not only in American English and culture but also in other languages and cultures all around the globe. It works like this. From a standard dictionary, identify all the adjectives that can possibly be applied to an individual and that refer to observable behaviour. Then the next critical step follows. Ask a relevant sample of individuals from the language group from which the adjectives were identified to evaluate whether each adjective fits in the following sentence: “(s)he is … by nature” or “(s)he is a … person”. Though the latter sentence often admits more adjectives than the former, what is most remarkable is that different people show high agreement among themselves on whether an adjective fits or not. Consequently, this methodology makes it possible to end up with a clearly defined set of relevant personality differences, represented in a neat list of 1000 to 2000 personality descriptive adjectives, depending on the language used. Subsequent research has shown that such a list, when carefully compiled, comprehensively defines the personality domain. What is more, the commonalities in the basic structure of this domain far outweigh the differences between languages or cultures. This is what will be discussed next.

Structure of Personality: the Five Factor Model (FFM)

 

Five basic personality factors

Having compiled a list of say some 1200 personality descriptive adjectives (which is about the number of such adjectives in the Dutch language), structuring the overlap is the next step. Not all adjectives are unique. ‘Agreeable’ and ‘pleasant’, for instance, convey almost the same information about a person. Factor analysis or meaning similarity analyses are among the techniques that can be used to reduce the total set of adjectives into a number of independent basic factors.

To this end, large samples of participants can be requested to rate themselves on the total list. Furthermore, other participants can be asked to rate people that they know well. The resulting data – both the self ratings and the ratings of others – can be reduced via factor analysis to a stable set of main factors. A further technique is to ask participants to directly compare pairs of adjectives in terms of the extent to which they are similar in meaning. The data from these comparisons are then analyzed and summarized into underlying common meaning scales.
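As a minimal sketch of the starting point for such an analysis, the following Python snippet simulates ratings on three adjectives and computes their correlations. The simulated data, the loadings of 0.8, and the helper names `pearson` and `simulate_ratings` are illustrative assumptions, not taken from any study cited here. Factor analysis proper starts from a correlation matrix of this kind and looks for a small number of factors that account for it.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate_ratings(n_people=2000, seed=1):
    """Simulate ratings driven by two independent, hypothetical factors."""
    rng = random.Random(seed)
    ratings = {"agreeable": [], "pleasant": [], "orderly": []}
    for _ in range(n_people):
        accommodation = rng.gauss(0, 1)      # hypothetical latent factor
        conscientiousness = rng.gauss(0, 1)  # independent second factor
        # Each adjective = 0.8 * its factor + adjective-specific noise.
        ratings["agreeable"].append(0.8 * accommodation + rng.gauss(0, 0.6))
        ratings["pleasant"].append(0.8 * accommodation + rng.gauss(0, 0.6))
        ratings["orderly"].append(0.8 * conscientiousness + rng.gauss(0, 0.6))
    return ratings


ratings = simulate_ratings()
r_same = pearson(ratings["agreeable"], ratings["pleasant"])
r_diff = pearson(ratings["agreeable"], ratings["orderly"])
print(f"agreeable vs pleasant: {r_same:.2f}")
print(f"agreeable vs orderly:  {r_diff:.2f}")
```

Under these assumed loadings, the two adjectives that share a factor should correlate at roughly 0.64, while adjectives from different factors should correlate near zero. It is this pattern of overlap that lets a long adjective list be reduced to a few independent factors.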

The results of both kinds of analyses – factor analyses and meaning similarity –  have turned out to be so consistently similar worldwide that the resultant model has gained the status of a commonly accepted overall personality model in both the scientific and the applied practical community. It is called the Five Factor Model, FFM for short (McCrae & Costa, 2003).

These five factors might sometimes have somewhat different names but they all convey similar meanings. Here, the following labels (including a short explanation) will be used:

-        Need for Stability (N)
Differences in the extent to which people react emotionally to setbacks

-        Extraversion (E)
Differences in the extent to which people actively maintain contact with others

-        Openness (O)
Differences in the extent to which people look for new experiences and new ideas

-        Accommodation (A)
Differences in the extent to which people place other people’s interests above their own

-        Conscientiousness (C)
Differences in the extent to which people behave in an organized and purposeful manner.

 

A human universal?

The above is of course a very general and highly aggregated level of description. But it is at the same time the core that has generally been found for the human species. This should not come as a surprise. Well founded evolutionary theory asserts that humans are quite a homogeneous species not only in genetic makeup but also in the environment in which they until very recently have evolved: hunters and gatherers in small groups in a deprived and often hostile environment. Differences in the FFM might be expected then to account for differences in survival value, and might thus lead to the evolution of stable individual differences that matter both socially and environmentally. To paraphrase a little (as well as giving a different rank order from the list above, distinguishing between ‘social’ and ‘environmental’ control), someone’s standing on each of the five factors of the FFM is informative to the extent to which one may solve an important survival problem, such as:

Social control:

-        Accommodation: “To what extent may I trust others or should I be guarded regarding their intentions?”

-        Extraversion: “How well can I continue to take the initiative in my contacts with other people, or would it be better to wait for them to take the first step?”

Environmental control:

-        Conscientiousness: “How well do I control my external environment, or does the environment control me?”

-        Need for stability: “How well do I stay in control when unexpected setbacks happen?”

-        Openness: “To what extent should I gather new information to stay in control?”

Indeed, genetic and evolutionary psychologists (e.g., Buss, 2005) have presented considerable evidence regarding the genetic as well as the evolutionary base of the FFM. Other scholars have shown that it is very common to recover an FFM structure from answers to personality questionnaires based on it, such as the NEO (Costa & McCrae, 2003).

One factor, several underlying facets

As far as the above mentioned high level of aggregation is concerned, in practical applications each of the FFM factors is often split up into homogeneous sub-factors, mostly called ‘facets’. This is, for instance, the case with the above-mentioned NEO questionnaire. As a concrete example from another questionnaire, the Reflector Big Five Personality (RBFP) (Schakel, Smid, & Jaganjac, 2007), the factor Need for stability is subdivided into the following homogeneous facets:

-        Sensitiveness: the extent to which people worry about themselves.

-        Intensity: the ease with which people get angry

-        Interpretation: the extent to which people emphasize problems above solutions

-        Rebound time: how much time people need to rebound from setbacks

-        Reticence: the extent to which people feel uneasy in a group.

And so on for each of the five factors within the FFM.

The facets within a single factor hang together quite well – individuals who score high on one facet have a more than average chance of also scoring high on another from the set – but far from perfectly so. Therefore, a more fine-grained description may be provided of differences between people who have approximately the same position on the overall factor of, say, Need for stability, but still show a different pattern on the underlying facets. Taking such differences in facets into account should enable a better prediction of, for instance, performance in jobs. And empirical evidence suggests that this is in fact the case (Schakel & Smid, 2005).

 

Are there only Five factors?

Another issue around the FFM concerns the question: are there only five factors or are there more? More specifically, do five factors really span the whole personality domain or are more factors needed to give a comprehensive account of all relevant personality differences between people? From several lines of research more factors are proposed, often one extra but sometimes also two or three which then adds up to a seven or an eight factor model.

The first thing that can be said on this issue is that unanimity is absent on what the additional factors are. This in contrast to the FFM on which broad unanimity exists, even among the proponents of some extra factors who see these generally as additional to the FFM, which they recognize as providing a common base.

Zooming in on the research and arguments of the proponents of more than five factors, three lines of research may be distinguished.

The first concerns the basic data underlying their analyses. Remember that the FFM was originally based on the analysis of personality descriptive adjectives. Some researchers (e.g., De Raad & Barelds, 2008) state that people not only describe other persons’ personality with a list of adjectives like, for instance, ‘bossy’ or ‘aggressive’ but also often describe them with a noun like ‘a bully’. Sampling such nouns from the dictionary over and above adjectives and then adding them to the analysis may lead to extra factors. It is, however, difficult to establish to what extent such factors depend on the grammatical and semantic differences between adjectives and nouns. After all, one may interpret a noun as a description of a type of person in the way that it represents a summary of a number of adjectives. Much as ‘a bully’ may be interpreted as a summary of adjectives like ‘bossy’ and ‘aggressive’.

Research in this area continues, but consensus is still far from being reached. For the time being, therefore, the FFM stands out as the most empirically defensible common personality framework.

A second line of research concerns the type of adjectives that are entered into the analysis. Remember we purposively restricted ourselves to observable behaviour. However, what if adjectives that describe moral intentions or moral evaluations are also added? Consider for instance an adjective like ‘benevolent’. It fits in a sentence like “(s)he is a benevolent person”, but such ‘moral intention’ adjectives were excluded beforehand because they cannot be observed and can only be inferred. If such adjectives are included in the analysis, a sixth factor generally emerges. This factor is commonly referred to by terms like ‘integrity’ or ‘sincerity’ (Ashton et al., 2004). This finding does not mean, however, that the FFM is shown to be incorrect but only that apart from what people objectively observe, they separately also ascribe good or bad moral intentions to others. This is fruitful knowledge in its own right but it refers primarily to the way the observer arrives at a moral judgment and not to a direct description of observable behaviour.

The third line of research is akin to the latter. Here, the structure of personality in non-western cultures is investigated. Here too, especially in Asian cultures, a sixth factor – ‘integrity’ or ‘other-directedness’ – has been established (Cheung et al., 2003). Again, this factor might be derived on a similar basis to the second line of research discussed above. Within the boundaries of a culture, all statements are sampled that might be attributed to people by members of the culture, without however explicitly distinguishing by design between observable behaviours and ones that can only be inferred, as was done when constructing the FFM. What is clear from this research is that in the Asian – and other, for instance, African (Meiring et al., 2005) – cultures moral attributions are important aspects within social relations. But these are also partially grounded on observations of behaviour that form a separate domain in their own right.

 

Don’t mistake an inference for an observation

It may be concluded then from the above that it is a reasonable strategy to restrict the data for investigating personality to the set of observable and stable personality adjectives. Under such a restriction, the FFM stands out as the worldwide commonly applicable core of general personality factors. At the same time, it is true that people ascribe all sorts of moral intentions to other people, partly based on observations and partly on their own  moral ideas, which may be empirically summarized by terms like ‘honest’ or ‘sincere’. This is not a personality description, though, but an observer inference. The FFM, however, is about summarizing observations of behaviour and not about interpreting them.

 

 

How to assess personality

Personality as the common perspective of others

Having defined the domain of data that is used to study personality, the next question that needs to be addressed is how to assess the personality of individuals. Personality testing is the core business of the psychometric perspective on personality.

Remember that the basic domain refers to observations of behavioural regularities. Therefore, personality enacts itself in the public domain. The most straightforward way to assess someone’s personality is then of course the average appraisal made by all knowledgeable persons. How should one interpret ‘knowledgeable’?

When the objective is to summarize as accurately as possible the typical and lasting aspects of an individual’s behaviour – often also called the ‘gist’ of behaviour – who then is in the best position to do so? In principle the individual him- or her-self, since (s)he observes most of it. At the same time we know that our evaluation of ourselves is subject to a number of biases, not the least of which is the tendency to see ourselves in a more favourable light than others see us.

Moreover, there is only one of me. This makes my self-appraisal quite unreliable. The reliability of a single self-appraisal on a single scale will generally not transcend the .30 barrier, which heavily reduces its usefulness as a predictor of any external criterion.

Now the same is true of a single external observer. So, when using external observers to assess someone’s personality, one should use the common appraisal of at least five to eight observers in order to arrive at an acceptable level of reliability of the average appraisal. There is of course a second reason to do so. Each observer generally sees only a specific aspect of another’s typical behaviour (e.g., what a boss sees could be different from what a colleague sees).  Therefore differences in the perspectives of observers should be averaged out to arrive at a common appraisal across all observers.
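The gain from averaging several observers can be sketched with the Spearman-Brown formula, a standard psychometric result that the text above does not name explicitly. The sketch assumes that all observers are equally reliable and that their errors are independent. The function name `spearman_brown` is illustrative, and .30 is the single-appraisal reliability mentioned above.

```python
def spearman_brown(single_rater_reliability, n_raters):
    """Reliability of the mean of n_raters parallel appraisals."""
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


# Reliability of the averaged appraisal for 1, 5 and 8 observers,
# starting from a single-appraisal reliability of .30.
for k in (1, 5, 8):
    print(k, round(spearman_brown(0.30, k), 2))
```

Under these assumptions, five observers give an averaged reliability of about .68 and eight observers about .77, which is consistent with the recommendation to use at least five to eight observers.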

The use of averaged appraisals by other people to assess the personality of a specific individual is seldom advocated, however. Within the perspective put forward here, cost limitations and practical difficulties to gather such appraisals are among the reasons for this. However, some people claim that you should not use appraisals by others at all to assess the personality of an individual. The individual him-/her-self is said to have specific knowledge about his/her own behaviour which cannot be appraised by other people. Within the present perspective this can only refer to ‘internal’ states, not to observable behaviour. Thus, intentions, emotions or feelings can of course only be fully appraised by the individual him- or her-self, but these are explicitly excluded in the FFM as a direct database for assessing personality as argued above.

 

Why use self appraisal then?

Most personality assessment is done by way of using personality questionnaires. These are generally self appraisal instruments instead of common appraisals by others. How can we derive valid personality inferences – that is, the ‘common perspective of others’ – from instruments that are “merely self appraisals”?

To answer this question, a closer look at the differences between self appraisal of one’s own behaviour and the appraisal by others of the same behaviour is useful. A lot of research has been done on this topic. The conclusions from such research may be summarized as follows:

When a person responds honestly on a questionnaire that asks what (s)he generally does, his/her answers will in general be somewhat more favourable than those of people who know him/her well and who also answer the questionnaire in an honest manner. What’s more, for each personality factor the extent of surplus favourability in a self appraisal is quite well known. A further important finding is that self appraisals show about the same amount of favourability bias across different individuals. This bias is therefore a quasi-constant, so to speak.

Then, a methodical trick is used. Subtract the average self appraisal score of a well defined sample from the score representing the self appraisal of an individual on a personality questionnaire scale – this is commonly called “norming” – and you will end up with a fairly good approximation of the averaged appraisal of the above mentioned ‘knowledgeable’ others, expressed relative to that sample.
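A minimal sketch of norming, under the assumption that the favourability bias is the same constant for everyone: expressing an individual’s raw score relative to the mean of a norm sample removes that constant. Dividing by the sample’s standard deviation, which gives a z-score, is a common additional step that the text above does not mention. The function name `norm_score` and the raw scores are made up for illustration.

```python
import statistics


def norm_score(raw_score, norm_sample):
    """Express a raw scale score relative to a norm sample (z-score)."""
    mean = statistics.mean(norm_sample)
    sd = statistics.stdev(norm_sample)  # sample standard deviation
    return (raw_score - mean) / sd


# Made-up raw self appraisal scores of a (very small) norm sample.
norm_sample = [30, 34, 36, 38, 42]

print(norm_score(36, norm_sample))            # score equal to the sample mean
print(round(norm_score(42, norm_sample), 2))  # score above the sample mean
```

If the same constant were added to every raw score, the individual’s and the sample’s alike, the normed scores would not change. This is why a quasi-constant bias drops out.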

According to this line of reasoning, an adequately normed score on a self appraisal personality questionnaire – which is the ubiquitous method to assess personality – is not “merely a self appraisal”. It is, when responded to honestly, a good approximation of what is meant by personality within the FFM framework: the common appraisal of the gist of someone’s behaviour by knowledgeable others. Not perfectly so, of course, but as good as it gets, and as practical as generally affordable.

 

Honest responding

The above argument assumes honest responding. It is a challenge for personality assessment to ensure honest responding. The most straightforward way to do so is to take care that it is in the interest of the individual taking the questionnaire to respond honestly. This could be achieved when the questionnaire is used as information for the individual him/herself who wants to know his/her possibilities and limitations for personal development. When, however, the questionnaire is used by a prospective employer to estimate the suitability for a job, the individual may be tempted not to answer honestly but to present him/herself in a manner that fits best with the demands of the job.

A lot of research has shown that people can easily do this, in particular when the answer format is a so-called ‘Likert’ scale, which might look as follows:

1.      Does not apply to me at all

2.      Does generally not apply to me

3.      Sometimes applies to me and sometimes does not apply to me

4.      Generally applies to me

5.      Applies to me.

Apart from trying to convince the individual that it is in his/her best interest to answer honestly – which will not always work, of course – a technical solution is often put forward. Instead of presenting a Likert scale as given above, the person is asked to make a forced choice between statements that do not differ much in terms of favourability. For instance:

1.      I am an orderly person

2.      I am a friendly person.

To analyze the responses on such forced choices in order to assess someone’s standing on a specific personality factor, as in this example, Conscientiousness or Accommodation, has always posed quite a methodological problem. But recent analysis models have overcome these problems. Thus, forced choice personality questionnaires will be more common in the future.

 

Other methods of appraisal

Besides personality questionnaires, other methods have also been proposed. Common to all these alternatives is that they are much more laborious, while the value they add is not always clear-cut.

Most alternatives have in common that they avoid self appraisals but use observations by others. Two frequently used alternatives are the following:

  • First, the use of extensive interviewing. Research shows that when personality descriptions based on such interviews are used to predict external criteria like performance in jobs, they are generally no better predictors than less complicated instruments like tests for mental capacities or – relevant for the present context – well-constructed and honestly taken personality questionnaires. Of course, interviews can be made more reliable which could probably result in better predictions, but this will be very laborious and too expensive. Interviews must cover many different subjects to give a total picture, and more than one interviewer should be used, independent from one another. Broadening an interview database by extending subjects to be covered and using a number of different interviewers is, however, much more difficult to realize than making a personality questionnaire longer to make it more reliable in order to enhance its predictive power.

  • A second alternative is the direct observation of behaviour by a number of external observers, either in real life or in controlled environments like role playing according to a standardized script. Here, the same argument applies as for interviewing. Though the observations might be much more standardized than with interviewing, a lot of different observation situations – or, for that matter, role playing situations – are needed to get a reliable estimate of the regularities in behaviour. For that is what personality is all about: assessing the regularities – the ‘gist’ – of a person’s behaviour, averaged across relevant time frames and relevant situations.

For this reason, applied professionals mostly use personality questionnaires instead of more complicated observational procedures. Such questionnaires are by no means perfect but their utility is at least as good as that of the alternatives, when taking time and processing limitations into consideration.

 

Personality testing: some psychometric issues

 

Observable data, latent traits

Defining only observables as the basis for assessing personality is first and foremost based upon a methodical argument. It does not imply that the observable data coincide with ‘the personality’ of an individual. Rather, these observables should be conceived as objective indicators from which an individual’s position on one or more personality dimensions may be derived. Such dimensions themselves are conceived to be latent. Thus, the more behaviours are observed which indicate “Need for stability”, for example, the higher an individual is supposed to stand on the latent trait “Need for stability”.

The problem of course is how to arrive at an accurate assessment on a latent trait when one only has fallible observables. This refers to the domain of psychometrics. In analyzing personality questionnaires – as argued above, the most commonly used method of collecting personality data – classical test theory (CTT) is generally used for that purpose. The problem with CTT in this context is, however, that it is not a formal model which enables one to test whether an assumed link between the scores on the questionnaire itself and a person’s standing on a latent trait is in fact correct or not. It simply stays with the observables. CTT defines someone’s true score on such a trait as the average across repeatedly completing the questionnaire. Moreover, it assumes that the unreliability of a score on a specific questionnaire is always the same, independent of someone’s position on the latent trait, be it low or high.

Both assumptions – the way a true score is defined and equal reliability for each trait position – are demonstrably false but cannot be empirically checked within CTT itself.

This is not a problem specific to personality assessment, but it is relevant in this context because it not only hinders theoretical progress but also leads to interpretational problems. More specifically, distinguishing between real personality information and distortion by so-called response biases as, for instance, dishonest responding cannot be done without ambiguity.

Modern test models have been developed to overcome such problems, among other things.
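The CTT assumption of equal unreliability at every trait position can be made concrete with the standard error of measurement, which in CTT is computed from the score standard deviation and the reliability alone. The sketch below uses made-up values (a standard deviation of 10 and a reliability of .84), and the function names are illustrative. It shows that a low scorer and a high scorer receive exactly the same uncertainty band.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """CTT standard error of measurement: one value for the whole scale."""
    return score_sd * math.sqrt(1.0 - reliability)


def score_band(observed_score, score_sd, reliability, z=1.96):
    """Approximate 95% band around an observed score under CTT."""
    sem = standard_error_of_measurement(score_sd, reliability)
    return (observed_score - z * sem, observed_score + z * sem)


# Made-up example: the band has the same width for a low and a high scorer.
low = score_band(20, score_sd=10, reliability=0.84)
high = score_band(80, score_sd=10, reliability=0.84)
print(low, high)
```

Nothing in this computation depends on where the person stands on the trait, which is exactly the assumption questioned above.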

 

Modern test models: Item Response Theory (IRT)

Recent research and practice in analyzing and using personality questionnaires is based more and more on the so-called Item Response Theory (IRT). For the present purposes, IRT may be described as a set of models by which one can test formally how an individual’s observable responses on a personality questionnaire can be translated into a specific position on a latent personality trait. When the test is not falsified by a given set of data, someone’s position on the latent trait can then be directly measured.
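To give a rough idea of what such a model looks like, here is a sketch of one of the simplest IRT models, the two-parameter logistic (2PL) model for a yes/no item. It is an illustrative choice, not a model prescribed by this reading; Likert-type items would need a polytomous model such as the graded response model. The parameter values and function names are assumptions.

```python
import math


def prob_endorse(theta, discrimination, location):
    """2PL model: probability of endorsing an item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - location)))


def item_information(theta, discrimination, location):
    """Fisher information of a 2PL item at trait level theta."""
    p = prob_endorse(theta, discrimination, location)
    return discrimination ** 2 * p * (1.0 - p)


# Hypothetical item with discrimination 1.5 and location 0.0.
for theta in (-2.0, 0.0, 2.0):
    p = prob_endorse(theta, 1.5, 0.0)
    info = item_information(theta, 1.5, 0.0)
    print(f"theta={theta:+.1f}  P(endorse)={p:.2f}  information={info:.2f}")
```

In this model the information an item provides, and hence the measurement precision, depends on the person’s position on the latent trait. It is highest where the trait level matches the item’s location and falls off on either side.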

Most importantly, at least two advantages of using IRT may be mentioned where personality assessment is concerned. First, one may actually test the correctness of different alternative models of personality against each other, whereas within CTT the questionnaire responses at hand are by definition assumed to be correct indicators. Referring to the FFM discussion above, IRT might fruitfully be used to test models that formally specify in what way more than five factors might be needed to account for all variation in personality questionnaires.

A second advantage is the possibility to properly analyze forced choice questionnaires as described above in order to control for dishonest responding. Within CTT, this is hindered by formal statistical problems.

The psychometric details of IRT will not be described here. The reader is referred to psychometric articles elsewhere or on the present website.

 

Predictive power and utility

Chain of inference and predictive utility

From a psychometric perspective someone’s personality is not assessed merely in its own right, but as a predictor of some relevant external criterion. In contexts of personal development this might be a measure of life satisfaction or career development. In organizational contexts the criterion might be suitability for a job or the quality or quantity of performance in a job.

Common to these criteria is that they depend on a host of other things besides someone’s personality. For example, things that are beyond the control of the individual, such as economic circumstances, often affect the job performance of a sales manager. This inevitably reduces the predictive power of personality.

In fact we have a so-called chain of inference here. At the end of the chain is the criterion we want to predict. Let’s stay with the example: performance as a sales manager. The personal part an individual can contribute to his/her quality as a sales manager is the last-but-one part of the chain: his/her own behaviour. All behaviour that contributes to being a good sales manager is summarized under the label ‘competencies’. In the case of a sales manager, a competency like ‘persuasiveness’ might be relevant.

The first part of the chain is someone’s personality. In line with the foregoing this might be a specific profile of scores on the factors and facets of the FFM. A relevant facet here might be ‘Rebound time’ from the FFM factor ‘Need for stability’ as defined earlier.

Now, when directly predicting the third part of the chain – performance – from the first part – personality – two bridges have to be crossed. The first bridge is the extent to which ‘rebound time’ predicts the quality of the competency ‘persuasiveness’. This prediction will not be perfect of course, since ‘persuasiveness’ as a competency is a skill to be learned, for which a short rebound time as a personality trait might be of more or less help.

The second bridge is the extent to which the competency ‘persuasiveness’ is an important contributor to performance as a sales manager. This partly depends on economic circumstances, tough or easy, for instance.

Thus the predictive power of personality is always to some extent decreased by the two bridges to be crossed. In order to investigate the utility of using personality as a predictor it is a wise strategy to attend to the bridges themselves instead of looking at the direct predictive power by spanning both bridges at once.
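
The loss of predictive power across the two bridges can be illustrated with a small calculation. The sketch below rests on a strong simplifying assumption that is not stated in this reading: personality influences the criterion only through the competency, and all relations are linear. Under that assumption the direct correlation equals the product of the two bridge correlations. The bridge values used are hypothetical.

```python
def chained_validity(personality_to_competency, competency_to_criterion):
    """Direct personality-criterion correlation, assuming the competency is
    the only path from personality to the criterion (simplifying assumption)."""
    return personality_to_competency * competency_to_criterion

# Hypothetical bridge correlations, chosen only for illustration.
first_bridge = 0.50   # e.g. 'rebound time' -> 'persuasiveness'
second_bridge = 0.60  # e.g. 'persuasiveness' -> sales performance

direct = chained_validity(first_bridge, second_bridge)
print(round(direct, 2))
```

Because both bridges are below 1, the direct prediction is weaker than either bridge taken alone, which is one reason to study the bridges separately.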

First, one should empirically investigate which competencies are relevant contributors to the criterion. In our example: are people who are really skilled in ‘persuasiveness’ relatively more often the better sales managers?

Then the second research step follows. Which personality facets from the FFM predict the extent of effort and time needed to learn a specific competency as well as the easiness by which this competency may be skilfully executed when it is needed in practice? Again in our example the following turns out to be the case: the less rebound time one needs to recover from setbacks, the more easily and faster one learns to master the competency ‘persuasiveness’ (and a number of other competencies of course).

When only the relevant competencies are carefully selected on the one hand, and only the personality facets relevant to predicting those competencies are selected on the other, the predictive power of a personality questionnaire – commonly called its ‘validity’ – may reach a value between .30 and .40, depending on the criterion to be predicted. This may strike you as a low figure, but the utility in practical contexts is nevertheless considerable. For one thing, the best predictor in organizational contexts, general intelligence, has a validity coefficient between .50 and .60 (Schmidt & Hunter, 1998). More important, however, is to realize what this means. A validity coefficient of .30 means that one avoids about 30 percent of the false decisions one would have made had one selected candidates for the job of sales manager at random, without taking the targeted personality information into account. Quite a substantial gain in utility indeed.
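
One standard way to express the utility of a validity coefficient, which may or may not be the exact computation intended above, is the linear utility reading associated with Brogden: the expected gain from selecting on a predictor is the validity times the gain that a perfect predictor would give. The sketch below assumes standardized, bivariate normal predictor and criterion scores and strict top-down selection. The function name and the selection ratio of .50 are hypothetical.

```python
from statistics import NormalDist

def mean_criterion_of_selected(validity, selection_ratio):
    """Expected standardized criterion score of the candidates selected
    top-down on the predictor (bivariate normality is assumed)."""
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1.0 - selection_ratio)
    # Mean predictor score of those above the cutoff (truncated normal mean).
    mean_predictor_of_selected = std_normal.pdf(cutoff) / selection_ratio
    return validity * mean_predictor_of_selected

# Random selection yields an expected gain of 0 by construction.
gain_personality = mean_criterion_of_selected(0.30, 0.5)
gain_perfect = mean_criterion_of_selected(1.00, 0.5)
share_of_maximum = gain_personality / gain_perfect
print(round(gain_personality, 3), round(share_of_maximum, 2))
```

Under these assumptions a validity of .30 recovers 30 percent of the improvement over random selection that a perfect predictor would deliver, whatever the selection ratio.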

Enhancing validity: subject matter experts

The way in which validity should be investigated, as exemplified above, is of course more easily written down than actually done. As far as the second bridge – from competencies to actual performance – is concerned, directly studying the validity is a feasible research task. Let us elaborate on our example a little further. First, one samples evaluations of the actual skill of sales managers or future sales managers in terms of a number of competencies. Then one collects present or future sales figures. Evaluating the predictive power of the skill in those competencies might then actually show that ‘persuasiveness’ has the highest validity. So, ‘persuasiveness’ should be the competency to be predicted with the first bridge – from personality to competency. We will not dwell here on the methodical aspects of doing a professionally sound predictive validity study. Readers are referred to other readings in this regard.

Studying the first validity bridge is the bigger problem. Remember what is to be predicted here: effort and time needed to learn a specific competency. This demands a research design in which data on actual competency skill are collected at regular intervals over many years. Only after enough years will there be enough data to establish a valid trend. One would then correlate the speed of learning – in our example: the speed of learning ‘persuasiveness’ – with the initial scores obtained for a number of personality facets. In our example this might result in the assertion that ‘rebound time’ is quite a good predictor of the speed with which one may learn ‘persuasiveness’.

Such a study is not often done and will not often be done in the future either. There is a good alternative, however: use well-informed judges, so-called subject matter experts (SMEs). Select as SMEs people who have a lot of experience in observing both skill in competencies and standing on personality traits in different individuals. Applied psychologists will generally be a good choice. A not unreasonable assumption is that each of these SMEs has a non-zero validity of, say, a modest .10 or .20 in predicting the speed of learning a specific competency like ‘persuasiveness’ for a person who has a high score on a personality facet like ‘rebound time’. Ask each SME to directly estimate the validity of ‘rebound time’ in predicting the speed of learning ‘persuasiveness’. Then, elementary psychometrics will show that the average estimate of some five to ten SMEs is as good an estimate as may be obtained from any complicated validity study conducted over many years, as described above. This approach is probably even better and more consistent, because a study conducted over many years is confronted with many methodological problems. Of course, this is the main reason why not many such studies are undertaken in the first place.
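
The ‘elementary psychometrics’ referred to here can be sketched with the standard formula for the validity of an average of several equally valid judges. The sketch assumes that all judges have the same individual validity and the same correlation with each other. The inter-judge correlation of .04 is a hypothetical value: it corresponds to judges who agree only insofar as each is individually valid (.20 × .20).

```python
import math

def pooled_validity(k, single_validity, inter_judge_corr):
    """Validity of the mean judgement of k judges, assuming equal individual
    validities and equal inter-judge correlations (simplifying assumptions)."""
    return (single_validity * math.sqrt(k)) / math.sqrt(
        1.0 + (k - 1) * inter_judge_corr
    )

# Hypothetical values: individual validity .20, inter-judge correlation .04.
for k in (1, 5, 10):
    print(k, round(pooled_validity(k, 0.20, 0.04), 2))
```

With these illustrative numbers the pooled validity rises from .20 for a single judge to roughly .42 for five judges and .54 for ten. The more strongly the judges share the same errors, that is, the higher their inter-judge correlation, the smaller this gain becomes.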

And the SME strategy has been shown to work. In studying the validity of a specific FFM questionnaire, the RBFP (Schakel & Smid, 2005), SMEs were used to estimate the validity of FFM facets in predicting speed of learning competencies. Then, these predictions were checked in a large database of evaluations of actual skill in a number of competencies gathered over a number of years. The resultant validity coefficients were between .20 and .50. These are the figures one would have expected had the relevant facets been selected through a complicated research project spanning many years.

For personality research, too, it is therefore wise to remember that the average appraisal of well-informed SMEs is often a good alternative to laboriously collecting fallible data in a complicated research design. This is not to say that SME input should completely replace empirical data, but this approach can provide a sufficiently accurate estimate of the validity coefficient.

 

Some current issues

To draw this reading to a close, some current issues will be briefly touched on. Recent personality research literature will cover these issues in depth. The reader is referred to this literature for more specific further information on concepts and empirical results.

 

Intercultural exchangeability

Above we already mentioned the research into additional personality factors over and above the FFM. In particular, research has been done in Asian (especially Chinese) and African (especially indigenous South African) cultures. As mentioned, interpersonal commonness and respect as well as integrity are additional factors which might be added to account more fully for what the personality construct encompasses within these cultures. At present, targeted personality questionnaires taking these additional factors into account are being developed. In these research approaches the original FFM kernel is safeguarded while adding some extra factors.

Another line of research concerns itself with the question of whether the FFM has intercultural applicability. This research has mostly been done with personality questionnaires. For example, the intercultural applicability of the NEO (McCrae & Terracciano, 2005) has been extensively investigated. In general one might conclude from this research that the FFM structure of the NEO (and other FFM-based questionnaires) has a broad intercultural exchangeability, though on specific facets some small but reliable average differences between cultures have been found.

How these differences should be interpreted is still open to debate. This hinges on the definition of the concept of ‘culture’, which is often loosely paraphrased as ‘the way we do things around here’. Remember that in a previous section we drew a distinction between personality questionnaire items, as behavioural indicators, and the latent personality traits they indicate. It is quite conceivable, for example, that Japanese and European people do not differ on average on a personality facet like ‘deference’, but that they simply express it in a culturally different way. Japanese people might express it by the steepness of their bows, European people by the politeness with which they address a stranger. Personality questionnaires with such items as indicators will show differences in responses between both cultures. But this need not indicate differences on the latent trait of ‘deference’.

As also mentioned above, disentangling cultural differences in indicators and underlying identity in latent traits can only be well researched by using modern test theory, namely, Item Response Theory (IRT). This is an additional reason to expect that the use of IRT models in personality research will increase.
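
The ‘deference’ example can be sketched in IRT terms as an item that functions differently in two groups. The sketch below assumes a two-parameter logistic (2PL) model, which is an illustrative choice and not one prescribed by this reading. The group labels and all parameter values are hypothetical.

```python
import math

def p_endorse(theta, a, b):
    """2PL item response function: probability of endorsing an item,
    given trait position theta, discrimination a and item location b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# One hypothetical 'deference' item (e.g. about bowing deeply) whose location
# differs between two cultural groups; the values are illustration only.
ITEM_LOCATION = {"group_A": -0.5, "group_B": 1.0}
DISCRIMINATION = 1.2

same_theta = 0.0  # both respondents hold the same position on the latent trait
probs = {
    group: p_endorse(same_theta, DISCRIMINATION, location)
    for group, location in ITEM_LOCATION.items()
}
print({group: round(p, 2) for group, p in probs.items()})
```

Two respondents with an identical position on the latent trait thus have clearly different probabilities of endorsing the item. A comparison of raw responses would suggest a trait difference between the groups where, in this sketch, there is none.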

 

Importance of situations: broad versus small predictors

In presenting the structure of the FFM we mentioned the broad level of aggregation of the five factors. To the simple question “are you conscientious?” most people will respond “it depends”. Depends on what? The ‘situation’.

Two aspects should be taken into account as regards this issue. First, a factor such as ‘Conscientiousness’ consists of a broad set of behaviours which comprise different facets that are only imperfectly correlated. These facets might be broadly categorized into two classes: facets which describe orderly behaviour like “I regularly clean my desk”, and facets which describe dependable behaviour like “people can count on me”. Not all people who show the latter behaviour also show the former, though more often than not the two are found together in one person. This is the reason they are subsumed under the general factor of Conscientiousness in the first place. So, even where someone’s general behavioural tendencies are concerned, it depends on the person you observe whether (s)he will show both classes of behaviour.

A second aspect to attend to in this context, however, is on the individual level. Even if a person is in general both ‘orderly’ and ‘dependable’, then it still depends on the situation whether one or both of these behaviours will in reality be manifested. For this to happen the latent trait – orderly or dependable – should first be activated through demands within the situation. Without proper activation nothing is manifested. This refers to an important present research approach, the so-called trait-activation theory (Tett & Burnett, 2003).

This line of argument surely calls for not using the broad big five factors of the FFM in predicting external criteria, but instead using much smaller and more homogeneous predictors at the level of facets. In fact, most research on FFM-based questionnaires like the NEO and the RBFP is based on this strategy. And this research results in higher validities than using the much broader big five factors themselves as predictors.
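
Why a broad factor can predict less well than one of its own facets can be shown with the formula for the validity of a sum of standardized facet scores. The sketch assumes a factor score that is a simple unweighted sum of its facets. The facet validities and the facet intercorrelation are hypothetical values chosen for illustration.

```python
import math

def factor_validity(facet_validities, mean_facet_intercorrelation):
    """Validity of an unweighted sum of k standardized facet scores."""
    k = len(facet_validities)
    variance_of_sum = k + k * (k - 1) * mean_facet_intercorrelation
    return sum(facet_validities) / math.sqrt(variance_of_sum)

# Hypothetical case: two facets correlating .40, of which only the first
# is relevant for the criterion (validity .30 versus .00).
facet_level = 0.30
factor_level = factor_validity([0.30, 0.00], 0.40)
print(round(factor_level, 2))
```

In this illustration the broad factor reaches a validity of only about .18, against .30 for the relevant facet alone, because the irrelevant facet adds variance that is unrelated to the criterion.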

Personality and mental ability

There is a big difference between tests for mental ability and personality questionnaires. The former are so-called maximum performance instruments, and the latter are commonly referred to as typical performance instruments.

In mental ability tests it is known beforehand which answer is correct, and you certainly cannot fake the correct answer if you do not actually know it. Someone’s standing on a personality trait, on the other hand, is generally defined as the actual relative frequency with which (s)he shows a set of behaviours which are presumably indicators of that trait. This is assessed by simply asking a person whether (s)he does so. Such a question refers to typical behaviour. Of course, here one may easily claim to show a behaviour even if this is not the case. And this occurs more frequently when a so-called 5-point Likert scale is used, as explained above.

Now, more intelligent persons – those scoring higher on mental ability tests – might on the one hand fake more easily, and on the other hand understand that it is not in their interest to do so. This is a really important state of affairs to take into account when trying to prevent faking in personality questionnaires, though, as also explained above, most of the time a technical solution like the use of forced choice items is preferred. In any case, the background and prevention of faking are presently important issues and are likely to remain so for the foreseeable future, especially in relation to the more frequent use of IRT models.

Research shows that there is virtually no correlation between mental ability and profiles on a personality questionnaire when the latter has been honestly responded to. There is only a small correlation with the factor ‘Openness’ but the interpretation of this correlation is a matter of debate.

Yet, at the same time, more intelligent persons may more easily learn and master competencies that do not fit well with their personality than less intelligent persons can. To take our earlier example: even persons who in general have a long ‘rebound time’ might well learn to adequately master a competency that does not fit them, like ‘persuasiveness’, simply because it might be easier for them to know how to do this even when it is not their most natural behaviour. This is one of the reasons why general intelligence is often shown to be the best general predictor of performance in organizations. The smarter you are, the more you may not only capitalize on your personality but also compensate for those facets which fit less well with the demands that are made of you.

Therefore, in using personality as a predictor of external criteria, the interplay with the level of general intelligence will remain a very important and fascinating research subject for some time to come.

 

Conclusion

The concept of personality has been studied from a multitude of perspectives. In the present article the psychometric perspective was elaborated. The role of well-constructed personality questionnaires as the primary source to study personality was described. The Five Factor Model was put forward as the integrative framework for constructing and using such questionnaires.

It has been stressed that personality should be conceived as enacting itself in the public domain. The average appraisal of the “well informed other” should therefore be approximated by any assessment instrument. Under conditions of honest responding it can be shown that a personality questionnaire does a reasonably good job in that respect. Of course, such conditions are not easily ensured, but both paired comparison response formats and modern test theory in the form of Item Response Theory models are promising tools in this respect.

As a final comment it is worthwhile to realize that there is much more to predicting behaviour than only including personality traits. Culturally specific ways of behaving, the powerful influences of situations as well as the ability to monitor your own behaviour should also be taken into account. In the same way that we can tell different persons apart from each other, we can tell situations and cultures apart as well.


References

Ashton, M.C., Lee, K., Perugini, M., Szarota, P., De Vries, R.E., & Di Blas, L. (2004). A six-factor structure of personality-descriptive adjectives: Solutions from psycholexical studies in seven languages. Journal of Personality and Social Psychology, 86(2), 356-366.

Buss, D.M. (Ed.)(2005). The handbook of evolutionary psychology. New York: Wiley.

Cheung, F.M., Leong, F.T.L., & Ben-Porath, Y. (Guest Editors) (2003). Special Section: Psychological Assessment in Asia. Psychological Assessment, 15, 243-310.

De Raad, B., & Barelds, D.P.H. (2008). A new taxonomy of Dutch personality traits based on a comprehensive and unrestricted list of descriptors. Journal of Personality and Social Psychology, 94, 347-364.

McCrae, R.R., & Costa, P.T., Jr. (2003). Personality in adulthood: A five-factor theory perspective. New York: Guilford.

McCrae, R.R., & Terracciano, A. (2005). Personality profiles of cultures: Aggregate personality traits. Journal of Personality and Social Psychology, 89, 407-425.

Meiring, D., Vijver, F.J.R. van de, Rothmann, S., & Barrick, M.R. (2005). Construct, item, and method bias of cognitive and personality tests in South Africa. South African Journal of Industrial Psychology, 31(1), 1-8.

Schakel, L., Smid, N., & Jaganjac, A. (2007). Workplace Big Five: Professional Manual. Utrecht: PiCompany.

Schakel, L., & Smid, N.G. (2005). Predicting career decisions through combining personality and competencies. HRM Network Conference, TU Twente.

Schmidt, F.L., & Hunter, J.E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262-274.

Tett, R. P., & Burnett, D. D. (2003). A personality trait-based interactionist model of job performance. Journal of Applied Psychology, 88, 500–517.


Questions for discussion

- How would you explain the “psychometric perspective” of personality?

- What are the implications of restricting the domain of personality data to publicly observable behaviours?

- What are the main arguments why the Five Factor Model has a global intercultural applicability, and which should be the limiting conditions in this respect?

- What are reasons to expect that smaller and more homogeneous behavioural facets of the main big five factors of the FFM are better predictors of real life criteria than those big five factors themselves?

- What are the conceptual differences between behaviours, intentions and motivations in relation to the concept of personality as defined in this reading? What are the consequences for research methodology?

- In what way may well-constructed self appraisal personality questionnaires be perceived as good approximations of the average personality appraisal by well-informed others?

- Which methods may be used to counter dishonest responding in self appraisal personality questionnaires?

- What important contributions to personality research can be made by modern test theory in the form of Item Response Theory (IRT) in contrast to deficiencies in Classical Test Theory (CTT)?

- How would you describe the two “bridges” to be crossed in the chain of inference from personality assessment to a real life criterion?

- In what way is the predictive power of personality influenced by differences in mental ability?

About the author: Nico Smid

Nico Smid has been a principal consultant within PiCompany since 1999. Before that time he had broad and varied experience in scientific research and teaching as well as HR consulting. As a university lecturer he taught research methodology as well as personality theory and assessment, while at the same time fulfilling central university management roles.

He obtained a PhD in Psychology with a dissertation on “Determinants of Personality Judgements”. He was a co-founder and board member of the European Association of Personality Psychology. From 1986 onwards he was a central concept development consultant for management development within Philips Electronics, where he redesigned and implemented, among other things, selection and potential appraisal systems. Following that he was an HR strategy and assessment centre development consultant within two consultancy firms, Beteor and Towers Perrin. At present within PiCompany he is responsible for quality management and concept development as well as maintaining external professional networks.