Unit 5: Types of Test Items
An objective test requires a respondent to provide a brief response, usually not more than a sentence long. Such tests normally consist of a large number of items, and the responses are scored objectively to the extent that competent observers can agree on how responses should be scored.
There are two major types of objective tests: the selection type and the supply type. The selection type consists of the multiple-choice, true and false, and matching types. The supply type has variations such as completion, fill-in-the-blanks and short-answer.
- Requires students to choose among several designated alternatives or write a short answer.
- Consists of many items requiring only brief answers.
- A lot of time is spent by students in reading and thinking when taking the test.
- Quality of test is determined largely by the skill of the test constructor.
- Relatively tedious and difficult to prepare but rather easy to score.
- Permits and encourages guessing.
- Affords only the test constructor (teacher) the opportunity to be individualistic.
- Score distribution is determined largely by the test.
- Amenable to item and statistical analysis.
- Scoring is highly objective.
- Content validity is high.
- Reliability of test scores could be high.
Strengths and advantages
- Scoring is easy and objective
- They allow an extensive coverage of subject content.
- They do not provide opportunities for bluffing.
- They are best suited for measuring lower-level behaviours like knowledge and comprehension.
- They provide economy of time in scoring
- Student writing is minimized. Premium is not placed on writing.
- They are amenable to item and statistical analysis
- Scores are not affected by extraneous factors such as the likes and dislikes of the scorer.
Weaknesses and disadvantages
- They are relatively difficult to construct.
- Item writing is time consuming.
- They are susceptible to guessing.
- Higher-order mental processes like analysis, synthesis and evaluation are difficult to measure.
A multiple-choice test is a type of objective test in which the respondent is given a stem and must select, from among three or more alternatives (options or responses), the one that best completes the stem. The incorrect options are called foils or distracters.
There are two types of multiple-choice tests. These are the single ‘best response’ type and the ‘multiple response’ type. The single ‘best response’ type consists of a stem followed by three or more responses, and the respondent is to select only one option to complete the stem.
An example is
Write 0.039387 as a decimal correct to 3 significant figures.
The multiple response type consists of a stem followed by several true or false statements or words. The respondent is to select which statement(s) could complete the stem.
An example is:
Which of the following action(s) contribute to general principles of First Aid?
I. Arrest haemorrhage
II. Bath the patient
III. Immobilize injured bone
A. I only
B. II only
C. I and II
D. I and III
E. I, II and III
Guidelines for constructing multiple-choice tests
- The central issue of the item should be in the stem. It should be concise, easy to read and understand.
The following are examples of poor and good items.
Poor: Ghana
A. became independent in 1960
B. has West Africa’s largest population
C. has the largest man-made lake in Africa
D. is the world’s leading cocoa producer
E. is a landlocked country
Good: The largest man-made lake in Africa is in
A. Chad
B. Ghana
C. Kenya
D. Tanzania
E. Uganda
- The options should be plausible. Distracters must be plausible and attractive to the uninformed.
Poor: The longest river in Africa is
A. Benue
B. Densu
C. Nile
D. Pra
E. Thames
Good: The longest river in Africa is
A. Niger
B. Nile
C. Volta
D. Zaire
E. Zambesi
In the poor example, rivers Benue, Densu and Pra are not long enough to attract respondents and Thames is not in Africa.
- All options for a given item should be homogeneous in content, form and grammatical structure.
Poor: The first woman Prime Minister in the world is/was
A. Margaret Thatcher
B. Margaret Peil
C. Mother Teresa
D. Sirimavo Bandaranaike
E. Valentina Tereshkova
Good: The first woman Prime Minister in the world is/was
A. Corazon Aquino
B. Golda Meir
C. Indira Gandhi
D. Margaret Thatcher
E. Sirimavo Bandaranaike
In the poor example, only options A and D were Prime Ministers.
- Repetition of words in the options should be avoided.
Poor: Which is the best definition of a contour line?
A. A line on a map joining places of equal barometric pressure.
B. A line on a map joining places of equal earthquake intensity.
C. A line on a map joining places of equal height.
D. A line on a map joining places of equal mean temperature.
E. A line on a map joining places of equal rainfall.
Good: A line on a map joining places of equal pressure is called
- Specific determiners which are clues to the best/correct option should be avoided.
Poor: The first woman cosmonaut was a
A. American
B. Englishman
C. Irish
D. Italian
E. Russian
Good: The first woman to go into space was a/an
A. American
B. British
C. French
D. Italian
E. Russian
In the poor example, the article, a, gives a clue that the correct option is Russian. In addition, it is only Russians who use the term, cosmonaut.
- Vary the placement of the correct options. No discernible pattern of the correct/best responses should be noticed.
- Items measuring opinions should not be included. One option should clearly be correct or the best.
Poor: The best Ghanaian medical doctor is
A. Charlotte Gardiner
B. F. I. D. Konotey-Ahulu
C. Mary Grant
D. Mohamed Mustafa
E. Moses Adibo
Good: The Ghanaian medical doctor famous for his work on the sickle-cell disease is
A. F. I. D. Konotey-Ahulu
B. F. O. Acheampong
C. K. G. Korsah
D. K. K. Korsah
E. M. K. Mustafa
- The responses must be parallel in form, i.e. they must be about the same length.
Poor: In constructing multiple-choice test items, options to an item should be
A. arranged horizontally.
B. copied directly from class notes or textbooks.
C. in a discernible pattern of responses for easy identification.
D. homogeneous in content.
Good: In constructing multiple-choice test items, options to an item should be
A. arranged horizontally.
B. copied from textbooks.
C. heterogeneous in content.
D. homogeneous in content.
- The responses must be in alphabetical or sequential order.
Example: In constructing multiple-choice test items, options to an item should be
- arranged horizontally.
- copied from textbooks.
- heterogeneous in content.
- homogeneous in content.
- The responses must be itemized vertically and not horizontally.
Poor: A teacher training college principal measures many variables in his college. An example of a variable measured on an ordinal scale is
A. enrollment in each class. B. income in cedis of the teachers. C. years of service for each teacher. D. professional qualification of the teachers.
Good: A teacher training college principal measures many variables in his college. An example of a variable measured on an ordinal scale is the
A. enrollment in each class.
B. income in cedis of the teachers.
C. professional qualification of the teachers.
D. years of service for each teacher.
- Each option must be distinct. Overlapping alternatives should be avoided.
Poor: An adult liver weighs about
A. 6 lbs
B. 5 lbs
C. 4 lbs
D. 3 lbs
E. 2 lbs
Good: An adult liver weighs exactly
A. 7 lbs
B. 6 lbs
C. 5 lbs
D. 4 lbs
E. 3 lbs
- Avoid using “all of the above” as an option but “None of the above” can be used sparingly. It should be used only when an item is of the ‘correct answer’ type and not the ‘best answer’ type.
Poor: The following are local signs and symptoms of inflammation except
A. rashes
B. redness
C. restoration of function
D. sleeplessness
E. None of the above
Good: In administering intramuscular injection, the needle is inserted into the muscle at an angle of
A. 30°
B. 45°
C. 60°
D. 90°
E. None of the above
In the poor example, there are other signs and symptoms not included whereas in the good example there is one and only one answer.
- Stems and options should be stated positively. However, a negative stem could be used sparingly, and the word ‘not’ should be emphasized either by underlining it or by writing it in capital letters. An example is:
Which of these insects has NOT been incriminated to transmit diseases?
- Body louse
- Sentences should not be copied from textbooks, or from past test items. Original items should be made.
- Create independent items. The answer to one item should not depend on the knowledge of the answer to a previous item. For example:
Item 1. The perimeter of a rectangular field is 60 metres. If one side is 20 metres long,
what is the width of the field?
- 10 metres
- 20 metres
- 30 metres
- 40 metres
- 60 metres
Item 2. Find the length of the diagonal of the rectangular field in item 1 above.
- 10.0 metres
- 20.0 metres
- 22.4 metres
- 30.6 metres
- 40.0 metres
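The dependence between Item 1 and Item 2 above can be made concrete with a short calculation. The following is a minimal sketch in Python; the function names are chosen here for illustration and are not part of the original items. It computes the width from the perimeter, then the diagonal from that width, and shows that a respondent who picks a wrong width in Item 1 carries the error into Item 2.

```python
import math


def rectangle_width(perimeter: float, length: float) -> float:
    """Width of a rectangle given its perimeter and one side."""
    return perimeter / 2 - length


def rectangle_diagonal(length: float, width: float) -> float:
    """Diagonal of a rectangle by Pythagoras' theorem."""
    return math.hypot(length, width)


# Item 1: perimeter 60 metres, one side 20 metres.
width = rectangle_width(60, 20)

# Item 2 can only be answered with the result of Item 1.
diagonal = rectangle_diagonal(20, width)

# A respondent who wrongly chose 20 metres in Item 1 carries the error forward.
wrong_diagonal = rectangle_diagonal(20, 20)

print(width, round(diagonal, 1), round(wrong_diagonal, 1))
```

With the correct width the diagonal rounds to the 22.4 metres listed among the options, while the wrong width leads to a value that is not among them, so the respondent is penalized twice for one mistake.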
True and False tests
A true and false test consists of a statement to be marked true or false. A respondent is expected to demonstrate command of the material by indicating whether the given statement is true or false.
Sir Gordon Guggisberg was the governor who built the Takoradi Harbour. True or False
Guidelines for constructing true and false tests
- Statements must be definitely true or definitely false.
Poor: The value of 2/3 as a decimal fraction is 0.7. True or False
Good: The value of 2/3 expressed as a decimal fraction correct to two decimal places is 0.66. True or False
- Avoid words that tend to be clues to the correct answer.
Words like some, most, often, many, may are usually associated with true statements. All, always, never, none are associated with false statements. These words must therefore be avoided.
- Approximately half (50%) of the total number of items should be false, because it is easier to construct statements that are true and the tendency is to have more true statements.
- Statements must be original. They must not be copied directly from textbooks, past test items or any other written material.
- Statements should be worded such that superficial logic suggests a wrong answer.
Poor: A patient took one tablet of a prescribed medicine and was healed in 24 hours. 8 tablets would therefore heal him in 3 hours. True or False
The true case is that 8 tablets would constitute an overdose.
- Statements should possess only one central theme.
Poor: Akropong Teacher Training College, built in 1900, is the first teacher training institution in Ghana.
Two themes are in the statement: when it was built, and whether it was the first.
- State each item positively. A negative item could, however, be used with the negative word, ‘not’, emphasized by underlining or writing in capital letters. Double negatives should be avoided.
- Statements should be short, simple and clear. Ambiguous as well as tricky statements should be avoided.
Examples:
(1) Abedi Pele was the best Ghanaian footballer. True or False
(2) Margaret Thacher was the British Prime Minister in 1989. True or False
Item 1 is ambiguous because best is relative while the trick in item 2 is the spelling of Thatcher.
- Statements should measure important ideas not trivia.
Poor: Dr. Kwame Nkrumah, had artificial teeth. True or False.
Good: Dr. Kwame Nkrumah was the first President of Ghana. True or False.
- Arrange the items such that the correct responses do not form a discernible pattern like TTTT FFFF TTTT FFFF.
- To avoid scoring problems, let students write the correct options in full.
- Double-barrelled statements should be avoided. These statements have one part true and one part false.
Poor: The Bond of 1844, signed by Governor Commander Hill declared the Northern territories of Ghana a Protectorate.
The Bond was signed by Commander Hill but did not achieve the stated purpose.
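Two of the guidelines above, keeping about half of the items false and avoiding a discernible pattern of correct responses, can be checked roughly on a draft answer key. The following is a minimal sketch in Python, assuming the key is written as a string of 'T' and 'F'. The function name and the use of the longest run of identical answers as a pattern indicator are assumptions made here for illustration. A short longest run does not prove the key is free of patterns.

```python
from itertools import groupby


def key_summary(key: str) -> dict:
    """Summarize a true/false answer key written as a string of 'T' and 'F'."""
    if not key:
        raise ValueError("key is empty")
    answers = key.upper()
    if set(answers) - {"T", "F"}:
        raise ValueError("key may contain only T and F")
    # Share of items whose correct answer is False (guideline: about 0.5).
    false_share = answers.count("F") / len(answers)
    # Longest block of identical consecutive answers, a rough pattern signal.
    longest_run = max(len(list(group)) for _, group in groupby(answers))
    return {"false_share": false_share, "longest_run": longest_run}


patterned = key_summary("TTTTFFFFTTTTFFFF")  # the pattern warned against above
mixed = key_summary("TFFTFTTFTFFTTFTF")      # an illustrative alternative key
print(patterned, mixed)
```

Both keys are balanced, but the first has runs of four identical answers while the second never repeats an answer more than twice in a row.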
The matching type of objective test consists of two columns. The respondent is expected to associate an item in Column A with a choice in Column B on the basis of a well-defined relationship.
Column A contains the premises and Column B the responses or options.
Match the vitamins in Column A with the diseases and conditions in Column B that a lack of the vitamin causes.
Column A              Column B
Vitamins              Diseases caused by lack
1. Vitamin A          a. Beriberi
2. Vitamin C          b. Kwashiorkor
3. Vitamin D          c. Pellagra
                      d. Poor eyesight
Guidelines for constructing matching-type tests
- Do not use perfect matching. Have more responses than premises. There should be at least three more responses than premises.
- Arrange premises and responses alphabetically or sequentially. This reduces the amount of unnecessary searching on the part of the person who knows the answer.
- Column A (premises) should contain the list of longer phrases. The shorter items should constitute the responses.
- Limit the number of items in each set. The number of premises should not be more than six per set, with not more than ten responses.
- Use homogeneous options and items.
Poor:
Instruction: Select an option from List B to match List A.
1. The Battle of Dodowa        a. 1824
2. Built Korle Bu hospital     b. Gordon Guggisberg
3. Longest river in Africa     c. Nile
                               d. Lord Listowel
Good:
Instruction: Select a river from List B to complete the description in List A. Write the answer against the number in List A.
List A                              List B
Description of river                Name of river
1. Aswan Dam is built on it.        a. Niger
2. Longest river in West Africa.    b. Nile
3. Is a tributary of River Zaire.   c. Orange
- Provide complete directions. Instructions should clearly show what the rules are and also how to respond to the items.
- State clearly what each column represents
- Avoid clues (specific determiners) which indirectly reveal the correct option
- All options must be placed (and typed) on the same page.
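The numerical guidelines above (at least three more responses than premises, not more than six premises, not more than ten responses) lend themselves to a simple check on a draft matching set. The following is a minimal sketch in Python; the function name and messages are assumptions made here for illustration. It tests only alphabetical order of the responses, so a set arranged sequentially (for example by date) would need a different check. In the second example, Rickets and Scurvy are illustrative additions and do not come from the example set given earlier.

```python
def check_matching_set(premises: list, responses: list) -> list:
    """Return the guideline violations found in one matching set."""
    problems = []
    if len(responses) - len(premises) < 3:
        problems.append("fewer than three more responses than premises")
    if len(premises) > 6:
        problems.append("more than six premises")
    if len(responses) > 10:
        problems.append("more than ten responses")
    if responses != sorted(responses, key=str.lower):
        problems.append("responses are not in alphabetical order")
    return problems


premises = ["Vitamin A", "Vitamin C", "Vitamin D"]

# The four responses from the vitamin example: only one extra response.
short_list = ["Beriberi", "Kwashiorkor", "Pellagra", "Poor eyesight"]
print(check_matching_set(premises, short_list))

# Illustrative longer list with three more responses than premises.
longer_list = short_list + ["Rickets", "Scurvy"]
print(check_matching_set(premises, longer_list))
```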
Constructed-Response Type and Short-Answer Type tests
This type of objective test is also known as the supply, completion or fill-in-the-blanks type. It consists of a statement or question, and the respondent is required to complete it with a short answer, usually not more than one line.
Examples:
1. Modern nursing was introduced into Ghana in the year _________
2. Who was the first Ghanaian Prime Minister? ________________
3. The environment has three component parts. Name them.
Guidelines for constructing short-answer tests
- Keep the number of missing words or blank spaces low. Preferably use one blank per item. There should not be more than two blanks in one item.
Poor: The _____ of _____ took place in _____________
Good: The battle of Dodowa took place in the ________
- Use original statements that are carefully constructed. Statements should not be lifted from textbooks, past items or any other written material.
- Avoid specific determiners which provide clues to the correct option.
- Blanks must be placed at the end or near the end of the statement and not at the beginning.
Poor: ____________ is an instrument used for measuring temperature.
Good: An instrument used for measuring temperature is called _________________
- Items should be written so clearly that the type of response required is evident.
Poor: The battle of Nsamankow was fought in ______________
Good: The battle of Nsamankow was fought in the year _______
- Avoid lengthy and tortuous statements.
Poor: A specific disease in which acute glomerular damage occurs following distant infections, particularly with certain streptococci and usually affects children and young adults and whose clinical picture is commonly one of a dramatic onset of oedema and haematuria is _______________________
Good: The disease in which acute glomerular damage occurs following distant infections is ___________________
- Think of the intended answer first before constructing the item.
- Missing words must be important ones. Avoid omitting trivial words to trick the student. Only test for important facts and knowledge.
Poor: The ___ of the June 4, 1979 revolution in Ghana was Flt. Lt. J. J. Rawlings.
- Specify the degree of precision and the units of expression required in computational problems.
Poor: The value of 2.6 ÷ 0.07 is ____________
Good: The value of 2.6 ÷ 0.07 correct to 3 decimal places is ____________
- Aim at providing items that belong to the correct answer type and not the best answer type.
Poor: The best audio-visual material to use in the classroom is _____________.
Good: Radios and tape recorders are regarded as ______________ audio-visual aids.
- Keep all blanks the same length, and in a column to the right of the question.
A direct question is generally more desirable than an incomplete statement.
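The guideline on specifying the degree of precision matters because the same computation has a different "correct" answer at each level of rounding. The following is a minimal sketch in Python for the 2.6 ÷ 0.07 item above. The function name is chosen here for illustration, and rounding half up is an assumption about the convention the test constructor intends.

```python
from decimal import Decimal, ROUND_HALF_UP


def divide_to_places(numerator: str, denominator: str, places: int) -> Decimal:
    """Divide two decimal strings and round half up to the given decimal places."""
    quotient = Decimal(numerator) / Decimal(denominator)
    step = Decimal(10) ** -places  # places=3 gives Decimal('0.001')
    return quotient.quantize(step, rounding=ROUND_HALF_UP)


# The acceptable answer changes with the precision the item asks for.
answers = {places: divide_to_places("2.6", "0.07", places) for places in range(4)}
for places, value in answers.items():
    print(places, value)
```

Without the phrase "correct to 3 decimal places", a scorer would have to decide whether 37, 37.1, 37.14 and 37.143 all deserve credit.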
There are two types of essay-tests. These are the restricted response type and the extended response type.
The restricted response type limits the respondent to a specified length of response. For example, ‘In not more than 200 words, explain the causes of the Yaa Asantewa War of 1900.’
The extended response type does not limit the student in the form and scope of the answer. For example, ‘Discuss the factors that led to the overthrow of Dr. Kwame Nkrumah’s government in Ghana in 1966.’
- Requires students to plan their own answers and to express them in their own words.
- Consists of relatively few items that call for extended answers.
- A lot of time is spent by students in thinking and writing when taking the test.
- Quality of test is determined largely by the skill of the test scorer.
- Relatively easy to prepare but rather tedious and difficult to score.
- Permits and encourages bluffing.
- Affords both the student and teacher the opportunity to be individualistic.
- Score distribution varies from one scorer to another.
- Less amenable to item and statistical analysis.
- Scoring is subjective.
- Content validity is low.
- Reliability of test scores is low.
Strengths and Advantages
- They provide the respondent with freedom to organize his own ideas and respond within unrestricted limits.
- They are easy to prepare.
- They eliminate guessing on the part of the respondents.
- Skills such as the ability to organize material and ability to write and arrive at conclusions are improved.
- They encourage good study habits as respondents learn materials in wholes.
- They are best suited for testing higher-order behaviours and mental processes such as analysis, synthesis and evaluation.
- Little time is required to write the test items.
- They are practical for testing a small number of students.
Weaknesses and Disadvantages
- They are difficult to score objectively. Starch and Elliott (1912, 1913) reported
that inter-rater variability could be as high as 68.
- They provide opportunities for bluffing, where students write irrelevant material.
- Limited aspects of student’s knowledge are measured as students respond to few items only.
- The items are an inadequate sample of subject content. Several content areas are omitted.
- A premium is placed on writing. Students who write faster, all things being equal, are expected to score higher marks.
- They are time-consuming to both the teacher who scores the responses and the student who writes the responses.
- They are susceptible to the halo effect where the scoring is influenced by extraneous factors such as the relationship between scorer and respondent.
- Only a critical reader who is also a competent scorer can score responses effectively.
Guidelines for constructing good classroom essay tests
- Plan the test.
Give adequate time and thought to the preparation of the test items. The test
items must be constructed from a test specification table and well in advance (at
least two weeks) of the testing date.
- The items should be based on novel situations and problems. Be original. Do not copy directly from textbooks or past test items.
- Test items should require the students to show adequate command of essential
knowledge. The items should not measure rote memorization of facts,
definitions and theorems but must be restricted to the measuring of higher
mental processes such as application, analysis, synthesis and evaluation.
Examples of items include:
You are in charge of a youth camp of 100 campers. Prepare a menu chart which shows a balanced diet, taking into consideration cost and nutritional value.
Here the student uses knowledge learnt in school to deal with a concrete situation.
A student girl was severely and unfairly punished. Describe some of the
feelings such treatment aroused in her.
You are the financial secretary of a society aimed at raising money to build a
Post Office in your community. Plan and describe a promotional campaign for
raising the money.
Evaluate the function of the United Nations Organization as a promoter of
- The length of the response and the difficulty level of items should be adapted to
the maturity level of students (age and educational level).
An item like:
“Discuss the implications of the Lome II Convention on the economy of Ghana” would be too difficult for a first year senior secondary school student.
- Optional items should not be provided when content is relevant. They may be necessary
only for large external examinations and when the purpose of the test is to measure writing effectiveness. If students answer different questions, an analysis of the performances on the test items is difficult.
- Prepare a scoring key (marking scheme) at the time the item is prepared.
Decide in advance what factors will be considered in evaluating an essay response. Determine the points to be included and the weights to be assigned for each point. The preparation of a model answer will help disclose ambiguities in an item.
- Establish a framework and specify the limits of the problem so that the student
knows exactly what to do.
The following item for example does not establish any framework for the student to operate in.
Write brief notes on the following:
- United Nations Organization (UNO)
- African Union (AU)
- European Union (EU)
- Present the student with a problem which is carefully worded so that only ONE
interpretation is possible. The questions/items must not be ambiguous or vague.
For example: Family Planning in Ghana is a “mixed bag”. Discuss.
Different interpretations could be given to the term ‘mixed bag’ if it was not mentioned in class.
- Indicate the value of the question and the time to be spent in answering it.
- Structure the test item such that it will elicit the type of behaviour you really want to measure.
- The test items must be based on the instructional objectives for each content unit.
An item like:
‘Discuss the factors which, in your opinion, contributed to the escalation of the Persian Gulf War in 1990’
elicits students’ opinions, which might be different from the behaviour desired.
- Give preference to a large number of items that require brief answers. These provide a broader sampling of subject content and are thus better than a few items that require extended responses.
- Start essay test items with words that are clear and as simple as possible and which require the student to respond to the stimulus expected. Avoid words such as: what, list, who, as much as possible.
For example: What can you as a teacher do to promote professionalism in the teaching service in Ghana? This item requires only a statement as the response and not an extended answer.
Commonly used words to begin essay test items
- Analyze: To determine elements or essential features; examine in detail to identify causes, key factors, possible results.
- Assess: To estimate or judge the value, character etc of.
- Describe: To tell or depict in written words.
- Discuss: To consider or examine by argument or comment.
- Give an account of: To give a narrative or written description of particular events or situations; a statement of reasons, causes, etc. explaining some event.
- Evaluate: To judge or determine the significance, worth or quality of.
- Examine: To inspect or scrutinize carefully; to inquire into or investigate.
- Explain: To make plain or clear; to make known in detail.
Scoring essay tests
Essay tests can be scored by using the analytic scoring rubrics (also known as the point-score method) or holistic scoring rubrics (also called global-quality scaling or rating method).
In analytic scoring, the main elements of the ideal answer are identified and points awarded to each element. This works best on restricted response essays.
In holistic scoring, the model answer serves as a standard. Each response is read for a general impression of its adequacy as compared to the standard. The general impression is then transformed into a numerical score. To check the consistency of the scoring, a first reading is done to sort the responses into several piles (usually five: A, B, C, D, E) according to the different levels of quality. A second reading of each pile enables the actual grade or score to be given.
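Analytic (point-score) scoring as described above amounts to summing the points of the ideal-answer elements that a response contains. The following is a minimal sketch in Python. The rubric elements, their weights and the function name are hypothetical and chosen only for illustration. Deciding whether an element is actually present in a response remains the scorer's judgment.

```python
def analytic_score(rubric: dict, elements_present: set) -> int:
    """Sum the points for each rubric element found in a response."""
    unknown = elements_present - set(rubric)
    if unknown:
        raise ValueError(f"elements not in rubric: {sorted(unknown)}")
    return sum(points for element, points in rubric.items()
               if element in elements_present)


# Hypothetical rubric: element names and weights are illustrative only.
rubric = {
    "states main cause": 4,
    "gives supporting evidence": 3,
    "explains consequence": 2,
    "draws conclusion": 1,
}

# Elements the scorer judged to be present in one response.
score = analytic_score(rubric, {"states main cause", "draws conclusion"})
print(f"{score} out of {sum(rubric.values())}")
```

Fixing the elements and weights before any script is read is what makes the scores comparable from one response to the next.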
Principles for scoring essay tests
- Prepare a form of scoring guide, either an analytic scoring rubric or a holistic scoring rubric.
- Tests must be kept as anonymous as possible. This reduces the halo effect. Different forms of identification could be used instead of names.
- Grade the responses item by item and not script by script. Score all responses to each item before going to the next item. This reduces the carryover effect. The carryover effect occurs when the mark for a question is influenced by the performance on the previous question.
- Keep scores of previously graded items out of sight when evaluating the rest of the items.
- Periodically rescore previously scored papers.
- Before starting to score each set of items, the scripts should be shuffled.
- Score the essay test when you are physically sound, mentally alert and in an
environment with very little or no distraction.
- Constantly follow the scoring guide as you score. This reduces rater drift, which is the tendency either to stop paying attention to the scoring guide over time or to interpret it differently as time passes.
- Avoid being influenced by the first few papers read. These could make you either too harsh or too lenient.
- Score a particular question on all papers at one sitting. Break when fatigue sets in.
- Arrange for an independent scoring of the responses or at least a sample of them where grading decision is crucial.
- Comments could be provided and errors corrected on the scripts for class tests to facilitate learning.
- The mechanics of writing such as correct grammar usage, paragraphing, flow of
expression, quality of handwriting, orderly presentation of material and spelling
should be judged separately from the content.
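Three of the principles above (anonymous scripts, grading item by item, and shuffling the scripts before each item) can be combined into a grading order. The following is a minimal sketch in Python. The function name, the anonymous script codes and the seed parameter are assumptions made here for illustration.

```python
import random


def grading_order(script_ids, item_numbers, seed=None):
    """Yield (item, script) pairs: every script for one item, reshuffled per item."""
    rng = random.Random(seed)
    for item in item_numbers:
        scripts = list(script_ids)
        rng.shuffle(scripts)  # a fresh order for every item
        for script in scripts:
            yield item, script


# Anonymous codes stand in for student names.
order = list(grading_order(["S01", "S02", "S03"], [1, 2], seed=0))
for item, script in order:
    print(f"score item {item} on script {script}")
```

Every script is scored on item 1 before any script is scored on item 2, and the order of scripts is not fixed from one item to the next.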