Gender Differential Item Functioning (GDIF) Analysis in Iran's University Entrance Exam

Soodeh Bordbar

doi:10.24853/elif.3.1.49-68

Authors

Soodeh Bordbar, Iran University of Medical Science

DOI:

https://doi.org/10.24853/elif.3.1.49-68

Keywords:

Gender Differential Item Functioning analysis (GDIF), Bias, Dimensionality, Fairness, Rasch Model

Abstract

A significant aspect of validity concerns what a test score actually and potentially represents, especially with respect to sources of invalidity related to fairness, bias, injustice, and inequity. Differential Item Functioning (DIF) analysis examines test items to assess test fairness and the validity of educational tests. If gender plays a major role in how test items function, the result is bias. This study examines the validity of a high-stakes test and considers the role of gender as a source of bias in different language tests, exploring validity through DIF analysis. For the DIF analysis, the Rasch model was applied to the responses of 5,000 test takers who were randomly selected from the examinees sitting the National University Entrance Exam for Foreign Languages (NUEEFL), a university entry requirement for English language studies, i.e., English Literature, Teaching, and Translation. The results indicated that the test scores are not free of construct-irrelevant variance, and certain inaccurate items were modified following the fit statistics guidelines. Overall, the fairness of the NUEEFL was not established. These findings can benefit test designers, stakeholders, administrators, and teachers who work with this kind of psychometric test. The study also suggests criteria for future test administration and for bias-free tests and teaching materials.
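As an illustration of the kind of Rasch-based DIF check the abstract refers to, the following is a minimal sketch and not the study's actual procedure. It assumes that item difficulties (in logits) and their standard errors have already been estimated separately for two gender groups. The function names and the numeric values are hypothetical and chosen only for demonstration.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    Both ability and difficulty are expressed in logits.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def dif_contrast(difficulty_focal: float, difficulty_reference: float) -> float:
    """Difference in item difficulty (logits) between focal and reference groups."""
    return difficulty_focal - difficulty_reference


def dif_t_statistic(
    difficulty_focal: float,
    se_focal: float,
    difficulty_reference: float,
    se_reference: float,
) -> float:
    """DIF contrast divided by the joint standard error of the two estimates."""
    joint_se = math.sqrt(se_focal ** 2 + se_reference ** 2)
    return dif_contrast(difficulty_focal, difficulty_reference) / joint_se


if __name__ == "__main__":
    # Hypothetical estimates for one item (illustrative values, not study data).
    b_focal, se_f = 0.50, 0.30      # e.g. difficulty estimated for one gender group
    b_reference, se_r = 0.00, 0.40  # e.g. difficulty estimated for the other group

    print("P(correct | ability 0, focal):    ", rasch_probability(0.0, b_focal))
    print("P(correct | ability 0, reference):", rasch_probability(0.0, b_reference))
    print("DIF contrast (logits):", dif_contrast(b_focal, b_reference))
    print("t statistic:", dif_t_statistic(b_focal, se_f, b_reference, se_r))
```

In this sketch, a large contrast with a large absolute t value would flag an item for closer review. Any cut-off for "large" is a convention the analyst has to choose and is not specified here.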

Author Biography

Soodeh Bordbar, Iran University of Medical Science

English Department

