Various Methods for Determining the Content Validity of Quantitative Research Instruments and Selecting an Appropriate Method
DOI: https://doi.org/10.58837/CHULA.PPJ.39.5
Keywords: content validity, selecting a content validity method, types of content validity, new types of content validity
Abstract
Content validity is the most important characteristic of tests and other research instruments (PTI, 2006). It is an index that researchers are expected to report explicitly in their research reports or articles; otherwise, the work may appear questionable and untrustworthy. There are many ways to compute such an index, depending on the general characteristics of the data, the type of rating scale, and the number of field experts (raters) involved. This article presents 15 methods for determining the various forms of the index, discusses their advantages and disadvantages, and offers guidelines to help researchers select an appropriate method for their own studies.
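To make the kinds of computation the article surveys concrete, here is a minimal Python sketch (not taken from the article; the expert ratings are invented) of four indices that appear in the reference list: the index of item-objective congruence (IOC; Rovinelli & Hambleton, 1977), Lawshe's (1975) content validity ratio (CVR), the item-level content validity index (I-CVI; Lynn, 1986), and Cohen's (1960) kappa for two raters.

```python
# Illustrative computation of four common content validity / rater
# agreement indices. The rating data are hypothetical; the formulas
# follow the cited sources.
from collections import Counter

def ioc(ratings):
    """Index of Item-Objective Congruence (Rovinelli & Hambleton, 1977):
    mean of expert ratings of -1 (incongruent), 0 (unsure), +1 (congruent).
    Items with IOC >= 0.50 are conventionally retained."""
    return sum(ratings) / len(ratings)

def cvr(n_essential, n_experts):
    """Lawshe's (1975) Content Validity Ratio: CVR = (n_e - N/2) / (N/2),
    where n_e is the number of experts rating the item 'essential'."""
    half = n_experts / 2
    return (n_essential - half) / half

def i_cvi(ratings):
    """Item-level Content Validity Index (Lynn, 1986): proportion of
    experts rating the item 3 or 4 on a 4-point relevance scale."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)

def cohen_kappa(rater_a, rater_b):
    """Cohen's (1960) kappa for two raters on a nominal scale:
    kappa = (p_o - p_e) / (1 - p_e), with p_o the observed agreement
    and p_e the agreement expected by chance from the marginals."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum((ca[c] / n) * (cb[c] / n) for c in set(rater_a) | set(rater_b))
    return (p_o - p_e) / (1 - p_e)

if __name__ == "__main__":
    # Hypothetical panel of five experts rating one item.
    print(ioc([1, 1, 0, 1, 1]))                       # 0.80 -> congruent
    print(cvr(n_essential=4, n_experts=5))            # 0.60
    print(i_cvi([4, 3, 4, 2, 4]))                     # 0.80
    print(cohen_kappa(list("AABBA"), list("AABAA")))  # ~0.55
```

Note that acceptance thresholds for these indices depend on the number of raters: Ayre and Scally (2014) tabulate critical values for the CVR, and Lynn (1986) gives cutoffs for the I-CVI.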
References
จักรกฤษณ์ สำราญใจ. (2554). IOC = ความตรง?. วารสารหลักสูตรและการเรียนการสอน, 4(1–2). มหาวิทยาลัยขอนแก่น. https://www.scribd.com/doc/86608731/IOC
พิศิษฐ ตัณฑวณิช และพนา จินดาศรี. (2561). ความหมายที่แท้จริงของค่า IOC. วารสารการวัดผลการศึกษา มหาวิทยาลัยมหาสารคาม, 24(2), 3–12. https://so02.tci-thaijo.org/index.php/jemmsu/article/view/174521/124950
ล้วน สายยศ และอังคณา สายยศ. (2539). หลักการสร้างแบบทดสอบความถนัดทางการเรียน. วัฒนาพานิช.
เยาวดี รางชัยกุล วิบูลย์ศรี. (2556). การวัดผลและการสร้างแบบสอบผลสัมฤทธิ์. สำนักพิมพ์แห่งจุฬาลงกรณ์มหาวิทยาลัย.
Abbott, R. D., & Perkin, D. (1982). Reliability and validity evidence for scales measuring dimensions of student ratings of instruction. Educational and Psychological Measurement, 42(2), 563–569. https://doi.org/10.1177/001316448204200220
Ato, M., López, J. J., & Benavente, A. M. (2011). A simulation study of rater agreement measures with 2x2 contingency tables. Psicológica, 32(2), 385–402. https://www.uv.es/psicologica/articulos2.11/12ATO.pdf
Ayre, C., & Scally, A. J. (2014). Critical values for Lawshe's content validity ratio: Revisiting the original methods of calculation. Measurement and Evaluation in Counseling and Development, 47(1), 79–86. https://doi.org/10.1177/0748175613513808
Cherry, K. (2017). Why validity is important to psychological tests. Verywell Mind. https://www.verywellmind.com/what-is-validity-2795788
Cherry, K. (2023). Validity in psychological tests: Why measures like validity and reliability are important. Verywell Mind. https://www.verywellmind.com/what-is-validity-2795788
Choudhury, A. (2018). Top 4 characteristics of a good test. http://www.yourarticlelibrary.com/education/test/top-4-characteristics-of-a-good-test/64804
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46. https://doi.org/10.1177/001316446002000104
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957
Disha, M. (2018). Validity of a test: 6 types | statistics. http://www.yourarticlelibrary.com/statistics-2/validity-of-a-test-6-types-statistics/92597
Ebel, R. L. (1972). Essentials of educational measurement. Prentice Hall.
Fitzpatrick, A. R. (1983). The meaning of content validity. Applied Psychological Measurement, 7(1), 3–13. https://doi.org/10.1177/014662168300700102
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382. https://doi.org/10.1037/h0031619
Garrett, H. E. (1964). Testing for teachers. American Book Company.
Girard, J. M. (2022). Scott's pi coefficient. https://github.com/jmgirard/mReliability/wiki/Scott%27s-pi-coefficient.
Goodwin, L. D. (2001). Interrater agreement and reliability. Measurement in Physical Education and Exercise Science, 5(1), 13–34. https://doi.org/10.1207/S15327841MPEE0501_2
Gwet, K. (2002). Inter-rater reliability: dependency on trait prevalence and marginal homogeneity. Statistical Methods for Inter-Rater Reliability Assessment Series, 2, 1–9. http://www.agreestat.com/research_papers/inter_rater_reliability_dependency.pdf
Gwet, K. L. (2008). Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology, 61(1), 29–48. https://doi.org/10.1348/000711006X126600
Haley, D. T., Thomas, P., Petre, M., & Roeck, A. D. (2008). Using a new inter-rater reliability statistic. Technical Report No. 2008/15. https://pdfs.semanticscholar.org/765d/f9d90295d5ca2b59e5092c4a5f7a09668d23.pdf
Haynes, S. N., Richard, D. C. S., & Kubany, E. S. (1995). Content validity in psychological assessment: A functional approach to concepts and methods. Psychological Assessment, 7(3), 238–247. https://doi.org/10.1037/1040-3590.7.3.238
Hughes, A. (1995). Testing for Language Teachers. Bell & Bain, Ltd.
Kleeman, J. (2018). Six tips to increase content validity in competence tests and exams. https://www.questionmark.com/resources/blog/six-tips-to-increase-reliability-in-competence-tests-and-exams/
Krippendorff, K. (2011). Computing Krippendorff's alpha-reliability. https://www.statisticshowto.datasciencecentral.com/wp-content/uploads/2016/07/fulltext.pdf
Lado, R. (1975). Language testing. Wing Tasi Cheung Printing.
Laerd Research. (2018). Content validity. http://dissertation.laerd.com/content-validity.php
Laerd Statistics. (2019). Fleiss' kappa using SPSS Statistics. Statistical tutorials and software guides. https://statistics.laerd.com/spss-tutorials/fleiss-kappa-in-spss-statistics.php
Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28(4), 563–575. https://doi.org/10.1111/j.1744-6570.1975.tb01393.x
Lynn, M. R. (1986). Determination and quantification of content validity. Nursing Research, 35(6), 382–385. https://doi.org/10.1097/00006199-198611000-00017
Martín, A. A., & Álvarez, H. M. (2019). Multi-rater delta: Extension to many raters of the measure delta of nominal agreement. arXiv. https://arxiv.org/ftp/arxiv/papers/1909/1909.05575.pdf
Martín, A. A., & Femia, P. (2004). Delta: A new measure of agreement between two raters. British Journal of Mathematical and Statistical Psychology, 57(1), 1–19. https://doi.org/10.1348/000711004849268
McHugh, M. L. (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276–282. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3900052/
O'Brien, R. M. (1995). Generalizability coefficients are reliability coefficients. Quality & Quantity, 29, 421–428. https://doi.org/10.1007/BF01106066
Osborne, J. W. (Ed.). (2008). Best practices in quantitative methods. Sage. https://books.google.co.th/books?id=M5_FCgCuwFgC
Polit, D. F., & Beck, C. T. (2006). The content validity index: Are you sure you know what's being reported? Critique and recommendations. Research in Nursing & Health, 29(5), 489–497. https://doi.org/10.1002/nur.20147
Polit, D. F., Beck, C. T., & Owen, S. V. (2007). Is the CVI an acceptable indicator of content validity? Appraisal and recommendations. Research in Nursing & Health, 30(4), 459–467. https://doi.org/10.1002/nur.20199
PTI (Professional Testing, Inc.). (2006). Test validity. https://proftesting.com/test_topics/pdfs/test_quality_validity.pdf
Rovinelli, R. J., & Hambleton, R. K. (1976). On the use of content specialists in the assessment of criterion-referenced test item validity. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco. https://files.eric.ed.gov/fulltext/ED121845.pdf
Rovinelli, R. J., & Hambleton, R. K. (1977). On the use of content specialists in the assessment of criterion-referenced test item validity. Tijdschrift voor Onderwijsresearch, 2, 49–60.
Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19(3), 321–325. https://doi.org/10.1086/266577
Shavelson, R. J., & Webb, N.M. (2005). Generalizability theory. https://web.stanford.edu/dept/SUSE/SEAL/Reports_Papers/methods_papers/G%20Theory%20AERA.pdf
Shuttleworth, M. (2009). Content validity. Explorable. https://explorable.com/content-validity
Sireci, S. G. (1998). Gathering and analyzing content validity data. Educational Assessment, 5(4), 299–321. https://doi.org/10.1207/s15326977ea0504_2
Syed, M., & Nelson, S. C. (2015). Guidelines for establishing reliability when coding narrative data. Emerging Adulthood, 3(6). https://doi.org/10.1177/2167696815587648
Tang, W., Hu, J., Zhang, H., Wu, P., & He, H. (2015). Kappa coefficient: A popular measure of rater agreement. Shanghai Archives of Psychiatry, 27(1), 62–67. https://www.researchgate.net/publication/274727961_Kappa_coefficient_a_popular_measure_of_rater_agreement
Turner, R. C., & Carlson, L. (2003). Indexes of item-objective congruence for multidimensional items. International Journal of Testing, 3(2), 163–171. https://www.tandfonline.com/doi/abs/10.1207/S15327574IJT0302_5
Turner, R. C., Mulvenon, S. W., Thomas, S. P., & Balkin, R. S. (2002). Computing indices of item congruence for test development validity assessments. https://support.sas.com/resources/papers/proceedings/proceedings/sugi27/p255-27.pdf
Wongpakaran, N., Wongpakaran, T., Wedding, D., & Gwet, K. L. (2013). A comparison of Cohen's kappa and Gwet's AC1 when calculating inter-rater reliability coefficients: A study conducted with personality disorder samples. BMC Medical Research Methodology, 13, Article 61. https://doi.org/10.1186/1471-2288-13-61
Xie, Q. (2013). Agree or disagree? A demonstration of an alternative statistic to Cohen’s Kappa for measuring the extent and reliability of agreement between observers. https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/J4_Xie_2013FCSM.pdf
Yelboga, A. (2011). Investigation of generalizability theory analysis results with different statistical programs. Poster presented at the XII European Congress of Psychology, Istanbul, Turkey.
Zaiontz, C. (2019). Real Statistics Using Excel. http://www.real-statistics.com
Zamanzadeh, V., Ghahramanian, A., Rassouli, M., Abbaszadeh, A., Alavi-Majd, H., & Nikanfar, A. R. (2015). Design and implementation content validity study: Development of an instrument for measuring patient-centered communication. Journal of Caring Sciences, 4(2), 165–178. https://doi.org/10.15171/jcs.2015.017
License
Copyright (c) 2024 วารสารภาษาปริทัศน์
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.