Various Methods for Determining the Content Validity of Quantitative Research Instruments and Selecting an Appropriate Method
DOI: https://doi.org/10.58837/CHULA.PPJ.39.5
Keywords: content validity, selecting a content validity method, types of content validity, new types of content validity
Abstract
Content validity is the most important characteristic of tests and other research instruments (PTI, 2006). It is an index that researchers are expected to report explicitly in their research reports or articles; otherwise, the work may appear questionable and untrustworthy. There are many ways to compute such an index, depending on the general characteristics of the data, the type of rating scale, and the number of field experts (raters) involved. This article presents 15 methods for determining the various forms of the index, discusses their advantages and disadvantages, and offers guidelines to help researchers select an appropriate method for their own studies.
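To make the kinds of computation the article surveys concrete, here is a minimal Python sketch (not taken from the article; the expert ratings are invented) of four indices that appear in the reference list: the index of item-objective congruence (IOC; Rovinelli & Hambleton, 1977), Lawshe's (1975) content validity ratio (CVR), the item-level content validity index (I-CVI; Lynn, 1986), and Cohen's (1960) kappa for two raters.

```python
# Illustrative computation of four common content validity / rater
# agreement indices. The rating data are hypothetical; the formulas
# follow the cited sources.
from collections import Counter

def ioc(ratings):
    """Index of Item-Objective Congruence (Rovinelli & Hambleton, 1977):
    mean of expert ratings of -1 (incongruent), 0 (unsure), +1 (congruent).
    Items with IOC >= 0.50 are conventionally retained."""
    return sum(ratings) / len(ratings)

def cvr(n_essential, n_experts):
    """Lawshe's (1975) Content Validity Ratio: CVR = (n_e - N/2) / (N/2),
    where n_e is the number of experts rating the item 'essential'."""
    half = n_experts / 2
    return (n_essential - half) / half

def i_cvi(ratings):
    """Item-level Content Validity Index (Lynn, 1986): proportion of
    experts rating the item 3 or 4 on a 4-point relevance scale."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)

def cohen_kappa(rater_a, rater_b):
    """Cohen's (1960) kappa for two raters on a nominal scale:
    kappa = (p_o - p_e) / (1 - p_e), with p_o the observed agreement
    and p_e the agreement expected by chance from the marginals."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum((ca[c] / n) * (cb[c] / n) for c in set(rater_a) | set(rater_b))
    return (p_o - p_e) / (1 - p_e)

if __name__ == "__main__":
    # Hypothetical panel of five experts rating one item.
    print(ioc([1, 1, 0, 1, 1]))                       # 0.80 -> congruent
    print(cvr(n_essential=4, n_experts=5))            # 0.60
    print(i_cvi([4, 3, 4, 2, 4]))                     # 0.80
    print(cohen_kappa(list("AABBA"), list("AABAA")))  # ~0.55
```

Note that acceptance thresholds for these indices depend on the number of raters: Ayre and Scally (2014) tabulate critical values for the CVR, and Lynn (1986) gives cutoffs for the I-CVI.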
References
จักรกฤษณ์ สำราญใจ. (2554). IOC = ความตรง?. วารสารหลักสูตรและการเรียนการสอน, 4(1–2). มหาวิทยาลัยขอนแก่น. https://www.scribd.com/doc/86608731/IOC
พิศิษฐ ตัณฑวณิช และพนา จินดาศรี. (2561). ความหมายที่แท้จริงของค่า IOC. วารสารการวัดผลการศึกษา มหาวิทยาลัยมหาสารคาม, 24(2), 3–12. https://so02.tci-thaijo.org/index.php/jemmsu/article/view/174521/124950
ล้วน สายยศ และอังคณา สายยศ. (2539). หลักการสร้างแบบทดสอบความถนัดทางการเรียน. วัฒนาพานิช.
เยาวดี รางชัยกุล วิบูลย์ศรี. (2556). การวัดผลและการสร้างแบบสอบผลสัมฤทธิ์. สำนักพิมพ์แห่งจุฬาลงกรณ์มหาวิทยาลัย.
Abbott, R. D., & Perkin, D. (1982). Reliability and validity evidence for scales measuring dimensions of student ratings of instruction. Educational and Psychological Measurement, 42(2), 563–569. https://doi.org/10.1177/001316448204200220
Ato, M., López, J. J., & Benavente, A. M. (2011). A simulation study of rater agreement measures with 2x2 contingency tables. Psicológica, 32(2), 385–402. https://www.uv.es/psicologica/articulos2.11/12ATO.pdf
Ayre, C., & Scally, A. J. (2014). Critical values for Lawshe's content validity ratio: Revisiting the original methods of calculation. Measurement and Evaluation in Counseling and Development, 47(1), 79–86. https://doi.org/10.1177/0748175613513808
Cherry, K. (2017). Why validity is important to psychological tests. Verywell Mind. https://www.verywellmind.com/what-is-validity-2795788
Cherry, K. (2023). Validity in psychological tests: Why measures like validity and reliability are important. Verywell Mind. https://www.verywellmind.com/what-is-validity-2795788
Choudhury, A. (2018). Top 4 characteristics of a good test. http://www.yourarticlelibrary.com/education/test/top-4-characteristics-of-a-good-test/64804
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46. https://doi.org/10.1177/001316446002000104
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957
Disha, M. (2018). Validity of a test: 6 types | statistics. http://www.yourarticlelibrary.com/statistics-2/validity-of-a-test-6-types-statistics/92597
Ebel, R. L. (1972). Essentials of educational measurement. Prentice Hall.
Fitzpatrick, A. R. (1983). The meaning of content validity. Applied Psychological Measurement, 7(1), 3–13. https://doi.org/10.1177/014662168300700102
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382. https://doi.org/10.1037/h0031619
Garrett, H. E. (1964). Testing for teachers. American Book Company.
Girard, J. M. (2022). Scott's pi coefficient. https://github.com/jmgirard/mReliability/wiki/Scott%27s-pi-coefficient.
Goodwin, L. D. (2001). Interrater agreement and reliability. Measurement in Physical Education and Exercise Science, 5(1), 13–34. https://doi.org/10.1207/S15327841MPEE0501_2
Gwet, K. (2002). Inter-rater reliability: dependency on trait prevalence and marginal homogeneity. Statistical Methods for Inter-Rater Reliability Assessment Series, 2, 1–9. http://www.agreestat.com/research_papers/inter_rater_reliability_dependency.pdf
Gwet, K. L. (2008). Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology, 61(1), 29–48. https://doi.org/10.1348/000711006X126600
Haley, D. T., Thomas, P., Petre, M., & Roeck, A. D. (2008). Using a new inter-rater reliability statistic. Technical Report No. 2008/15. https://pdfs.semanticscholar.org/765d/f9d90295d5ca2b59e5092c4a5f7a09668d23.pdf
Haynes, S. N., Richard, D. C. S., & Kubany, E. S. (1995). Content validity in psychological assessment: A functional approach to concepts and methods. Psychological Assessment, 7(3), 238–247. https://doi.org/10.1037/1040-3590.7.3.238
Hughes, A. (1995). Testing for Language Teachers. Bell & Bain, Ltd.
Kleeman, J. (2018). Six tips to increase content validity in competence tests and exams. https://www.questionmark.com/resources/blog/six-tips-to-increase-reliability-in-competence-tests-and-exams/
Krippendorff, K. (2011). Computing Krippendorff's alpha-reliability. https://www.statisticshowto.datasciencecentral.com/wp-content/uploads/2016/07/fulltext.pdf
Lado, R. (1975). Language testing. Wing Tasi Cheung Printing.
Laerd Research. (2018). Content validity. http://dissertation.laerd.com/content-validity.php
Laerd Statistics. (2019). Fleiss' kappa using SPSS Statistics. Statistical tutorials and software guides. https://statistics.laerd.com/spss-tutorials/fleiss-kappa-in-spss-statistics.php
Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28(4), 563–575. https://doi.org/10.1111/j.1744-6570.1975.tb01393.x
Lynn, M. R. (1986). Determination and quantification of content validity. Nursing Research, 35(6), 382–385. https://doi.org/10.1097/00006199-198611000-00017
Martín, A. A., & Álvarez, H. M. (2019). Multi-rater delta: Extension to many raters of the measure delta of nominal agreement. arXiv. https://arxiv.org/ftp/arxiv/papers/1909/1909.05575.pdf
Martín, A. A., & Femia, P. (2004). Delta: A new measure of agreement between two raters. British Journal of Mathematical and Statistical Psychology, 57(1), 1–19. https://doi.org/10.1348/000711004849268
McHugh, M. L. (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276–282. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3900052/
O'Brien, R. M. (1995). Generalizability coefficients are reliability coefficients. Quality & Quantity, 29, 421–428. https://doi.org/10.1007/BF01106066
Osborne, J. W. (Ed.). (2008). Best practices in quantitative methods. Sage. https://books.google.co.th/books?id=M5_FCgCuwFgC
Polit, D. F., & Beck, C. T. (2006). The content validity index: Are you sure you know what's being reported? Critique and recommendations. Research in Nursing & Health, 29(5), 489–497. https://doi.org/10.1002/nur.20147
Polit, D. F., Beck, C. T., & Owen, S. V. (2007). Is the CVI an acceptable indicator of content validity? Appraisal and recommendations. Research in Nursing & Health, 30(4), 459–467. https://doi.org/10.1002/nur.20199
PTI (Professional Testing, Inc.). (2006). Test validity. https://proftesting.com/test_topics/pdfs/test_quality_validity.pdf
Rovinelli, R. J., & Hambleton, R. K. (1976). On the use of content specialists in the assessment of criterion-referenced test item validity. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco. https://files.eric.ed.gov/fulltext/ED121845.pdf
Rovinelli, R. J., & Hambleton, R. K. (1977). On the use of content specialists in the assessment of criterion-referenced test item validity. Tijdschrift voor Onderwijsresearch, 2, 49–60.
Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19(3), 321–325. https://doi.org/10.1086/266577
Shavelson, R. J., & Webb, N.M. (2005). Generalizability theory. https://web.stanford.edu/dept/SUSE/SEAL/Reports_Papers/methods_papers/G%20Theory%20AERA.pdf
Shuttleworth, M. (2009). Content validity. Explorable. https://explorable.com/content-validity
Sireci, S. G. (1998). Gathering and analyzing content validity data. Educational Assessment, 5(4), 299–321. https://doi.org/10.1207/s15326977ea0504_2
Syed, M., & Nelson, S. C. (2015). Guidelines for establishing reliability when coding narrative data. Emerging Adulthood, 3(6). https://doi.org/10.1177/2167696815587648
Tang, W., Hu, J., Zhang, H., Wu, P., & He, H. (2015). Kappa coefficient: A popular measure of rater agreement. Shanghai Archives of Psychiatry, 27(1), 62–67. https://www.researchgate.net/publication/274727961_Kappa_coefficient_a_popular_measure_of_rater_agreement
Turner, R. C., & Carlson, L. (2003). Indexes of item-objective congruence for multidimensional items. International Journal of Testing, 3(2), 163–171. https://www.tandfonline.com/doi/abs/10.1207/S15327574IJT0302_5
Turner, R. C., Mulvenon, S. W., Thomas, S. P., & Balkin, R. S. (2002). Computing indices of item congruence for test development validity assessments. https://support.sas.com/resources/papers/proceedings/proceedings/sugi27/p255-27.pdf
Wongpakaran, N., Wongpakaran, T., Wedding, D., & Gwet, K. L. (2013). A comparison of Cohen's kappa and Gwet's AC1 when calculating inter-rater reliability coefficients: A study conducted with personality disorder samples. BMC Medical Research Methodology, 13, Article 61. https://doi.org/10.1186/1471-2288-13-61
Xie, Q. (2013). Agree or disagree? A demonstration of an alternative statistic to Cohen’s Kappa for measuring the extent and reliability of agreement between observers. https://s3.amazonaws.com/sitesusa/wp-content/uploads/sites/242/2014/05/J4_Xie_2013FCSM.pdf
Yelboga, A. (2011). Investigation of generalizability theory analysis results with different statistical programs. Poster presented at the XII European Congress of Psychology, Istanbul, Turkey.
Zaiontz, C. (2019). Real Statistics Using Excel. http://www.real-statistics.com
Zamanzadeh, V., Ghahramanian, A., Rassouli, M., Abbaszadeh, A., Alavi-Majd, H., & Nikanfar, A. R. (2015). Design and implementation content validity study: Development of an instrument for measuring patient-centered communication. Journal of Caring Sciences, 4(2), 165–178. https://doi.org/10.15171/jcs.2015.017
License
Copyright (c) 2024 วารสารภาษาปริทัศน์
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.