isiXhosa Text Analysis
An important part of helping children learn to read with meaning is giving them text that suits their reading level. Many factors must be considered when assessing how readable a text is for a learner, including the contextual framework, how frequently the words in the text are used, and the difficulty of the text. Teachers can use their discernment to choose contextually appropriate texts for their learners. It is also possible to equip teachers with tools that assess the readability of various texts.
In response to this need, the research team at Funda Wande (David Carel, Nangamso Mtsatse, and Nwabisa Makaluza) is developing an isiXhosa readability and text difficulty index using a text corpus. The corpus has been compiled from a collection of 141 stories written in isiXhosa. To give an idea of the scope of the stories that have been collected, about 250 isiXhosa stories at the Grade R – 3 level are currently being published. Because no Lexile program has yet been developed to suit the isiXhosa language (with its agglutinative structure and transparent orthography), we have chosen to use the R programming language. Our researchers have written code in such a way that it “teaches R to read isiXhosa.”
From our analysis we shall produce lists of high-frequency letters, phonemes, syllables, lemmas, and words. Word frequency is an important variable for developing an isiXhosa readability index, along with word length, sentence length, and the semantics and syntax of the language. Our tentative results show that ukuba, watsho, and kodwa are the three most used full words in the isiXhosa stories in the corpus. Other frequency analyses show that a, e, i, and n are the most used letters, and that /l/, /k/, and /n/ are the most used phonemes. These results may change as we modify the corpus so that it is a clean and comprehensive representation of Grade R – 3 text.
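As a rough illustration of the word and letter counts described above, here is a minimal sketch. It is written in Python rather than R and is not the team's code. It assumes that a word is any run of the letters a to z and that case can be ignored. The function names and the two sample strings are hypothetical: the strings are simply built from words mentioned above, are not meaningful sentences, and do not come from the corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of the letters a-z as words.

    Assumption: punctuation, digits, and apostrophes only separate words.
    """
    return re.findall(r"[a-z]+", text.lower())


def frequency_tables(stories):
    """Count how often each word and each letter occurs across all stories."""
    word_counts = Counter()
    letter_counts = Counter()
    for story in stories:
        words = tokenize(story)
        word_counts.update(words)
        for word in words:
            # Updating a Counter with a string counts its characters.
            letter_counts.update(word)
    return word_counts, letter_counts


# Toy input only: not real sentences and not taken from the corpus.
sample_stories = [
    "Kodwa watsho ukuba uyabaleka.",
    "Watsho ukuba kodwa ukuba.",
]

word_counts, letter_counts = frequency_tables(sample_stories)
print(word_counts.most_common(3))
print(letter_counts.most_common(4))
```

Phoneme, syllable, and lemma counts would need language-specific rules on top of this, for example treating multi-letter spellings as single sounds, which this sketch does not attempt.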
Our approach in developing the isiXhosa text difficulty index has been to include variables that influence the readability of text. While we have tried to include as many of the variables in the image on the right as possible, we are still in the process of correctly identifying some of the elements. Morphemes are one example: a morpheme like baleka (run) surfaces inside many different words, such as uyabaleka (you are running), ubalekile (s/he has run away), and siyabaleka (we are running).
Our preliminary text difficulty index is made up of five variables: the number of letters, the number of words, the number of sentences, the average number of syllables per word, and the average number of words per sentence. The weights displayed in the box below are tentative and are likely to change as we learn more about the type of weighting that is suitable for isiXhosa text. Once stable weights and variables for the text difficulty index have been found, we shall validate the index against children’s experience of the different texts that have been used in assessments.
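To make the five variables concrete, here is a minimal sketch of how they could be computed and combined. It is written in Python rather than R and is not the team's implementation. Several things are assumptions: that the index is a simple weighted sum, that sentences end at ".", "!" or "?", and that syllables can be approximated by counting vowel letters (which ignores cases such as syllabic nasals). The weights are placeholders set to 1.0 purely so the code runs; they are not the tentative weights referred to above.

```python
import re

VOWELS = set("aeiou")


def text_features(text):
    """Return the five variables of the preliminary index for one text."""
    # Assumption: a sentence is a non-empty stretch between ., ! or ? marks.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is any run of the letters a-z, case ignored.
    words = re.findall(r"[a-z]+", text.lower())

    n_words = len(words)
    n_sentences = len(sentences)
    n_letters = sum(len(word) for word in words)
    # Assumption: one syllable per vowel letter (a rough approximation).
    n_syllables = sum(1 for word in words for ch in word if ch in VOWELS)

    return {
        "letters": n_letters,
        "words": n_words,
        "sentences": n_sentences,
        "syllables_per_word": n_syllables / n_words if n_words else 0.0,
        "words_per_sentence": n_words / n_sentences if n_sentences else 0.0,
    }


def difficulty_index(features, weights):
    """Combine the variables as a weighted sum (an assumed functional form)."""
    return sum(weights[name] * value for name, value in features.items())


# Placeholder weights only: hypothetical values, not the team's weights.
PLACEHOLDER_WEIGHTS = {
    "letters": 1.0,
    "words": 1.0,
    "sentences": 1.0,
    "syllables_per_word": 1.0,
    "words_per_sentence": 1.0,
}

# Toy input only: not a real sentence pair and not taken from the corpus.
sample_text = "Kodwa watsho ukuba uyabaleka. Watsho ukuba."
features = text_features(sample_text)
print(features)
print(difficulty_index(features, PLACEHOLDER_WEIGHTS))
```

Keeping the weights as a separate argument means they can be replaced as the weighting is refined, without touching the code that extracts the variables.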