Human beings have used modes of communication such as gesture, eye contact, sign language, written communication, and vocal communication to share thoughts and information with other people. Since vocal communication is one of the comfortable modes of communication, speech has also been used for human-computer interfaces. Automatic speech recognition has been an active field of research for more than five decades. Tamil speech recognition has its own challenges because of the large character set, extensive grammatical rules, and varied accents. Many research works on Tamil speech recognition have addressed areas such as the choice of recognition unit, segmentation of speech into subword units, language model design, and decoder design. Even so, research on Tamil speech recognition still needs to be enriched to the level reached for the English language. The dataset of a word-based speech recognizer is large because the recognition unit is the whole word itself. This paper proposes a data-mining-based classification of the dataset to improve the performance of the speech recognizer.