Tools

    • Thamizhi Preprocessor - A preprocessing script that validates a given Tamil word against the widely used Nannool grammar. The script also supports normalizing Unicode code points.
    • ThamizhiLIP - A Tamil linguistic information processing tool that supports POS and morphological tagging with various tagsets, and syntactic parsing using the Universal Dependencies framework. (Note: needs an update)
    • ThamizhiMorph - A Finite-State Transducer-based morphological analyzer for simple verbs in Tamil. Under development.
    • Thamizhi UD Parser - A Stanza and uuparser-based UD parser for Tamil. (Note: needs an update)
    • Noun Classifier - A script that classifies nouns according to the popular nominal paradigm proposed by Prof. S. Rajendran.

Treebanks and Grammars

    • Tamil Modern Written Treebank - A Universal Dependencies-based treebank (in collaboration with Parameswari Krishna).
    • Sinhala Dependency Treebank - A Universal Dependencies-based treebank (in collaboration with Chamila Liyanage).
    • Tamil LFG Grammar - A Lexical Functional Grammar-based parser and treebank with limited scope.
    • Corpus of Jaffna Tamil
    • Intent classification dataset (Tamil) for the e-commerce domain

Kengatharaiyer Sarveswaran


sarves at univ.jfn.ac.lk

Department of Computer Science
University of Jaffna
Sri Lanka
