Netting among Words

                          

A presentation was held on putting Indian languages on the global map by making computational translation easier. It prompted some research that revealed more about WordNets in general and the Hindi WordNet in particular. Simply put, WordNets are vocabulary databases. Modelled on the English WordNet, the Hindi WordNet is a system that brings together the different lexical and semantic relations between Hindi words. The ultimate goal of the WordNet effort is to capture the words of all Indian languages, facilitate translation and help in creating better Indian-language search engines.

In the Hindi WordNet, words are grouped together according to the similarity of their meanings. Each group is a synonym set, or 'synset', representing one lexical concept. A word with several meanings appears in several synsets, one for each meaning, which removes the ambiguity. Synsets are the basic building blocks of WordNets. Each synset in the Hindi WordNet is linked to other synsets through well-known lexical and semantic relations: semantic relations hold between synsets, and lexical relations hold between words. These relations serve to organise the lexical knowledge base.
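The organisation described above can be sketched in a few lines of Python. This is only an illustrative toy, not the Hindi WordNet's actual API or data: the class names, method names, synset IDs and the four entries are all assumptions made up for the example. It shows a word with two meanings ("सोना", meaning both "gold" and "to sleep") belonging to two synsets, a semantic relation between synsets, and a lexical relation between words.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Synset:
    """One lexical concept: a set of synonymous words plus a gloss."""
    synset_id: int          # hypothetical ID, not a real Hindi WordNet ID
    words: List[str]
    gloss: str
    # Semantic relations link this synset to other synsets, by ID.
    semantic: Dict[str, List[int]] = field(default_factory=dict)


class MiniWordNet:
    """Toy lexical knowledge base (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.synsets: Dict[int, Synset] = {}
        self.word_index: Dict[str, List[int]] = {}
        # Lexical relations link individual words, not whole synsets.
        self.lexical: Dict[Tuple[str, str], Set[str]] = {}

    def add_synset(self, synset: Synset) -> None:
        self.synsets[synset.synset_id] = synset
        for word in synset.words:
            self.word_index.setdefault(word, []).append(synset.synset_id)

    def senses(self, word: str) -> List[Synset]:
        """All synsets a word belongs to, one per meaning."""
        return [self.synsets[i] for i in self.word_index.get(word, [])]

    def relate_synsets(self, relation: str, source: int, target: int) -> None:
        self.synsets[source].semantic.setdefault(relation, []).append(target)

    def relate_words(self, relation: str, a: str, b: str) -> None:
        # Stored in both directions, which suits symmetric relations
        # such as antonymy.
        self.lexical.setdefault((relation, a), set()).add(b)
        self.lexical.setdefault((relation, b), set()).add(a)


# Illustrative entries only; they are not taken from the real database.
wn = MiniWordNet()
wn.add_synset(Synset(1, ["सोना", "स्वर्ण"], "gold, a precious yellow metal"))
wn.add_synset(Synset(2, ["सोना", "शयन करना"], "to sleep"))
wn.add_synset(Synset(3, ["धातु"], "metal"))
wn.add_synset(Synset(4, ["जागना"], "to wake; to be awake"))
wn.relate_synsets("hypernym", 1, 3)          # gold is a kind of metal
wn.relate_words("antonym", "सोना", "जागना")   # sleep versus wake

for sense in wn.senses("सोना"):
    print(sense.synset_id, sense.words, sense.gloss)
```

Looking up "सोना" returns both synsets, so a program can tell the two meanings apart by their IDs rather than by the ambiguous word itself. A fuller model would attach lexical relations to a word in a particular sense; the sketch keeps them on bare words for brevity.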

Domains such as tourism, health and agriculture benefit from resources of this kind, as people around the world gain easy access to information about India. Seizing this opportunity, the Government of India has initiated a nationwide consortium of projects on Machine Translation (MT) and Cross Lingual Search (CLIR). The Hindi WordNet is used heavily in Cross Lingual Search involving Indian languages.

The Hindi WordNet is fast becoming a helpful resource for language teaching and pedagogy. Teachers of Hindi at the school level are already using it extensively. It is the base resource used by many researchers for work on translation and summarisation. The WordNet also functions as an online thesaurus that helps writers and journalists. In addition, it stores culture-specific words and terms, together with their explanations, that cannot easily be translated into other languages.

         

 

 

Benefits of the Hindi WordNet

- captures words of the Hindi language
- helpful resource for language teaching and pedagogy, for researchers and scholars
- important in creation of Hindi language search engines and cross lingual search engines
- pivotal in leading to linked WordNets of other Indian languages
- a significant step in machine translation

The Hindi WordNet is accessed daily by thousands of people. It has triggered work on linked WordNets for many Indian languages. IIT Bombay is also leading the national effort to create concept-based multilingual dictionaries in 13 languages, based on the Hindi WordNet. This is a significant step towards the development of automatic translation systems, a field currently estimated to be a multi-billion dollar industry.
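To illustrate what "concept-based" means here, the following minimal Python sketch assumes that each concept has one shared ID and that every language lists its own words under that ID. The concept IDs, the language codes, the `translate` helper and the three entries are hypothetical and chosen only for the example; they do not reflect the project's actual data format.

```python
from typing import Dict, List

# Hypothetical concept table: concept ID -> language code -> words.
# "hi" = Hindi, "mr" = Marathi, "en" = English.
CONCEPTS: Dict[int, Dict[str, List[str]]] = {
    101: {"hi": ["पानी", "जल"], "mr": ["पाणी"], "en": ["water"]},
    102: {"hi": ["सोना", "स्वर्ण"], "en": ["gold"]},
    103: {"hi": ["सोना"], "en": ["sleep"]},
}


def translate(word: str, source: str, target: str) -> Dict[int, List[str]]:
    """Map each concept containing `word` to its target-language words.

    Concepts with no entry in the target language are left out, so an
    empty result means no linked translation is available.
    """
    results: Dict[int, List[str]] = {}
    for concept_id, entry in CONCEPTS.items():
        if word in entry.get(source, []) and target in entry:
            results[concept_id] = entry[target]
    return results


print(translate("पानी", "hi", "mr"))
print(translate("सोना", "hi", "en"))
```

Because the lookup goes through the concept rather than the word, an ambiguous word such as "सोना" yields one candidate translation per meaning, and a translation system can then choose between them from context.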

Major search engine companies have acquired commercial licenses for the Hindi WordNet, which suggests that a query entered in Hindi in a search engine such as Google may soon retrieve related documents in English or in any other language requested. The Hindi WordNet, together with its Application Programming Interface, can be freely downloaded under the General Public License from the Linguistic Data Consortium (LDC) at the University of Pennsylvania, one of the foremost linguistic data repositories in the world, and also from LDC-IL, the Linguistic Data Consortium for Indian Languages. The project team was awarded the P. K. Patwardhan Award of IIT Bombay in 2008. An international Global WordNet Conference will be held at IIT Bombay from 31st January to 4th February, 2010.

 
The initiative has won the 'Manthan Award 2009' (further information at www.manthanaward.org).


Contact:
Prof. Pushpak Bhattacharyya, pb@cse.iitb.ac.in
Mr. Prabhakar Pande, pande@cse.iitb.ac.in
Mrs. Laxmi Kashyap, yupu@cse.iitb.ac.in
Mr. Salil Joshi, salilj@cse.iitb.ac.in


URL:
http://www.cfilt.iitb.ac.in/wordnet/webhwn