Makers of Kerala

@makersofkerala


Vaaku2Vec

State-of-the-Art Language Modelling and Text Classification in Malayalam Language


What is Vaaku2Vec?


Vaaku2Vec is a word embedding library used for language modelling and text classification.

Word Emb... say what?

Figure: Converting words to vectors

Word embedding is a technique used in building artificial intelligence systems that work with language. By studying the contexts in which words are used, it quantifies that information and reduces it to vectors (lists of numbers). Along with the context, other parameters also go into generating these vectors.

In the sentence “റിച്ചു എയർ ഗൺ അൺബോക്സ് ചെയ്തു” ("Richu unboxed the air gun"), the word “എയർ ഗൺ” (air gun) comes between “റിച്ചു” (Richu) and “അൺബോക്‌സ്” (unbox). The computer records this kind of neighbourhood information and draws on it later.

Figure: Scanning the words in a sentence
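
To make the "scanning" idea concrete, here is a minimal sketch of how neighbouring words can be collected from a sentence. It is only an illustration: the function name `context_pairs`, the window size of 1 and the way the sentence is split into tokens are all assumptions made for this example, not something taken from Word2Vec or Vaaku2Vec themselves.

```python
def context_pairs(tokens, window=1):
    """Return (centre, neighbour) pairs for words within `window` positions."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs


# Hypothetical tokenisation of the example sentence;
# "എയർ ഗൺ" (air gun) is kept as a single token here.
tokens = ["റിച്ചു", "എയർ ഗൺ", "അൺബോക്സ്", "ചെയ്തു"]
pairs = context_pairs(tokens, window=1)

for centre, neighbour in pairs:
    print(centre, "->", neighbour)
```

Pairs like these are the raw material a model can learn from: words that keep turning up next to the same neighbours end up with similar vectors.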

Okay... that was complex, but I think I'm getting the hang of it. Where do we use this?

The data acquired this way is used in many places. When you browse Amazon, the related items shown at the bottom of the page are produced in this manner. Voice assistants such as Siri and Alexa, and the next-word suggestions on your smartphone’s keyboard, are all places where Word2Vec has found application.

Figure: Autocorrect
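
As a rough sketch of how "related items" can fall out of word vectors, the snippet below compares vectors with cosine similarity and picks the closest one. The three-dimensional vectors and the words in them are made up for illustration; real embeddings are learned from text and are much larger, and the helper names `cosine_similarity` and `most_similar` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means 'pointing the same way'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors for illustration only (not learned from any data).
vectors = {
    "phone": [0.9, 0.1, 0.0],
    "charger": [0.8, 0.3, 0.1],
    "banana": [0.0, 0.2, 0.9],
}


def most_similar(word, vectors):
    """Return the other word whose vector is closest to `word`'s vector."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


print(most_similar("phone", vectors))
```

With these toy numbers, "charger" sits much closer to "phone" than "banana" does, which is the kind of signal a recommendation or suggestion feature can build on.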

If Google Search came to mind while reading this, you are on the right track: Word2Vec originated at Google.

Neat! Who built it?

Figure: The original Word2Vec paper

Unsurprisingly, it originated in research at Google. Tomas Mikolov and his team wrote the paper, Distributed Representations of Words and Phrases and their Compositionality, in 2013.

Vaaku2Vec, the subject of this blog post, was built by Kamal K Raj and Adam Shamsudeen, who are members of IndicNLP. Their paper was written in 2019.

Figure: ’മ്മടെ പുലികൾ (literally "our tigers", that is, our champions)

Wait, so if there's Word2Vec, why Vaaku2Vec?

As the GitHub repo of Vaaku2Vec says, Malayalam is a highly inflectional and agglutinative language: words fuse together into longer words. For example:

ഇത് (this) + ആണ്‌ (is) becomes ഇതാണ് (this is) in Malayalam.

For algorithms to handle this, they need to be restructured to suit the language. This is why Vaaku2Vec exists. Moreover, Vaaku2Vec has been tested on a good number of datasets in order to improve the accuracy of text classification algorithms.
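
The sketch below shows why fused words trip up a plain whole-word lookup, and how breaking words into smaller pieces can recover the connection. To be clear, this is not a description of how Vaaku2Vec works internally; character n-grams are used here only as one well-known way of handling such words, and the helper name `char_ngrams` is made up for this example. The pieces are slices of Unicode code points, so they do not always line up with whole Malayalam letters.

```python
def char_ngrams(word, n=2):
    """Return the set of all runs of n consecutive code points in `word`."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


# Written as escapes so the exact code points are unambiguous.
this_word = "\u0d07\u0d24\u0d4d"           # ഇത് (this)
is_word = "\u0d06\u0d23\u0d4d"             # ആണ് (is)
joined = "\u0d07\u0d24\u0d3e\u0d23\u0d4d"  # ഇതാണ് (this is)

# A vocabulary that only knows the two separate words.
vocabulary = {this_word, is_word}
print("Fused word found in vocabulary:", joined in vocabulary)

# Smaller pieces still overlap with both source words.
shared_with_this = char_ngrams(joined) & char_ngrams(this_word)
shared_with_is = char_ngrams(joined) & char_ngrams(is_word)
print("Pieces shared with 'this':", shared_with_this)
print("Pieces shared with 'is':", shared_with_is)
```

The fused word is missing from the whole-word vocabulary, yet it shares a piece with each of the two words it was built from. That overlap is what lets a model relate ഇതാണ് to ഇത് and ആണ് instead of treating it as a complete stranger.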

Awesome, so where do we download this?

It's available on GitHub.

A demo is also available here.

Figure: Vaaku2Vec demo

I downloaded it. What next?

The first step is to get a good understanding of what is happening. To help with that, here is a really good article we read that explains Word2Vec:

Illustrated Word2Vec

Once you understand this, pursue any novel ideas you come up with, or contribute to one of the tasks listed in the project's TODO section.
