Vaaku2Vec is a word embedding library used for language modelling and text classification.
Word embedding is a technique used in natural language processing. It studies the contexts in which words are used, quantifies that information, and reduces it to vectors. Along with the context, other parameters also go into generating these vectors.
In the sentence “റിച്ചു എയർ ഗൺ അൺബോക്സ് ചെയ്തു” (Richu unboxed the air gun), the word “എയർ ഗൺ” (air gun) comes between റിച്ചു (Richu) and അൺബോക്സ് (unbox). The computer records this kind of context information and draws on it in later use cases.
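To make the idea of "context" concrete, here is a minimal sketch of how a Word2Vec-style model might collect (word, neighbour) pairs from the sentence above. The function name `context_pairs` and the window size are assumptions made for illustration; this is not code from Vaaku2Vec.

```python
def context_pairs(tokens, window=1):
    """Return (center, context) pairs for every token and its
    neighbours within `window` positions on either side."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# "എയർ ഗൺ" (air gun) is treated as a single token here, as in the text above.
tokens = ["റിച്ചു", "എയർ ഗൺ", "അൺബോക്സ്", "ചെയ്തു"]
pairs = context_pairs(tokens, window=1)

for center, context in pairs:
    print(center, "->", context)
```

A model trained on many such pairs would learn vectors in which words that share neighbours end up close to each other.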
The data acquired this way is put to use in many places. When you browse Amazon, the related items shown at the bottom of the page are produced in this manner. Voice assistants such as Siri and Alexa, and the next-word suggestions on your smartphone's keyboard, are all places where Word2Vec has found application.
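One plausible way such "related item" suggestions can work is to compare vectors with cosine similarity and return the closest match. The sketch below uses made-up three-dimensional vectors and a hypothetical helper `most_similar`; real embeddings have far more dimensions and are learned from data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors for illustration only; they are not real embeddings.
toy_vectors = {
    "air gun": [0.9, 0.1, 0.2],
    "pellets": [0.8, 0.2, 0.3],
    "novel": [0.1, 0.9, 0.4],
}


def most_similar(query, vectors):
    """Return the key whose vector is closest to the query's vector."""
    scores = {
        name: cosine_similarity(vectors[query], vec)
        for name, vec in vectors.items()
        if name != query
    }
    return max(scores, key=scores.get)


print(most_similar("air gun", toy_vectors))
```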
If Google search came to mind while reading this, you are on the right track. Unsurprisingly, Word2Vec originated in research at Google. Tomas Mikolov and team wrote the paper in 2013: Distributed Representations of Words and Phrases and their Compositionality (2013)
Vaaku2Vec, the subject of this blog post, was built by Kamal K Raj and Adam Shamsudeen, who are members of IndicNLP. Their paper was written in 2019.
As the GitHub repo of Vaaku2Vec notes, Malayalam is a highly inflectional and agglutinative language. For example:
ഇത് (this) + ആണ് (is) join in Malayalam into a single word: ഇതാണ് (this is)
To handle this property of the language, the algorithms have to be adapted. This is why Vaaku2Vec exists. Moreover, Vaaku2Vec has been tested on a number of datasets in order to improve the accuracy of text classification algorithms.
A demo is available here.
The first step is to understand well what is happening. For this, here is a really good article we read that explains Word2Vec: Illustrated Word2Vec
Once you understand this, pursue any novel ideas you come up with, or contribute to any of the tasks listed in the project's TODO section.
Subscribe to Maker Broadcast to get news like this right away.