Makers of Kerala

@makersofkerala

Let's Get Acquainted

Vaaku2Vec

State-of-the-Art Language Modelling and Text Classification in Malayalam Language


What is Vaaku2Vec?

Vaaku2Vec Logo

Vaaku2Vec is a word embedding library that can be used for language modelling and text classification.

Text classi... what?

Word to Vec: words being converted into vector format

Word embedding is one of the techniques used to build artificial intelligence. The model first learns the contexts in which words are used, then stores that knowledge as vectors, a mathematical form that a computer can work with easily. Besides the context of a word, a few other features are sometimes used to help with this. For example:

In the sentence “Richu air gun unbox cheythu” ("Richu unboxed the air gun"), the computer reads the sentence and records that the term “air gun” appears between "Richu" and "unbox", then keeps that information for later use.

Word to Vec: how the words are scanned
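To make the "scanning" idea concrete, here is a minimal sketch of how a sentence can be turned into (word, neighbour) pairs using a sliding window. The function name `context_pairs` and the window size of 1 are assumptions made for this illustration; this is not code from Vaaku2Vec or Word2Vec, only the general idea of recording which words appear next to each other.

```python
def context_pairs(tokens, window=1):
    """Return (center, context) pairs for tokens within `window` positions of each other."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# "air gun" is treated as one term here, as in the example sentence above.
sentence = ["Richu", "air gun", "unbox", "cheythu"]

for center, context in context_pairs(sentence):
    print(f"{center!r} appears next to {context!r}")
```

A real embedding model would then use pairs like these as training data, so that words sharing similar neighbours end up with similar vectors.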

That got a bit heavy, but I think I caught some of it. Where is this actually used?

The vector data obtained this way can be used in many ways. On the Amazon website, this approach helps surface items similar to the ones we search for. It has found applications everywhere from Siri and Alexa to the keyboard suggestions on our smartphones that predict the next word.
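As a rough sketch of how "similar items" can be found from vectors, the snippet below ranks words by cosine similarity. The three-dimensional vectors and the helper names (`cosine_similarity`, `most_similar`) are invented for illustration; real embeddings have many more dimensions and are learned from text, and this is not how any particular product is implemented.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, vectors):
    """Rank every other word by similarity to `query`, most similar first."""
    scores = [
        (word, cosine_similarity(vectors[query], vec))
        for word, vec in vectors.items()
        if word != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy vectors, made up for this example only.
toy_vectors = {
    "phone": [0.9, 0.1, 0.0],
    "tablet": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

for word, score in most_similar("phone", toy_vectors):
    print(f"{word}: {score:.2f}")
```

With these toy numbers, "tablet" ranks above "banana" for the query "phone", which is the kind of ranking a product search or a suggestion feature could build on.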

Autocorrect: awesome stuff

If reading all this made you wonder whether it is also used when you search on Google, you were thinking along the right lines. In fact, it originated at Google.

This is great! Who made it?

Word2Vec original paper

Unsurprisingly, it came out of research at Google. The technique was first introduced in a 2013 paper by Tomas Mikolov and his team. This is the paper: Distributed Representations of Words and Phrases and their Compositionality (2013)

Vaaku2Vec, the subject of this blog post, was developed by Kamal K Raj and Adam Shamsudeen, both members of IndicNLP. It got its start in early 2019.

Our very own champions

Hang on, if Word2Vec already exists, why do we need Vaaku2Vec?

As its GitHub repo points out, Malayalam is a language with inflection and agglutination. That is:

ഇത് (this) + ആണ്‌ (is) can be fused in Malayalam into the single word ഇതാണ് (this is).
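A small sketch shows why this matters for a word-level model. Assuming a naive vocabulary built by splitting text on whitespace (the `vocabulary` helper below is hypothetical and is not how Vaaku2Vec tokenizes text), the fused form is a brand-new token that the vocabulary has never seen, even though both of its parts are known.

```python
def vocabulary(sentences):
    """Build a naive vocabulary by splitting each sentence on whitespace."""
    vocab = set()
    for sentence in sentences:
        vocab.update(sentence.split())
    return vocab


separate = "ഇത് ആണ്"  # "this" and "is" written as two words
fused = "ഇതാണ്"       # the agglutinated form, "this is"

vocab = vocabulary([separate])

print("known words:", len(vocab))
print("fused form already known?", fused in vocab)
```

Every such combination would need its own entry in a whitespace-based vocabulary, which is the problem that adapting the algorithms to Malayalam has to address.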

To cope with this, the algorithms have to be adapted to the language. That is the work Kamal and Shamsudeen have done. On top of that, they have tested and validated these algorithms on several Malayalam datasets (text classification).

Awesome, so where can I get it?

It is available on GitHub, and a demo of it is hosted on this website.

Vaaku2Vec demo app

I've downloaded it. What do I do next?

The first step is to get a good understanding of it. To help with that, we are sharing a link we came across while researching this blog post:

Illustrated Word2Vec

Once you have understood it, you can follow up on any new ideas it sparks, or pick up and complete one of the tasks listed in the project's TODO section.

The Manglish version was contributed by Sreeram Venkitesh

Subscribe to Maker Broadcast to get news like this as soon as it is out.