pyvi is a Python toolkit for Vietnamese text processing, developed by Dr. Tran Viet Trung (NLP research group). It is based on Conditional Random Fields (CRF) and reports F1 scores of 0.985 for Vietnamese tokenization and 0.925 for Vietnamese POS tagging.
The main features of pyvi are:
- Tokenization
- POS tagging
- Accent removal
- Accent restoration (adding accents to unaccented text)
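To make "accent removal" concrete, here is a minimal standard-library sketch of the idea. It is not pyvi's implementation; the function name strip_vietnamese_accents is hypothetical and exists only for illustration. It assumes that removing Unicode combining marks, plus an explicit mapping for đ/Đ (which have no canonical decomposition), is enough for ordinary Vietnamese text.

```python
import unicodedata


def strip_vietnamese_accents(text: str) -> str:
    """Illustrative only (not pyvi's code): drop tone and vowel marks."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # "đ" and "Đ" do not decompose under NFD, so map them explicitly.
    return base_only.replace("đ", "d").replace("Đ", "D")


print(strip_vietnamese_accents("Trường đại học bách khoa hà nội"))
# Truong dai hoc bach khoa ha noi
```

The reverse direction, restoring accents, cannot be done with a character mapping like this, because one unaccented syllable can correspond to several accented words; that is why it needs a trained model.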
Installation
pip install pyvi
Usage
from pyvi import ViTokenizer, ViPosTagger, ViUtils

# Tokenization (word segmentation)
ViTokenizer.tokenize(u"Trường đại học bách khoa hà nội")

# POS tagging, applied to tokenized text
ViPosTagger.postagging(ViTokenizer.tokenize(u"Trường đại học Bách Khoa Hà Nội"))

# Accent removal
ViUtils.remove_accents(u"Trường đại học bách khoa hà nội")

# Accent restoration
ViUtils.add_accents(u'truong dai hoc bach khoa ha noi')
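The tokenizer output usually needs a little post-processing before further use. The sketch below assumes that ViTokenizer.tokenize returns a single whitespace-separated string in which the syllables of a multi-syllable word are joined by underscores. The helper names words_from_tokenized and tokenize_to_words are hypothetical and not part of pyvi.

```python
from typing import List


def words_from_tokenized(tokenized: str) -> List[str]:
    """Split an underscore-joined tokenized string into a list of words.

    Assumes tokens are separated by whitespace and that underscores only
    mark syllable boundaries inside a word.
    """
    return [token.replace("_", " ") for token in tokenized.split()]


def tokenize_to_words(text: str) -> List[str]:
    """Tokenize with pyvi (imported lazily) and return a list of words."""
    from pyvi import ViTokenizer  # requires `pip install pyvi`

    return words_from_tokenized(ViTokenizer.tokenize(text))


# Illustrative input in the assumed output format (not actual pyvi output):
print(words_from_tokenized("Trường đại_học bách_khoa hà_nội"))
# ['Trường', 'đại học', 'bách khoa', 'hà nội']
```

Note that a token that contains an underscore in the original text (an identifier, for example) would also be split by this helper, so it is only suitable for ordinary prose.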