pyvi is a Python toolkit for Vietnamese text processing, developed by Dr. Tran Viet Trung (NLP research group). It is based on Conditional Random Fields (CRF) and reports F1 scores of 0.985 for Vietnamese tokenization and 0.925 for Vietnamese POS tagging.

Main functionalities of pyvi include:

  • Tokenization
  • POS tagging
  • Accent removal
  • Accent restoration (adding accents to unaccented text)

Installation

pip install pyvi

Usage

from pyvi import ViTokenizer, ViPosTagger, ViUtils

# Tokenization
ViTokenizer.tokenize(u"Trường đại học bách khoa hà nội")

# POS tagging (applied to the tokenized text)
ViPosTagger.postagging(ViTokenizer.tokenize(u"Trường đại học Bách Khoa Hà Nội"))

# Accent removal
ViUtils.remove_accents(u"Trường đại học bách khoa hà nội")

# Accent adding
ViUtils.add_accents(u'truong dai hoc bach khoa ha noi')
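
To make the accent removal feature more concrete, here is a minimal standard-library sketch of the general idea. It is not pyvi's implementation: strip_accents is a hypothetical helper written for illustration only, and it assumes that Unicode decomposition plus an explicit mapping for đ/Đ is enough to cover Vietnamese letters.

```python
import unicodedata


def strip_accents(text):
    """Illustrative accent removal; NOT pyvi's ViUtils.remove_accents."""
    # đ/Đ have no Unicode decomposition, so map them explicitly.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (category "Mn"), keeping only base characters.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(strip_accents("Trường đại học bách khoa hà nội"))
# prints: Truong dai hoc bach khoa ha noi
```

Going the other way (adding accents) cannot be done with a character mapping, because one unaccented syllable can correspond to several accented ones. That direction needs a model that takes the surrounding context into account.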