vi_spacy contains a Vietnamese language model for spaCy, developed by Dr. Tran Viet Trung (NLP research group). The model was trained on 18 GB of Vietnamese news text, yielding 270K unique word vectors of 300 dimensions each. The trained pipeline consists of pre-trained word vectors, tok2vec, tagger, and parser.

The source code is available here: https://github.com/trungtv/vi_spacy

Installation

  1. Install the vi_spacy model directly using pip:
    pip install https://github.com/trungtv/vi_spacy/raw/master/packages/vi_spacy_model-0.2.1/dist/vi_spacy_model-0.2.1.tar.gz
  2. You may also need to install pyvi:
    pip install pyvi
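
To confirm that the installation steps above took effect, a quick check like the following may help. It is a minimal sketch, not part of vi_spacy: the helper name missing_modules is hypothetical, and the module names are assumed from the pip and spacy.load commands in this README.

```python
import importlib.util

# Module names are assumptions inferred from the install commands and
# from the spacy.load('vi_spacy_model') call in this README.
REQUIRED = ("spacy", "pyvi", "vi_spacy_model")


def missing_modules(names=REQUIRED):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported as missing, rerun the corresponding pip command in the same Python environment.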

Usage: import as module

import spacy

nlp = spacy.load('vi_spacy_model')
doc = nlp('Cộng đồng xử lý ngôn ngữ tự nhiên')
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_, token.shape_, token.is_alpha, token.is_stop)
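
If the token attributes are needed for further processing rather than printing, they can be collected into plain dictionaries. The following is a small sketch under that assumption: token_rows and FIELDS are hypothetical names, not part of vi_spacy or spaCy, and the attribute list is simply the one used in the usage example above.

```python
# Attribute names taken from the print call in the usage example above.
FIELDS = ("text", "lemma_", "pos_", "tag_", "dep_", "shape_", "is_alpha", "is_stop")


def token_rows(doc, fields=FIELDS):
    """Return one dict per token, mapping each attribute name to its value."""
    return [{name: getattr(token, name) for name in fields} for token in doc]
```

With the model installed, calling token_rows(doc) on the doc object from the usage example should give one dictionary per token, which can then be written to CSV or JSON.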