vi_spacy: A Vietnamese language model for spaCy

vi_spacy provides a Vietnamese language model for spaCy, developed by Dr. Tran Viet Trung (NLP research group). The model is trained on 18 GB of Vietnamese news text and provides 270K unique word vectors (300 dimensions each). The trained pipeline consists of pre-trained word vectors, tok2vec, tagger, and parser.

The source code is available here: https://github.com/trungtv/vi_spacy

Installation

Install the model directly using pip:

pip install https://github.com/trungtv/vi_spacy/raw/master/packages/vi_spacy_model-0.2.1/dist/vi_spacy_model-0.2.1.tar.gz

You may also need to install pyvi:
```
pip install pyvi
```
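
To confirm that both installs succeeded before loading the model, a quick check along the following lines may help. It is a minimal sketch that assumes the import names match the names used in this README (`spacy`, `pyvi`, and `vi_spacy_model`); the helper `missing_modules` is hypothetical and not part of vi_spacy.

```python
from importlib.util import find_spec

# Assumed import names, taken from the pip commands and the
# spacy.load() call in this README.
REQUIRED_MODULES = ("spacy", "pyvi", "vi_spacy_model")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names in `names` that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If a name is reported as missing, rerun the corresponding pip command above.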

Usage: import as module

import spacy
nlp = spacy.load('vi_spacy_model')
doc = nlp('Cộng đồng xử lý ngôn ngữ tự nhiên')
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_, token.shape_, token.is_alpha, token.is_stop)
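
Once the model loads, it can be useful to check that the pipeline matches the description at the top of this README (components and a 300-dimensional vector table). The sketch below uses only `nlp.pipe_names` and `nlp.vocab.vectors.shape` from spaCy; the helper `describe_pipeline` is hypothetical, and the component names and vector counts actually reported may depend on the installed spaCy version.

```python
def describe_pipeline(nlp):
    """Summarize a loaded spaCy pipeline: component names and vector table shape."""
    n_vectors, n_dims = nlp.vocab.vectors.shape
    return {
        "components": list(nlp.pipe_names),
        "n_vectors": n_vectors,
        "vector_dims": n_dims,
    }


if __name__ == "__main__":
    try:
        import spacy

        nlp = spacy.load("vi_spacy_model")
    except (ImportError, OSError) as err:
        # ImportError: spaCy (or a dependency) is not installed.
        # OSError: spaCy cannot find the model package.
        print("Could not load vi_spacy_model:", err)
    else:
        print(describe_pipeline(nlp))
```

Returning a plain dictionary keeps the helper easy to log or compare across model versions.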

Related Tools and Resources

pyvi: A Python Vietnamese Toolkit
