vi_spacy contains a Vietnamese language model for spaCy, developed by Dr. Tran Viet Trung (NLP research group). The model is trained on 18 GB of Vietnamese news text, resulting in 270K unique word vectors of 300 dimensions each. The trained pipeline consists of pre-trained word vectors, tok2vec, tagger, and parser.
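A rough sense of the vector table's size follows from simple arithmetic: 270K vectors × 300 dimensions is 81 million values. The sketch below assumes "270K" means exactly 270,000 rows and that each value is stored as a 32-bit float (spaCy's usual vector dtype). It estimates only the raw in-memory vector matrix, not the download size or the other pipeline components.

```python
# Back-of-the-envelope size of the word-vector table described above.
# Assumptions (not stated in the README): "270K" is taken as exactly 270,000
# rows, and each value is a 32-bit float (4 bytes).
N_VECTORS = 270_000
DIMS = 300
BYTES_PER_FLOAT32 = 4


def vector_table_bytes(n_vectors: int, dims: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw size in bytes of an n_vectors x dims table of fixed-width values."""
    return n_vectors * dims * bytes_per_value


size = vector_table_bytes(N_VECTORS, DIMS)
print(f"{size:,} bytes ~ {size / 1e6:.0f} MB ({size / 2**20:.1f} MiB)")
# prints: 324,000,000 bytes ~ 324 MB (309.0 MiB)
```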
The source code is available here: https://github.com/trungtv/vi_spacy
Installation
- Download and install the vi_spacy model directly with pip:
pip install https://github.com/trungtv/vi_spacy/raw/master/packages/vi_spacy_model-0.2.1/dist/vi_spacy_model-0.2.1.tar.gz
- You may also need to install pyvi, a Vietnamese tokenizer:
pip install pyvi
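After installing, you can check that the required packages are visible to Python before loading the model. This is a minimal sketch that uses importlib.util.find_spec, which locates a top-level module without importing it. It assumes the pip package above installs an importable module named vi_spacy_model (spaCy model packages are normally importable under the same name passed to spacy.load); the helper name missing_packages is hypothetical.

```python
import importlib.util

# Top-level module names this README relies on. "vi_spacy_model" is assumed
# to be the importable name of the model package installed with pip above.
REQUIRED_MODULES = ("spacy", "pyvi", "vi_spacy_model")


def missing_packages(names=REQUIRED_MODULES):
    """Hypothetical helper: return the names in `names` that are not on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("spacy, pyvi and vi_spacy_model are all importable")
```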
Usage: import as module
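import spacy

# Load the installed model package by name
nlp = spacy.load('vi_spacy_model')
# 'Natural language processing community'
doc = nlp('Cộng đồng xử lý ngôn ngữ tự nhiên')
# Print the linguistic annotations for each token
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_, token.shape_, token.is_alpha, token.is_stop)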
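Collecting the token attributes into rows, instead of printing them directly, makes them easier to filter, compare, or save. This is a minimal sketch: the helper names token_rows and format_rows are hypothetical and are not part of spaCy or vi_spacy. They only read attributes via getattr, so they accept a spaCy Doc or any iterable of objects that expose the requested fields.

```python
# Hypothetical helpers (not part of spaCy or vi_spacy): turn the per-token
# attributes printed in the loop above into rows that are easy to filter or save.
TOKEN_FIELDS = ("text", "lemma_", "pos_", "tag_", "dep_", "shape_", "is_alpha", "is_stop")


def token_rows(doc, fields=TOKEN_FIELDS):
    """Return one dict per token, mapping each field name to its value.

    `doc` can be a spaCy Doc or any iterable of objects exposing `fields`.
    """
    return [{field: getattr(token, field) for field in fields} for token in doc]


def format_rows(rows, fields=TOKEN_FIELDS):
    """Render rows as tab-separated text: a header line, then one line per row."""
    lines = ["\t".join(fields)]
    lines.extend("\t".join(str(row[field]) for field in fields) for row in rows)
    return "\n".join(lines)
```

With the model loaded as above, print(format_rows(token_rows(doc))) prints a header line followed by one tab-separated line per token.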