Introduction

Natural language processing (NLP) is a subfield of linguistics and artificial intelligence that aims to help computers understand human language and interact with humans. With the rapid development of data science, NLP has made great progress in creating applications that benefit everyday life. Applications of NLP include machine translation, chatbots, social media monitoring, survey analysis, targeted advertising, hiring and recruitment, voice assistants, and spelling correction.

Our research group focuses on exploiting machine learning and deep learning techniques, combined with NLP features and other knowledge, to develop high-performance NLP applications. We also investigate methods to construct knowledge bases and taxonomies for specific NLP tasks, and to create large datasets for training NLP models. See the slides here for more detail.

Contact: Assoc. Prof. Le Thanh Huong, Email: huonglt@soict.hust.edu.vn

Research Directions

We exploit machine learning and deep learning techniques, in combination with NLP features, to research and develop NLP applications in the following directions:

  • Information extraction: We investigate several tasks, including named entity recognition, relation extraction, and event extraction.
  • Chatbot/question answering: Generating answers to questions from different sources such as paragraphs, knowledge bases, and databases. Chatbots and question answering are used in many real-life applications such as customer service and study counseling. We address several problems in this research direction, including intent classification, slot tagging, question similarity, and dialog management.
  • Speech technologies: We focus on expressive speech synthesis, speech synthesis using state-of-the-art methods, automatic speech recognition, speaker verification, and speaker identification.
  • Text summarization: Summarizing single or multiple documents, either by selecting important sentences or by creating new summaries with condensed content. We also look at query-based summarization, in which the answer is generated by summarizing all the documents returned by the query.
  • Sentiment analysis: Detecting positive/negative sentiment in text. Sentiment analysis is often used by businesses to detect sentiment in social data, and to understand customers.
  • Machine translation: We concentrate on several aspects: developing multilingual neural machine translation; improving system performance (accuracy and speed); dealing with low-resource languages; and automatically building corpora for training machine translation systems.
  • Plagiarism detection: Automatically identifying fragments in a suspicious document that were copied from other source documents. We are also interested in cross-language plagiarism detection, where the source of the plagiarism is in a different language.
  • Vietnamese spelling correction: Spelling and grammatical errors make input texts difficult to understand, and training on such documents degrades model quality. Real-world NLP problems often involve texts with many typos, so data should be cleaned before use. We focus on correcting spelling errors in two data types: academic text and social data.

Research Problems

  • Synonym discovery from multiple sources: The project aims to discover synonyms from multiple Web data sources. Synonyms take the form of various aliases of the same entity, or equivalent representations of attribute relationships. The main sources are records of user interaction with web search engines, such as web search logs; semi-structured data, such as web tables; and unstructured data, such as web documents.
  • Weakly supervised aspect extraction: The project aims to extract domain aspects from user-generated content, an essential step in opinion mining. It tackles the bottleneck of data annotation by studying the paradigm of weak supervision empowered by neural representation and neural learning frameworks.
  • Weakly supervised taxonomy construction: A taxonomy is a classification scheme that helps to organize and index knowledge. Developing and maintaining a taxonomy is generally a labor-intensive task that requires significant resources and expertise. Our objective is to explore weak supervision to automate and accelerate this process while keeping manual effort to a minimum.
  • Knowledge base construction from semi-structured documents: Today, the volume of data is growing exponentially, and more than 70% of it is unstructured or semi-structured (e.g., Word, PDF, and Excel files). Such data is commonly left untouched because it is not in a form suitable for data analytics software. Our objective is to develop natural language understanding methods to extract valuable information from semi-structured documents. We can then construct knowledge bases, which benefit further analytics and beyond.

Team Members

Assoc. Prof. Le Thanh Huong
Team Leader

Assoc. Prof. Nguyen Thi Kim Anh
Member

Dr. Nguyen Thi Thu Trang
Member

Dr. Nguyen Kiem Hieu
Member

Dr. Tran Viet Trung
Member

Post-doc and PhD Students

Ha Thi Thanh
PhD Student

Luu Minh Tuan
PhD Student

Projects and Solutions

Tools and Resources

Latest Publications

Publications in 2024

  1. Thang Duc Phan and Huong Thanh Le. Utilize Pre-Trained PhoBERT to Compute Text Similarity and Rerank Documents for Question-Answering Task. 12th International Conference on Control, Automation and Information Sciences (ICCAIS). 200-205. Hanoi. 27/11/2023
  2. Sikandar Ali Qalati, Domitilla Magni, and Faiza Siddiqui. Senior Management's Sustainability Commitment and Environmental Performance: Revealing the Role of Green Human Resource Management Practices. Business Strategy and the Environment. 02/08/2024
  3. T. K. Lai, and I. L. Ngo. A new design and optimization of VD-ECF micro-pump: Advancements in electrohydraulic performance. Physics of Fluids. 29/07/2024
  4. T. K. Lai, and I. L. Ngo. An investigation on the thermo-electrohydraulic performance of novel ECF micro-pump. International Journal of Heat and Mass Transfer. 29/09/2024
  5. Thi-Nhung Nguyen, Bang Tien Tran, Trong-Nghia Luu, Thien Huu Nguyen, Kiem-Hieu Nguyen. BKEE: Pioneering Event Extraction in the Vietnamese Language. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). 2421-2427. Torino, Italia. 20/05/2024
  6. Sikandar Ali Qalati, MengMeng Jiang, Samuel Gyedu, and Emmanuel Kwaku Manu. Do Strong Innovation Capability and Environmental Turbulence Influence the Nexus Between Customer Relationship Management and Business Performance? Business Strategy and the Environment. 02/07/2024
  7. T. K. Lai, K. D. Tran, and I. L. Ngo. A numerical study on the thermo-electrohydrodynamic performance of ECF micro-pumps. Sustainability and Emerging Technologies for Smart Manufacturing. 29/04/2024
  8. T. K. Lai, and I. L. Ngo. An investigation on the electrohydraulic performance of novel ECF micro-pump with NACA-shaped electrodes. Theoretical and Computational Fluid Dynamics. 29/02/2024
  9. Pham Viet Thanh, Ngo Thi Thu Huyen, Pham Ngoc Quan, Nguyen Thi Thu Trang. A Robust Pitch-Fusion Model for Speech Emotion Recognition in Tonal Languages. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 12386-12390. Seoul, Republic of Korea. 11/04/2024
  10. JYE Tin, WW Tan, AA Bakar, MS Mahali, FF Lothai, NF Mohammad, SSA Hassan & KF Chin. A Conceptual Design of Sustainable Solar Photovoltaic (PV) Powered Corridor Lighting System with IoT Application. ICREEM 2022. 09/03/2024

Publications in 2023

  1. Tuong Tu Huu, Viet Thanh Pham, Thi Thu Trang Nguyen, Thai Lai Dao. Mispronunciation detection and diagnosis model for tonal language, applied to Vietnamese. INTERSPEECH 2023. 1014-1018. Dublin, Ireland. 20/08/2023
  2. Pham Viet Thanh, Nguyen Xuan Thai Hoa, Hoang Long Vu, Nguyen Thi Thu Trang. Vietnam-Celeb: a large-scale dataset for Vietnamese speaker recognition. INTERSPEECH 2023. 1918-1922. 20/08/2023
  3. Trung-Kien Nguyen, Viet-Trung Tran, Huy-Anh Nguyen, Khac-Hoai Nam Bui. A Novel Method for Spam Call Detection Using Graph Convolutional Networks. Communications in Computer and Information Science. 106-116. 22/07/2023
  4. Nguyễn Đức Ca, Phan Thị Thu, Hoàng Thị Minh Anh, Phạm Ngọc Dương, Nguyễn Hoàng Giang, Nguyễn Lệ Hằng. Nâng cao hiệu quả quản trị đại học trong bối cảnh đổi mới giáo dục tại Việt Nam. Tạp chí khoa học giáo dục Việt Nam. 14/03/2023
  5. Sikandar Ali Qalati, Sonia Kumari, Kayhan Tajeddini, Namarta Kumari Bajaj, and Rajib Ali. Innocent devils: The varying impacts of trade, renewable energy and financial development on environmental damage: Nonlinearly exploring the disparity between developed and developing nations. Journal of Cleaner Production. 02/02/2023
  6. Wu Qinqin, Sikandar Ali Qalati, Rana Yassir Hussain, Hira Irshad, Kayhan Tajeddini, Faiza Siddique, Thilini Chathurika Gamage. The effects of enterprises' attention to digital economy on innovation and cost control: Evidence from A-stock market of China. Journal of Innovation & Knowledge. 02/12/2023
  7. Thai Nguyen Quoc, Huong Le Thanh, Hanh Pham Van. Khmer-Vietnamese Neural Machine Translation Improvement Using Data Augmentation Strategies. Informatica (Slovenia). 349-360. 15/07/2023
  8. Thi-Nhung Nguyen, Hoang Ngo, Kiem-Hieu Nguyen, Tuan-Dung Cao. A Self-enhancement Multitask Framework for Unsupervised Aspect Category Detection. EMNLP 2023. 8043–8054. Singapore. 06/12/2023
  9. Anh Pham Duy, Huong Le Thanh. A Question-Answering System for Vietnamese Public Administrative Services. SOICT 2023: The 12th International Symposium on Information and Communication Technology. 85–92. Ho Chi Minh City. 07/12/2023
  10. Sikandar Ali Qalati, Belem Barbosa, and Blend Ibrahim. Factors influencing employees’ eco-friendly innovation capabilities and behavior: the role of green culture and employees’ motivations. Environment, Development and Sustainability. 02/10/2023
  11. Quang Nhat Nguyen, Huong Thanh Le. Building an Efficient Retriever System with Limited Resources. Advances in Information and Communication Technology. ICTA 2023. 40-50. Thái Nguyên. 13/12/2023
  12. Thi-Trang Nguyen; Xuan-Do Dao; Quoc-Quan Chu; Kiem-Hieu Nguyen. Improving Intent Detection and Slot Filling for Vietnamese. 2022 RIVF International Conference on Computing and Communication Technologies (RIVF). 118-123. Ho Chi Minh City. 20/12/2022
  13. Sikandar Ali Qalati, Belem Barbosa & Shuja Iqbal. The effect of firms’ environmentally sustainable practices on economic performance. Economic Research-Ekonomska Istraživanja. 02/06/2023
  14. Trong-Nghia Luu, Tien-Bang Tran, Minh-Hong Truong, Huu-Hiep Nguyen, Kiem-Hieu Nguyen, Thi-Thanh Ha. Aspect Extraction Based on Weakly-Supervised Learning for Vietnamese. 2023 15th International Conference on Knowledge and Systems Engineering (KSE). 1-6. 18/10/2023

Publications in 2022

  1. Viet-Trung Tran; Van-Sang Tran; Xuan-Bang Nguyen; The-Trung Tran. A liveness detection protocol based on deep visual-linguistic alignment. International Conference on Knowledge and Systems Engineering (KSE). 19/10/2022
  2. Phan Thị Thu. Những vấn đề đặt ra đối với thiết chế Hội đồng trường đại học công lập ở Việt Nam hiện nay. Kỷ yếu hội thảo khoa học trường Học viện báo chí và tuyên truyền. 14/11/2022
  3. Sikandar Ali Qalati, Sonia Kumari, Ishfaque Ahmed Soomro, Rajib Ali, and Yifan Hong. Green Supply Chain Management and Corporate Performance Among Manufacturing Firms in Pakistan. Frontiers in Environmental Science. 02/05/2022
  4. Sikandar Ali Qalati, Zuhaib Zafar, Mingyue Fan, Mónica Lorena Sánchez Limón, Muhammad Bilawal Khaskheli. Employee performance under transformational leadership and organizational citizenship behavior: a mediated model. Heliyon. 02/11/2022
  5. Phan Thị Thu. Cuộc cạnh tranh của các công ty người Việt với nước ngoài trong lĩnh vực vận tải đường thuỷ ở bắc kỳ đầu thế kỉ XX và những kinh nghiệm đối với doanh nhân Việt Nam hiện nay. Kỷ yếu hội thảo quốc gia tổ chức tại trường ĐHSP Hà Nội 2. 14/12/2022
  6. Nguyen Thi Thu Trang; Nguyen Hoang Ky. VLSP 2021 - TTS Challenge: Vietnamese Spontaneous Speech Synthesis. VNU Journal of Science: Computer Science and Communication Engineering. 37-46. 30/06/2022
  7. Thi Thu Trang NGUYEN, Trung Duc Anh Dang, Quoc Viet Vu, Woomyoung Park. Building Vietnamese Conversational Smart Home Dataset and Natural Language Understanding Model. Interspeech 2022. 5180-5184. Korea. 18/09/2022
  8. Viet-Trung Tran, Hai-Nam Cao & Tuan-Dung Cao. A Practical Method for Occupational Skills Detection in Vietnamese Job Listings. Asian Conference on Intelligent Information and Database Systems. Ho Chi Minh, Vietnam. 28/11/2022
  9. Thi-Thanh Ha, Van-Nha Nguyen, Kiem-Hieu Nguyen, Kim-Anh Nguyen, Quang-Khoat Than. Utilizing SBERT For Finding Similar Questions in Community Question Answering. 13th International Conference on Knowledge and Systems Engineering (KSE). 1-6. Bangkok, Thailand. 10/11/2021
  10. Bui Thi Mai Anh, Nguyen Thi Thu Trang, Tran Thi Dinh. A Novel Type-based Genetic Algorithm for Extractive Summarization. Thirty-Fifth International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems. 143-155. 19/07/2022
  11. C Palanichamy, WW Tan & P Naveen. A Microgrid for the Secluded Paana Theertham Kani Settlement in India. Clean Energy. 09/02/2022
  12. Vi Thanh Dat, Pham Viet Thanh, Nguyen Thi Thu Trang. VLSP 2021 - SV challenge: Vietnamese Speaker Verification in Noisy Environments. VNU Journal of Science: Computer Science and Communication Engineering. 05/05/2022
  13. Vinh Van Nguyen, Ha Nguyen, Huong Thanh Le, Thai Phuong Nguyen, Tan Van Bui, Luan Nghia Pham, Anh Tuan Phan, Cong Hoang-Minh Nguyen, Viet Hong Tran and Anh Huu Tran. KC4MT: A High-Quality Corpus for Multilingual Machine Translation. The 13th Language Resources and Evaluation Conference. 5494‑5502. Marseille, France. 20/06/2022
  14. Sikandar Ali Qalati, Naveed Akhtar Qureshi, Dragana Ostic, Mohammed Ali Bait Ali Sulaiman. An extension of the theory of planned behavior to understand factors influencing Pakistani households’ energy-saving intentions and behavior: a mediated–moderated model. Energy Efficiency. 02/08/2022
  15. Nguyen Thi Thu Trang, Dang Trung Duc Anh, Vu Quoc Viet and Park Woomyoung. Advanced Joint Model for Vietnamese Intent Detection and Slot Tagging. 8th EAI International Conference on Industrial Networks and Intelligent Systems (INISCOM 2022). 125-135. Danang, Vietnam. 21/04/2022
  16. Nguyen Hoang Tien Bach, Nguyen Manh Dung, Nguyen Thi Thu Trang. Machine Reading Comprehension Model for Low-Resource Languages and Experimenting on Vietnamese. 35th International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems. 370–381. 19/07/2022
  17. Hanh Pham Van, Huong Le Thanh. Improving Khmer-Vietnamese Machine Translation with Data Augmentation methods. SoICT 2022: The 11th International Symposium on Information and Communication Technology. 276–282. Vietnam. 01/12/2022
  18. Hai-Nam Cao, Duc-Thai Do, Viet-Trung Tran, Tuan-Dung Cao & Young-In Song. Synonym Prediction for Vietnamese Occupational Skills. Lecture Notes in Computer Science. 351-362. 19/07/2022
  19. Tuan Anh Phan, Ngoc Dung Nguyen, Huong Le Thanh, Khac-Hoai Nam Bui. Neural Inverse Text Normalization with Numerical Recognition for Low Resource Scenarios. ACIIDS 2022: Intelligent Information and Database Systems. 582–594. Ho Chi Minh City. 28/11/2022