Introduction

The field of artificial intelligence has recently seen rapid growth in research, tool development, and deployed software systems. Many software development companies are shifting their focus to developing intelligent software systems by integrating AI into their existing processes. At the same time, the academic research community is applying AI paradigms to provide robust solutions to traditional software engineering (SE) problems and has achieved promising results. AI has proven useful in addressing major challenges that the SE field has been facing. Indeed, different AI paradigms (such as neural networks, machine learning, knowledge-based systems, and natural language processing) can be applied to different SE phases (i.e., requirements, design, development, testing, release and maintenance) to improve the process and reduce human effort in tedious and cumbersome tasks (e.g., testing and debugging).

Our research group studies how to apply AI to resolve current challenges across the SE phases of the software development lifecycle. We aim to leverage AI paradigms to improve existing processes by representing intelligence and eventually automating some SE phases such as requirement engineering, testing and debugging. We are also focusing on applying software methodology and architecture to the development of AI applications. See the slides here for more detail.

Contact: Dr. Bui Thi Mai Anh, Email: anhbtm@soict.hust.edu.vn

Research Directions

  • Intelligent Requirement engineering and planning: Requirements and planning form the first stage of the SE process, in which the building blocks of a software system are established. Software requirements describe a software application by specifying its main objectives and goals. Requirement engineering is the discipline of establishing user requirements and specifying software systems. Studies in this field aim to automatically extract requirement information from users (which is often expressed in natural language) and to represent software requirements systematically (using modeling or specification languages such as UML). Applying Natural Language Processing (NLP) and Machine Learning/Deep Learning are current trends in this field. Beyond automated requirement engineering, the selection of software requirements and the scheduling of the software development process are addressed with multi-objective evolutionary algorithms (MOEAs) to help project managers plan effectively and estimate software cost.
  • Automated Software refactoring: Refactoring is one of the most expensive parts of software implementation. Code refactoring is intended to improve the design, structure and/or implementation of software systems while preserving their functionality. Search-based methods (using metaheuristics) and machine learning approaches are combined to automatically suggest refactoring solutions.
  • Search Based Software Testing: Software testing is one of the major targets of AI-driven methods. Testing is the process of executing a program or application with the intent of finding software defects and bugs by using a given test suite (a set of test cases). Research in this field addresses how to design effective test cases that detect as many software defects and bugs as possible. The problem is formulated as an optimization problem: minimizing the number of generated test cases while maximizing the detection of bugs/faults. We aim to apply search-based optimization algorithms such as evolutionary algorithms (EAs), genetic algorithms (GAs), and particle swarm optimization (PSO) to tackle this problem.
  • Software Defect Prediction: Software maintenance is defined by the IEEE as the modification of a software product after delivery to correct faults, to improve performance or to adapt the product to a modified environment. Mining user feedback is essential for software engineers. We aim to apply ML techniques and NLP to explore user feedback in order to automatically predict software defects, including bug localization and fault localization, and to automatically repair them.
  • Automated Program Repair using Machine Learning techniques: With the development of computer science, software applications are becoming more and more popular in every real-world domain. This growth makes the reliability of software programs critical, since faulty programs may jeopardize digital assets. Unfortunately, software reliability can be negatively affected by numerous problems, especially software bugs, which are pervasive in software systems. To reduce the damage caused by software bugs, developers should fix them in time. However, manual bug fixing is notoriously tricky, tedious, and time-consuming. This project aims to create an automated program repair system based on machine learning techniques to reduce manual bug-fixing effort. We first investigate the performance of learning-based automated program repair techniques on standard benchmarks such as Defects4J and Codeflaws. We then analyze the results to identify the main challenges facing learning-based techniques. Finally, we propose novel learning-based program repair techniques to address these challenges.
  • Vulnerabilities Detection in Machine Learning Frameworks: Smart systems are increasingly dependent on machine learning (ML) frameworks, e.g., TensorFlow, for their feature implementation. These frameworks are built on top of many third-party libraries, which depend on many others. Simply trusting and reusing a framework poses a security risk, as the framework and its direct and transitive dependencies can contain exploitable vulnerabilities. To mitigate this risk, this project will create an advanced software composition analysis solution that scans dependency hierarchies and builds new deep learning architectures to analyse code and document repository data and flag vulnerabilities. Further, each flagged vulnerability will be checked for reachability with our new directed grammar-based fuzzing solution, which generates valid test cases (following predefined grammars) and drives test executions to vulnerable code. Our solution targets vulnerabilities hidden deep in ML framework dependencies, which are hard for a classic fuzzer to uncover and for framework developers to recognize as they appear in third-party code.
  • Development of resource optimization solutions using Nash Equilibrium and multi-objective algorithms for some applications: We will develop a multi-objective optimization model that uses the Unified Model to model game theory conflicts and resource allocation based on Nash equilibrium. We will test optimization algorithms using methods such as metaheuristics, evolutionary algorithms, genetic algorithms, particle swarm optimization, and ant colony optimization. Real-time data will be used for problems such as resource management in software projects or freight optimization in logistics management.
  • Development of AI applications:
    • Stock price prediction: Building machine learning, deep learning (convolutional neural networks, natural language processing, long short-term memory), and reinforcement learning models to predict the next-day direction of the stock price from historical data.
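
The optimization view of search-based testing described above (fewer test cases, maximal fault detection) can be illustrated with a minimal sketch. It assumes a hypothetical input format in which each test case is represented by the set of fault identifiers it detects, and it uses a simple (1+1) evolutionary algorithm as a stand-in for the EA/GA/PSO family; the function names, the fitness weighting, and the sample data are illustrative assumptions, not the group's actual method.

```python
import random


def fitness(selection, coverage):
    """Score a subset of tests: detected faults dominate, suite size breaks ties."""
    detected = set()
    for chosen, faults in zip(selection, coverage):
        if chosen:
            detected |= faults
    # Each detected fault is worth more than the whole suite size, so removing
    # tests can never compensate for losing a detected fault.
    return len(detected) * (len(coverage) + 1) - sum(selection)


def minimize_suite(coverage, iterations=2000, seed=0):
    """(1+1) EA: mutate the current selection and keep it if it is not worse."""
    n = len(coverage)
    if n == 0:
        return []
    rng = random.Random(seed)
    current = [True] * n  # start from the full test suite
    best = fitness(current, coverage)
    for _ in range(iterations):
        # Flip each bit independently with probability 1/n.
        candidate = [(not bit) if rng.random() < 1.0 / n else bit
                     for bit in current]
        score = fitness(candidate, coverage)
        if score >= best:
            current, best = candidate, score
    return current


if __name__ == "__main__":
    # Hypothetical data: test i detects the faults in coverage[i].
    coverage = [{1, 2}, {2}, {3}, {1, 3}, {4}]
    selection = minimize_suite(coverage)
    kept = [i for i, chosen in enumerate(selection) if chosen]
    print("kept tests:", kept)
```

Because the search starts from the full suite and only accepts candidates whose fitness does not drop, the returned subset always detects every fault the full suite detects; how small it gets depends on the number of iterations and the random seed.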

Research Problems

  • Test case generation: Addressing the generation of effective test cases that cover as many testing criteria as possible by using evolutionary algorithms such as GA, PSO, and GWO. Combining optimization algorithms with constraint-based local search to generate test suites more effectively.
  • Requirement Mining: Using NLP techniques to examine user requirements and transform them from natural language into requirement models (use case model, activity model), then applying evolutionary algorithms to generate test cases from the requirement models (from the perspective of black-box testing).
  • Requirements Selection: Prioritizing software requirements is an optimization task. The selection of software requirements is based on the project deadline as well as on the financial budget. This motivates us to study multi-objective metaheuristic algorithms that select requirements effectively during the software development process.
  • Automated Software Refactoring: Code smells are certain structures in the code that indicate the violation of fundamental design principles and negatively impact design quality. We aim to apply machine learning and deep learning techniques to detect code smells and suggest refactoring solutions by using evolutionary algorithms.
  • Bug Localization: During the development of software products, bug tracking systems (such as Bugzilla and JIRA) are used to report and manage bugs. Bug localization refers to the task of locating the potentially buggy source code files in a software project given a bug report. The major challenge of bug localization comes from the lexical mismatch between the natural language used to write bug reports and the programming languages used to write source files. We aim to apply deep learning models to narrow this lexical gap as well as to improve the performance of existing bug localization models. We are also addressing the class imbalance problem in this field, as buggy files make up only a small fraction of the source code files for a given bug report. The common approaches are data-driven (e.g., oversampling, undersampling) or model-driven (e.g., bootstrapping, reinforcement learning, cost-sensitive learning, ensemble learning).
  • Fault Localization: While bug localization focuses on locating buggy source files, in the fault localization (FL) problem, given the execution of test cases, an FL tool identifies the set of suspicious lines of code together with their suspiciousness scores. We apply machine learning and deep learning techniques to analyze source code and detect faulty statements at the method level. We are also exploiting method features derived from code complexity metrics, mutation-based formulas, etc. (often yielding more than 200 features) to evaluate the sensitivity of features and to examine the constraints among features in order to improve the performance of FL models.
  • Stock Price Prediction Problems:
    • Insufficient data
      • Using techniques such as transfer learning, ensemble machine learning models, data augmentation, and synthetic data
    • Imbalanced data
      • Resampling the training samples by over-sampling the minority class, under-sampling the majority class, or combining the two methods
    • Feature engineering
      • Finding important information in the given data
      • Finding new features that help improve the performance of the trained model
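
The data-driven treatment of imbalanced data mentioned above (for bug localization and for stock price prediction) can be sketched with plain random over-sampling. This is a minimal illustration that assumes samples and labels arrive as two parallel Python lists; the function name and the toy data are hypothetical, and a real pipeline would more likely rely on a dedicated library and resample only the training split.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest (majority) class
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        # Original samples of this class, used as the pool to draw copies from.
        pool = [s for s, l in zip(samples, labels) if l == label]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(label)
    return out_samples, out_labels


if __name__ == "__main__":
    # Hypothetical toy data: 8 "clean" files (label 0) and 2 "buggy" files (label 1).
    samples = [[float(i)] for i in range(10)]
    labels = [0] * 8 + [1] * 2
    balanced_samples, balanced_labels = random_oversample(samples, labels)
    print(Counter(balanced_labels))
```

Under-sampling follows the same pattern in the opposite direction: instead of duplicating minority samples, randomly chosen majority samples are dropped until the classes are balanced.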

Team Members

Dr. Bui Thi Mai Anh
Team Leader

Assoc. Prof. Huynh Quyet Thang
Member

Dr. Nguyen Nhat Hai
Member

Dr. Tran Nhat Hoa
Member

Latest publications

Publications in 2022

  1. Nhat-Hoa Tran, Toshiaki Aoki. SSpinJa: Facilitating Schedulers in Model Checking. 2021 IEEE 21st International Conference on Software Quality, Reliability and Security (QRS). 632-641. Hainan, China. 06/12/2021

Publications in 2021

  1. Bùi Thị Mai Anh, Nguyễn Nhất Hải. Adaptive Ranking Relevant Source Files for Bug Reports Using Genetic Algorithm. SOMET 2021. 430-443. Cancun, Mexico. 21/09/2021
  2. Bùi Thị Mai Anh, Nguyễn Thị Thu Trang. A Feature-Augmented Deep Learning Model for Extractive Summarization. INISCOM 2021. Vol 379. Le Quy Don University, Hanoi, Vietnam. 22/04/2021
  3. The-Anh Le, Quyet-Thang Huynh, Thanh-Hung Nguyen. A New Method to Improve Quality Predicting of Software Project Completion Level. Industrial Networks and Intelligent Systems, INISCOM 2021. 211-219. Hanoi. 22/04/2021
  4. Dac-Nhuong Le, Gia Nhu Nguyen, Trinh Ngoc Bao, Nguyen Ngoc Tuan, Huynh Quyet Thang, Suresh Chandra Satapathy. MMAS Algorithm and Nash Equilibrium to Solve Multi-Round Procurement Problem. Advances in Systems, Control and Automations. Lecture Notes in Electrical Engineering, vol 708. 273-284. 05/03/2020
  5. Nguyen, V.-Q., Nguyen, V.-H., Nguyen, M.-Q., Huynh, Q.-T., Kim, K. Efficiently Estimating Joining Cost of Subqueries in Regular Path Queries. Electronics. 1-16. 19/04/2021
  6. Nguyễn Thị Xuân Hoà, Huỳnh Quyết Thắng, Nguyễn Hương Giang, Lê Hiếu Học. Nâng cao hiệu quả chuỗi cung ứng thông qua quản lý tồn kho VMI. Tạp chí Công thương. 235-243. 19/01/2021
  7. Le TA., Huynh QT., Nguyen TTN., Thi MH.T. A New Method for Enhancing Software Effort Estimation by Using ANFIS-Based Approach. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, Vol 379. 195-210. Hanoi. 22/04/2021
  8. Trần Đình Diễn, Huỳnh Quyết Thắng, Nguyễn Thành Đạt. Phát triển thuật toán sinh code cho ứng dụng web chuẩn đoán bệnh thủy sản với ATL. Tạp chí Nghiên cứu Khoa học và Công nghệ quân sự. 102-111. 12/04/2021
  9. Hoang-Long Huynh, Huu-Duc Nguyen, Trong-Vinh Le, Quyet-Thang Huynh. CAM-D: A Description Method for Multi-Cloud Marketplace Application. Research and Development on Information and Communication Technology, ICT Research. 51-61. 31/01/2021

Publications in 2020

  1. Bùi Thị Mai Anh. Enhanced Genetic Algorithm for Automatic Generation of Unit and Integration Test Suite. RIVF 2020. 1-6. Ho Chi Minh City, Vietnam. 09/07/2020
  2. Bui Thi Mai Anh, Nguyen Tra My, Nguyen Thi Thu Trang. Enhanced Genetic Algorithm for Single Document Extractive Summarization. SOICT 2019. 04/12/2019
  3. Duc-Man Nguyen, Quyet-Thang Huynh, Nhu-Hang Ha, and Thanh-Hung Nguyen. Automated Test Input Generation via Model Inference Based on User Story and Acceptance Criteria for Mobile Application Development. International Journal of Software Engineering and Knowledge Engineering. 399-425. 30/06/2020
  4. Thanh Tam Nguyen, Thanh Dat Hoang, Minh Tam Pham, Tuyet Trinh Vu, Thanh Hung Nguyen, Quyet-Thang Huynh, Jun Jo. Monitoring agriculture areas with satellite images and deep learning. Applied Soft Computing. 1-16. 23/07/2020
  5. Dac-Nhuong Le, Gia Nhu Nguyen, Harish Garg, Quyet-Thang Huynh, Trinh Ngoc Bao, Nguyen Ngoc Tuan. Optimizing Bidders Selection of Multi-Round Procurement Problem in Software Project Management Using Parallel Max-Min Ant System Algorithm. Journal of Computers, Materials & Continua. 993-1010. 25/07/2020
  6. Trinh Ngoc Bao, Quyet-Thang Huynh, Xuan-Thang Nguyen, Gia Nhu Nguyen, Dac-Nhuong Le. A Novel Particle Swarm Optimization Approach to Support Decision-Making in the Multi-Round of an Auction by Game Theory. International Journal of Computational Intelligence Systems. 1447-1463. 22/08/2020
  7. Quyet-Thang Huynh, Ngoc-Tuan Nguyen. Probabilistic Method for Managing Common Risks in Software Project Scheduling Based on Program Evaluation Review Technique. International Journal of Information Technology Project Management (IJITPM). 77-94. 15/07/2020
  8. Quyet-Thang Huynh, Thi-Huong-Giang Vu, Doan-Cuong Nguyen, Thanh-Trung Vu, Cong-Tue Hoang, Thi-Xuan-Hoa Nguyen. A Profit-Equilibrium Model for Retailers and Vendors in the Vendor Managed Inventory Problem. 2020 4th International Conference on Recent Advances in Signal Processing, Telecommunications & Computing (SigTelCom). 30-34. 28/08/2020
  9. Quyet-Thang Huynh, Le-Trinh Pham, Nhu-Hang Ha and Duc-Man Nguyen. An Effective Approach for Context Driven Testing in Practice - A Case Study. International Journal of Software Engineering and Knowledge Engineering. 1245–1262. 14/06/2020
  10. Vũ Thành Trung, Nguyễn Doãn Cường, Huỳnh Quyết Thắng. Nghiên cứu kết hợp giải thuật TABU SEARCH, GA và Cân bằng NASH áp dụng giải bài toán tối ưu đa mục tiêu trong bài toán phân bổ nguồn lực cho dự án phần mềm. Tạp chí Nghiên cứu KH&CN quân sự, Số Đặc san Hội thảo Quốc gia FEE, 10 – 2020, ISSN 1859 – 1043, trang 463-477, ngày xuất bản online 5/10/2020. 463-477. 05/10/2020
  11. Hoang-Long Huynh, Huu-Duc Nguyen, Trong-Vinh Le, Quyet-Thang Huynh, Thi-Nhan Vu. An approach for auto-repairing cloud application on multi-cloud marketplace. Một số vấn đề chọn lọc của Công nghệ thông tin và Truyền thông. 17-22. 01/06/2019
  12. Van-Quyet Nguyen, Huu-Duy Nguyen, Quyet-Thang Huynh, Nalini Venkatasubramanian, Kyungbaek Kim. A Scalable Approach for Dynamic Evacuation Routing in Large Smart Buildings. IEEE International Conference on Smart Computing (SMARTCOMP), SMARTCOMP 2019. 01/06/2019
  13. Quyet-Thang Huynh, The-Anh Le, Thanh-Hung Nguyen, Nhat-Hai Nguyen, Duc-Hieu Nguyen. A Method for Improvement the Parameter Estimation of Non-linear Regression in Growth Model to Predict Project Cost at Completion. The 2020 RIVF International Conference on Computing & Communication Technologies (RIVF). 232-237. RMIT University, Vietnam. 21/07/2020
  14. The-Anh Le, Quyet-Thang Huynh, Thanh-Hung Nguyen, Nhat-Hai Nguyen, Phuong-Nam Cao. A Method for Conference Project Completion Cost Predicting Using LSTM in Earned Value Management Technique. 4th International Conference on Recent Advances in Signal Processing, Telecommunications & Computing (SigTelCom), Hanoi, Vietnam, 2020. 87-92. 28/08/2020
  15. Nguyen Nhat Hai, Trung Duy Pham, Ha Thi Hong Nguyen. Right protection mechanism based on optimal robust watermarking for shared EEG data. The 2020 IEEE-RIVF International Conference on Computing and Communications Technologies. 322. Ho Chi Minh City, Vietnam. 05/04/2020