Introduction

The field of artificial intelligence (AI) is witnessing an explosion in research, tool development, and software system deployment. Many software companies are shifting their focus to developing intelligent software systems by integrating AI into their existing processes. At the same time, the academic research community is applying AI paradigms to provide robust solutions to traditional software engineering (SE) problems and has achieved promising results. AI has also proven useful in addressing major challenges that the SE field has been facing. Indeed, different AI paradigms (such as neural networks, machine learning, knowledge-based systems, and natural language processing) can be applied to different SE phases (i.e., requirements, design, development, testing, release, and maintenance) to improve the process and reduce human effort in tedious and cumbersome tasks (e.g., testing and debugging).

Our research group studies how to apply AI to resolve current challenges in the SE phases of the software development lifecycle. We aim to leverage AI paradigms to improve existing processes by representing intelligence and eventually automating some SE phases, such as requirements engineering, testing, and debugging. We also focus on applying software methodology and architecture to the development of AI applications. See the slides here for more detail.

Contact: Dr. Bui Thi Mai Anh, Email: anhbtm@soict.hust.edu.vn

Research Directions

  • Intelligent Requirement Engineering and Planning: Requirements and planning is the first stage of the SE process, in which the building blocks of a software system are established. Software requirements describe the outlook of a software application by specifying its main objectives and goals. Requirement engineering is the discipline of establishing user requirements and specifying software systems. Studies in this field aim to automatically extract requirement information from users (often expressed in natural language) and to represent software requirements systematically (using modeling or specification languages such as UML). Applying Natural Language Processing (NLP) and Machine Learning/Deep Learning is a trending direction in this field. Aside from automated requirement engineering, requirement selection and development scheduling are addressed with multi-objective evolutionary algorithms (MOEAs) to help project managers plan effectively and estimate software cost.
  • Automated Software Refactoring: Refactoring is one of the most expensive parts of software implementation. Code refactoring is intended to improve the design, structure and/or implementation of software systems while preserving their functionality. Search-based methods (using metaheuristics) and machine learning approaches are combined to automatically suggest refactoring solutions.
  • Search-Based Software Testing: Software testing is one of the major targets of AI-driven methods. Testing is the process of executing a program or application with the intent of finding software defects and bugs by using a given test suite (a set of test cases). Research in this field addresses the problem of how to design effective test cases that detect as many software defects and bugs as possible. This problem is formulated as an optimization problem: minimizing the number of generated test cases while maximizing the detection of bugs/faults. We aim to apply search-based optimization algorithms such as evolutionary algorithms (EAs), genetic algorithms (GA), and particle swarm optimization (PSO) to tackle this problem.
  • Software Defect Prediction: Software maintenance is defined by the IEEE as the modification of a software product after delivery to correct faults, to improve performance or to adapt the product to a modified environment. Mining user feedback is essential for software engineers. We aim to apply ML techniques and NLP to explore user feedback in order to automatically predict software defects, including bug localization and fault localization, and to automatically repair them.
  • Automated Program Repair using Machine Learning Techniques: Along with the development of computer science, software applications are becoming more and more popular in every real-world domain. This growth makes the reliability of software programs critical, since faulty programs may jeopardize digital assets. Unfortunately, software reliability can be negatively affected by numerous problems, especially software bugs, which are pervasive in software systems. To reduce the damage caused by software bugs, developers should fix them in time. However, manual bug fixing is notoriously tricky, tedious, and time-consuming. This project aims to create an automated program repair system based on machine learning techniques to reduce manual bug-fixing effort. We first investigate the performance of learning-based automated program repair techniques on standard benchmarks such as Defects4J and Codeflaws. We then analyze the results to identify the main challenges facing learning-based techniques. Finally, we propose novel learning-based program repair techniques to address these challenges.
  • Vulnerability Detection in Machine Learning Frameworks: Smart systems increasingly depend on machine learning (ML) frameworks, e.g., TensorFlow, to implement their features. These frameworks are built on top of many third-party libraries, which in turn depend on many others. Simply trusting and reusing a framework poses a security risk, as the framework and its direct and transitive dependencies can contain exploitable vulnerabilities. To mitigate this risk, this project will create an advanced software composition analysis solution that scans dependency hierarchies and builds new deep learning architectures to analyse code and document repository data and flag vulnerabilities. Each flagged vulnerability will then be checked for reachability using our new directed grammar-based fuzzing solution, which generates valid test cases (following predefined grammars) and drives test executions toward the vulnerable code. Our solution targets vulnerabilities hidden deep in ML framework dependencies, which are hard for a classic fuzzer to uncover and for framework developers to recognize because they appear in third-party code.
  • Development of resource optimization solutions using Nash equilibrium and multi-objective algorithms for some applications: We will develop a multi-objective optimization model that uses the Unified Model to model game-theoretic conflicts and resource allocation based on Nash equilibrium. We will test optimization algorithms, including metaheuristics such as evolutionary algorithms, genetic algorithms, particle swarm optimization, and ant colony optimization. We use real-time data for problems such as resource management in software projects or freight optimization in logistics management.
  • Development of AI applications:
    • Stock price prediction: Building machine learning, deep learning (convolutional neural networks, natural language processing, long short-term memory), and reinforcement learning models to predict the direction of the stock price on the next day based on historical data.
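
The optimization view of testing described under Search-Based Software Testing (fewer test cases, maximum fault detection) can be illustrated with a small genetic algorithm. This is only a minimal sketch, not the group's actual method: the fault matrix `DETECTS`, the weighted-sum `fitness`, and all parameter values are hypothetical choices made for illustration.

```python
import random

# Hypothetical data: which faults each test case detects.
DETECTS = {
    "t1": {"f1", "f2"},
    "t2": {"f2", "f3"},
    "t3": {"f1", "f3", "f4"},
    "t4": {"f4"},
    "t5": {"f5"},
    "t6": {"f2", "f5"},
}
TESTS = sorted(DETECTS)
ALL_FAULTS = set().union(*DETECTS.values())


def covered(mask):
    """Return the faults detected by the tests selected in a 0/1 mask."""
    faults = set()
    for bit, name in zip(mask, TESTS):
        if bit:
            faults |= DETECTS[name]
    return faults


def fitness(mask):
    """Weighted sum (an assumption): detection dominates, suite size breaks ties."""
    return len(covered(mask)) * (len(TESTS) + 1) - sum(mask)


def minimize_suite(generations=60, pop_size=20, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    n = len(TESTS)
    # Include the full suite so that full detection is present from the start.
    population = [[1] * n] + [
        [rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = population[:2]  # elitism: keep the two best individuals
        while len(next_gen) < pop_size:
            # Parents are drawn from the better half of the population.
            p1, p2 = rng.sample(population[: pop_size // 2], 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return [name for bit, name in zip(best, TESTS) if bit]


if __name__ == "__main__":
    suite = minimize_suite()
    print("selected tests:", suite)
    print("faults detected:", sorted(set().union(*(DETECTS[t] for t in suite))))
```

Because the full suite is in the initial population and the best individuals are always kept, the returned suite never detects fewer faults than the full suite. A true multi-objective algorithm would keep the two objectives separate instead of collapsing them into one score.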

Research Problems

  • Test case generation: Generating effective test cases that cover as many testing criteria as possible by using evolutionary algorithms such as GA, PSO, and GWO. Combining optimization algorithms with constraint-based local search to generate test suites more effectively.
  • Requirement Mining: Using NLP techniques to examine user requirements and transform them from natural language into requirement models (use case model, activity model), then applying evolutionary algorithms to generate test cases from the requirement models (from the perspective of black-box testing).
  • Requirements Selection: Prioritizing software requirements is an optimization task. The selection of software requirements depends on the project deadline as well as on the financial budget. This motivates us to study multi-objective metaheuristic algorithms to select requirements effectively during the software development process.
  • Automated Software Refactoring: Code smells are certain structures in the code that indicate the violation of fundamental design principles and negatively impact design quality. We aim to apply machine learning and deep learning techniques to detect code smells and suggest refactoring solutions by using evolutionary algorithms.
  • Bug Localization: During the development of software products, bug tracking systems (such as Bugzilla and JIRA) are used to report and manage bugs. Bug localization refers to the task of locating the potentially buggy source code files in a software project given a bug report. The major challenge of bug localization comes from the lexical mismatch between the natural language used to write bug reports and the programming languages used to write source files. We aim to apply deep learning models to narrow this lexical gap and to improve the performance of existing bug localization models. We also address the class imbalance problem in this field, since buggy files make up only a small fraction of the source files for a given bug report. Common approaches are data-driven (e.g., oversampling, undersampling) or model-driven (e.g., bootstrapping, reinforcement learning, cost-sensitive learning, ensemble learning).
  • Fault Localization: While bug localization focuses on locating buggy source files, in the fault localization (FL) problem, given the execution of test cases, an FL tool identifies the set of suspicious lines of code along with their suspiciousness scores. We apply machine learning and deep learning techniques to analyze source code and detect faulty statements at the method level. We also focus on exploiting method features derived from code complexity metrics, mutation-based formulas, etc. (often yielding more than 200 features) to evaluate the sensitivity of features and to examine the constraints between features in order to improve the performance of FL models.
  • Stock Price Prediction Problems:
    • Insufficient data
      • Using new technologies such as transfer learning, ensemble machine learning models, data augmentation, and synthetic data
    • Imbalanced data
      • Resampling training samples, such as over-sampling the minority class, under-sampling the majority class, or combining the two methods
    • Feature engineering
      • Finding important information from given data
      • Finding new features to help improve the training model performance
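
To make the notion of a suspiciousness score from the Fault Localization item concrete, the sketch below ranks statements using the Ochiai formula, a classic spectrum-based metric. It is used here purely as an illustrative assumption and is not the learning-based approach the group describes. The coverage data, test names, and the function `ochiai_scores` are hypothetical.

```python
import math


def ochiai_scores(coverage, outcomes):
    """Compute Ochiai suspiciousness for each statement.

    coverage: test name -> set of executed statement ids
    outcomes: test name -> True if the test passed, False if it failed
    score(s) = ef / sqrt(total_failed * (ef + ep)), where ef and ep are the
    numbers of failing and passing tests that execute statement s.
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    statements = set().union(*coverage.values())
    scores = {}
    for stmt in statements:
        ef = sum(1 for t, s in coverage.items() if stmt in s and not outcomes[t])
        ep = sum(1 for t, s in coverage.items() if stmt in s and outcomes[t])
        denom = math.sqrt(total_failed * (ef + ep))
        scores[stmt] = ef / denom if denom else 0.0
    return scores


# Hypothetical execution data: statement 4 is executed only by failing tests.
coverage = {
    "t1": {1, 2, 3},
    "t2": {1, 2, 4},
    "t3": {1, 3, 4},
    "t4": {1, 2},
}
outcomes = {"t1": True, "t2": False, "t3": False, "t4": True}

scores = ochiai_scores(coverage, outcomes)
ranking = sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    for stmt in ranking:
        print(f"statement {stmt}: {scores[stmt]:.3f}")
```

In this toy data, statement 4 ranks first because every failing test executes it and no passing test does. Scores like these are one possible input feature for a learned FL model, alongside the complexity and mutation-based features mentioned above.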

Team Members

Dr. Bui Thi Mai Anh
Team Leader

Assoc. Prof. Huynh Quyet Thang
Member

Dr. Nguyen Nhat Hai
Member

Dr. Tran Nhat Hoa
Member

Latest publications

Publications in 2023

  1. Bùi Thị Mai Anh, Dương Việt Anh, Bùi Quốc Trung. A Filter Approach Based on Binary Integer Programming for Feature Selection. RIVF 2022. 677-682. Ho Chi Minh City. 20/12/2022

Publications in 2022

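  1. Trần Hoàng Hải, Nguyễn Thanh Hùng, Nguyễn Nhất Hải, Đặng Tuấn Linh, Huỳnh Quyết Thắng. eHUST - Một mô hình mẫu cho hệ thống quản trị Nhà trường hỗ trợ Chuyển đổi số tại Việt Nam [eHUST - A reference model for a university management system supporting digital transformation in Vietnam]. Thúc đẩy Chuyển đổi số, Kinh tế tuần hoàn và kinh tế xanh - Hướng tới mục tiêu phát triển bền vững [Promoting Digital Transformation, Circular Economy and Green Economy - Toward Sustainable Development Goals]. 18-26. Phenikaa University. 12/11/2022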
  2. Nhat-Hoa Tran. Model Checking Techniques Enable Schedulability Analysis of Real-Time Systems. SoICT 2022: The 11th International Symposium on Information and Communication Technology. 336–343. Hanoi. 01/12/2022
  3. Hung‐Cuong Nguyen, Quyet‐Thang Huynh. New non‐homogeneous Poisson process software reliability model based on a 3‐parameter S‐shaped function. IET Software, Volume 16, Issue 2, 2022, Q2, IF=1.150. 214-232. 11/01/2022
  4. Bui Thi Mai Anh, Nguyen Thi Thu Trang, Tran Thi Dinh. A Novel Type-based Genetic Algorithm for Extractive Summarization. Thirty-Fifth International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems. 143-155. 19/07/2022
  5. A Yvan Guifo Fodjo, Mikal Ziane, Serge Stinckwich, Bui Thi Mai Anh, Samuel Bowong. Separation of Concerns in Extended Epidemiological Compartmental Models. Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies. 152-159. Vienna, Austria. 09/02/2022
  6. Nhat-Hoa Tran, Toshiaki Aoki. SSpinJa: Facilitating Schedulers in Model Checking. 2021 IEEE 21st International Conference on Software Quality, Reliability and Security (QRS). 632-641. Hainan, China. 06/12/2021
  7. Ho Anh, Nguyễn Nhất Hải, Bùi Thị Mai Anh. Combining Deep Learning and Kernel PCA for Software Defect Prediction. SOICT 2022. 360-367. Hanoi - Quang Ninh. 01/12/2022
  8. Bùi Quốc Trung, Trần Văn Trí, Bùi Thị Mai Anh. Empirical Analysis of Filter Feature Selection Criteria on Financial Datasets. SOICT 2022. 413-419. Hanoi - Quang Ninh. 01/12/2022
  9. Bui Thi Mai Anh, Nguyen Viet Luyen. An Imbalanced Deep Learning Model for Bug Localization. 28th Asia-Pacific Software Engineering Conference. 32-40. Taiwan. 06/12/2021
  10. Thanh-Dat Nguyen, Thanh Le-Cong, ThanhVu H. Nguyen, Xuan-Bach D. Le, Quyet-Thang Huynh. Toward the Analysis of Graph Neural Networks. IEEE/ACM 44th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER). 116-120. 22/05/2022
  11. Bui Quoc Trung, Le Minh Duc, Bui Thi Mai Anh. A Hybrid Approach based on Genetic Algorithm with Ranking Aggregation for Feature Selection. Thirty-Fifth International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems. 127-140. Kitakyushu, Japan. 19/07/2022

Publications in 2021

  1. The-Anh Le, Quyet-Thang Huynh, Thanh-Hung Nguyen. A New Method to Improve Quality Predicting of Software Project Completion Level. Industrial Networks and Intelligent Systems, INISCOM 2021. 211-219. Hanoi. 22/04/2021
  2. Dac-Nhuong Le, Gia Nhu Nguyen, Trinh Ngoc Bao, Nguyen Ngoc Tuan, Huynh Quyet Thang, Suresh Chandra Satapathy. MMAS Algorithm and Nash Equilibrium to Solve Multi-Round Procurement Problem. Advances in Systems, Control and Automations. Lecture Notes in Electrical Engineering, vol 708. 273-284. 05/03/2020
  3. Nguyễn Thị Xuân Hoà, Huỳnh Quyết Thắng, Nguyễn Hương Giang, Lê Hiếu Học. Nâng cao hiệu quả chuỗi cung ứng thông qua quản lý tồn kho VMI [Improving supply chain efficiency through VMI inventory management]. Tạp chí Công thương. 235-243. 19/01/2021
  4. Bùi Thị Mai Anh, Nguyễn Thị Thu Trang. A Feature-Augmented Deep Learning Model for Extractive Summarization. INISCOM 2021. Vol 379. Le Quy Don University, Hanoi, Vietnam. 22/04/2021
  5. Hoang-Long Huynh, Huu-Duc Nguyen, Trong-Vinh Le, Quyet-Thang Huynh. CAM-D: A Description Method for Multi-Cloud Marketplace Application. Research and Development on Information and Communication Technology, ICT Research. 51-61. 31/01/2021
  6. Bùi Thị Mai Anh, Nguyễn Nhất Hải. Adaptive Ranking Relevant Source Files for Bug Reports Using Genetic Algorithm. SOMET 2021. 430-443. Cancun, Mexico. 21/09/2021
  7. Hieu T Nguyen, Hieu H Pham, Nghia T Nguyen, Ha Q Nguyen, Thang Q Huynh, Minh Dao, Van Vu. VinDr-SpineXR: A deep learning framework for spinal lesions detection and classification from radiographs. International Conference on Medical Image Computing and Computer-Assisted Intervention MICCAI 2021: Medical Image Computing and Computer Assisted Intervention – MICCAI 2021. 291–301. Strasbourg, France. 27/09/2021
  8. Thanh Le-Cong, Xuan Bach Le D., Phi Le Nguyen, Quyet Thang Huynh. Usability and Aesthetics: Better Together for Automated Repair of Web Pages. 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE). 1-6. 25/10/2021
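  9. Trần Đình Diễn, Huỳnh Quyết Thắng, Nguyễn Thành Đạt. Phát triển thuật toán sinh code cho ứng dụng web chuẩn đoán bệnh thủy sản với ATL [Developing a code generation algorithm for a web application for diagnosing aquatic animal diseases with ATL]. Tạp chí Nghiên cứu Khoa học và Công nghệ quân sự. 102-111. 12/04/2021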
  10. Nguyen, V.-Q., Nguyen, V.-H., Nguyen, M.-Q., Huynh, Q.-T., Kim, K. Efficiently Estimating Joining Cost of Subqueries in Regular Path Queries. Electronics. 1-16. 19/04/2021
  11. Anh Son TA. Solving problem. NICS. 17/01/2021
  12. Thi Thu Trang Nguyen, Bui Thi-Mai-Anh, Tran Thi Dinh, Nguyen Thi Hoai. A Hybrid PSO-GA for Extractive Text Summarization. PACLIC 2021. 757-766. Shanghai, China. 04/11/2021
  13. Le TA., Huynh QT., Nguyen TTN., Thi MH.T. A New Method for Enhancing Software Effort Estimation by Using ANFIS-Based Approach. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, Vol 379. 195-210. Hanoi. 22/04/2021