Introduction

The field of artificial intelligence is witnessing a recent explosion in research, tool development as well as software system deployment. Many software development companies are shifting their focus on developing intelligent software systems by adopting AI to their existing processes. At the same time, the academic research community is injecting AI paradigms to provide robust solutions to traditional software engineering (SE) problems and has achieved promising results. Again, AI has been proved useful to address major challenges that the SE field has been facing. Indeed, different AI paradigms (such as neural networks, machine learning, knowledge-based systems, natural language processing) can be applied to different SE phases (i.e., requirement, design, development, testing, release and maintenance) to improve the process and reduce human-effort in tedious and cumbersome tasks (e.g., testing, debugging etc.).

Our research group focuses on studying how to apply AI to resolve current challenges of SE phases during software development lifecycle. We aim to leverage AI paradigms in improving the existing processes by representing intelligence and eventually automating some SE phases such as requirement engineering, testing and debugging. We are also focusing on applying software methodology and architecture in developing AI applications. See the slides here for more detail.

Contact: Dr. Bui Thi Mai Anh, Email: anhbtm@soict.hust.edu.vn

Research Directions

  • Intelligent Requirement engineering and planning: Requirement and planning is the first stage in the SE process which allows to establish the building blocks of a software system. Software requirements describe the outlook of a software application by specifying the software’s main objectives and goals. Requirement Engineering is the engineering discipline of establishing user requirements and specifying software systems. Studies in this field aim to automatically extract requirement information from users (which is often described by natural languages) and to systematically represent software requirements (by using modeling languages or specification languages such as UML). Applying Natural Language Processing (NLP) and Machine Learning/Deep Learning are trending directions in this field. Aside from automated requirement engineering, to help project managers in effective planning and software cost estimation, the selection of software requirements and scheduling software development process are addressed by using multiobjective evolutionary algorithms (MOEAs).
  • Automated Software refactoring: Refactoring is one of the most expensive parts of software implementation. Code refactoring is intended to improve the design, structure and/or implementation of software systems while preserving their functionality. Search-based methods (using metaheuristics) and machine learning approaches are combined to automatically suggest refactoring solutions.
  • Search Based Software Testing: Software testing is one of the major targets of AI-driven methods. The testing process is a process of executing a program or application with the intent of finding software defects and bugs by using a given test suite (set of test cases). Research directions in this field aim to address the problem of how to design effective test cases to detect as maximum as possible software defects and bugs. This problem is formulated as an optimization problem: minimizing generated test cases while maximizing the detection of bugs/faults. We aim to apply search-based optimization algorithms such as EAs, GA, PSO, etc. to tackle this problem.
  • Software Defect Prediction: Software maintenance is defined by the IEEE as the modification of a software product after delivery to correct faults, to improve performance or to adapt the product to a modified environment. Mining user feedback is essential for software engineers. We aim to apply ML techniques and NLP to explore user feedback in order to automatically predict software defects, including bug localization and fault localization, and to automatically repair them.
  • Automated Program Repair using Machine Learning techniques: Along with the development of computer science, software applications are more and more popular in every domain in the real world. The growth makes the reliability of software programs becoming critical since issue-impacted programs may jeopardize digital assets. Unfortunately, the reliability of software could always be negatively affected by numerous problems, especially software bugs, which are pervasive in software systems. To reduce the tremendous damages caused by software bugs, developers should fix them in time. However, manual bug-fixing is notoriously tricky, tedious, and time-consuming. This project aims to create an automated program repair system based on machine learning techniques to reduce manual bug-fixings efforts. We first investigate the performance of learning-based automated program repair techniques on standard benchmarks such as Defects4J, Codeflaws, etc. Further, we analyze the results to identify the main challenges facing learning-based techniques. Finally, we proposed novel learning-based program repair techniques to address the challenges.
  • Vulnerabilities Detection in Machine Learning Frameworks: Smart systems are increasingly dependent on machine learning (ML) frameworks, e.g., TensorFlow, for their feature implementation. These frameworks are built on top of many third-party libraries, which depend on many others. Simply trusting and reusing a framework poses a security risk as the framework and its direct and transitive dependencies can contain exploitable vulnerabilities. To mitigate this risk, this project will create an advanced software composition analysis solution that scans dependency hierarchies and builds new deep learning architectures to analyse code and document repository data and flag vulnerabilities. Further, the flagged vulnerabilities will be verified if it can be reached via our new directed grammar-based fuzzing solution that generates valid test cases (following predefined grammars) and drives test executions to vulnerable code. Our solution targets vulnerabilities hidden deep in ML framework dependencies, which are hard for a classic fuzzer to uncover and for framework developers to recognize as they appear in third-party code.
  • Development of optimization resources solutions using Nash Equivalium and multi-objective algorithms for some applications: We will develop a multi-objective optimization model that uses the Unified Model to model game theory conflicts and resource allocation based on Nash equilibrium. We will test optimization algorithms, we use methods like: Meta-heuristic, Evolution Algorithm, Genetic Algorithm, Particle Swarm Optimization, and Ant Colony Optimization. Real-time data for problems such as resource management in software projects or freight optimization in logistics management.
  • Development of AI application:
    • Stock price prediction: Building machine learning, deep learning (convolutional neural network, natural language processing, long short-term memory), reinforcement learning to predict the direction of the stock price in the next day based on historical data.

Research Problems

  • Test case generation: Addressing the generation of effective test cases to cover as maximum as possible testing criteria by using evolutionary algorithms such as GA, PSO, GWO, etc. Combining optimization algorithm with constraint-based local search to more effectively generate test suites.
  • Requirement Mining: Using NLP techniques to examine user requirements and transform from natural language to requirement models (usecase model, activity model) then applying evolutionary algorithms to generate test cases from requirement models (from the perspective of blackbox testing).
  • Requirements Selection: Prioritizing software requirements is an optimization task. The selection of software requirements is based on the project deadline as well as on the financial budget. This motivates us to study multi-objective metaheuristic algorithms to select effectively requirements during software development process.
  • Automated Software Refactoring: Code smells are certain structures in the code that indicate the violation of fundamental design principles and negatively impact design quality. We aim to apply machine learning and deep learning techniques to detect code smells and suggest refactoring solutions by using evolutionary algorithms.
  • Bug Localization: During the development of software products, bug tracking systems (such as Bugzilla and JIRA) are used to report and manage bugs. Bug localization refers to the task of locating the potential buggy source code files in a software project given a bug report. The major challenges of bug localization come from the lexical mismatches between natural languages which are used to describe bug reports and programming languages which are used to write source files. We aim to apply deep learning models to narrow the lexical gap as well as to improve the performance of existing bug localization models. We are also focusing on addressing the imbalanced problem in this field as the number of buggy files takes a small part of source code files for a given bug report. The common approaches are based on data-driven (e.g., oversampling, undersampling) and model-driven techniques (bootstrapping, reinforcement learning, cost-sensitive learning, ensemble learning).
  • Fault Localization: While bug localization focuses more on locating buggy source files, in the fault localization problem, given the execution of test cases, an FL tool identifies the set of suspicious lines of code with their suspiciousness scores. We apply machine learning and deep learning techniques to analyze source code and detect faulty statements at the method level. We are also focusing on exploiting method features through code complexity metrics, mutation-based formula, etc. (often lead to +200 features) to evaluate the sensitivity of features as well as to examine the constraint between features to augment the performance of FL models.
  • Stock Price Prediction Problems:
    • Insufficient data
      • Using new technologies such as transfer learning, ensemble machine learning models, data augmentation, and synthetic data
    • Imbalanced data
      • Resampling training samples such as over-sampling with minority class, under-sampling with majority class, or combine two methods
    • Feature engineering
      • Finding important information from given data
    • Finding new feature to help imporve the training model performance

Team Members

Dr. Bui Thi Mai Anh
Team Leader

Assoc. Prof. Huynh Quyet Thang
Member

Dr. Nguyen Nhat Hai
Member

Dr. Tran Nhat Hoa
Member

Latest publications