Knowledge Discovery

Overview: The Knowledge Discovery theme encompasses a broad range of distinct research areas that focus on transforming data sources into sources of knowledge, e.g. predicting protein structure from gene sequencing data. In many cases, the modelling, learning, reasoning and visualisation techniques used in this transformation process are computationally expensive. Hence, the manipulation of data requires algorithms that are not only effective but also efficient. CSI researchers are applying their knowledge discovery techniques to real-world problems in the sciences, industry and finance. The following sub-themes are of particular interest to CSI researchers:

Bioinformatics research at CSI focuses on the application of machine learning and data mining techniques to the interpretation of raw genomic data. Since many of these datasets are massive and computationally expensive to process, researchers are also interested in algorithm design issues surrounding efficiency, and in tera- and petabyte-scale data management. Applications of bioinformatics research at CSI include the development of data mining algorithms for analysing microarray data, the mining of protein-protein interactions, protein structure prediction, and the prediction of chemical compound properties to aid the design of effective and non-toxic drugs.

Digital Forensics and Security research at CSI focuses on two broad issues: reducing cybercrime activity, and helping law enforcement officers build criminal cases when cybercrimes are committed. The term cybercrime refers to any criminal activity that is facilitated by a computer system or network, e.g. drug trafficking, money laundering, terrorism, fraudulent financial transactions and hacking. The aim of Computer Forensics research is to develop computational analysis techniques that can gather evidence of criminal intent for presentation in a court of law. In this area, CSI researchers are investigating the automatic reconstruction of computer-based incidents, capturing the sequence of events leading to a security breach. Data mining techniques are also being developed to provide intelligent analysis of a suspect's computer system and to guide investigators through an investigation. In the area of Security, our researchers are investigating Digital Watermarking and Information Hiding techniques as a way of reducing the incidence of copyright infringement.

Machine Learning research involves the development of algorithms capable of automatically improving their effectiveness through experience. Many applications of this work are evident in CSI research, such as Data Mining, where hidden patterns and relationships are extracted from data, and Collaborative Filtering, where user preferences are learned and information content is adapted accordingly. A number of other sub-themes in the Knowledge Discovery theme are in fact application areas of Machine Learning, where specific algorithmic solutions and feature engineering are required to model problems in diverse domains such as molecular biology, forensic computing, natural language, and data on the Web.

Natural Computing research focuses on the study of a large family of algorithms inspired by Nature, including Biological, Social and Physical systems. Broadly speaking, these algorithms draw metaphorical inspiration from diverse sources, including the operation of biological neurons, processes of evolution, models of social interaction amongst organisms, and natural immune systems, in order to develop tools for solving real-world problems, such as financial modelling.


Intelligent Information Access research at CSI aims to improve a user’s access to information through supportive technologies such as search engines, recommendation systems, and collaborative filtering engines. For example, today’s search engine users are overwhelmed by the amount of information available, some of it relevant, most of it not. Researchers who develop new information-access technologies want to reduce this cognitive load by focussing the user's attention on the material most relevant to their information need. CSI researchers are approaching this problem from two different perspectives: information filtering, which increases the relevance of the information returned to the user by, for example, learning user preferences over time and personalising results based on these observations; and information presentation, which displays alternative, condensed views of relevant information (other than, for example, a ranked list of web pages) by using techniques such as document/image clustering or multi-document summarisation. CSI researchers have applied these techniques to diverse applications such as geographic information systems, adaptive retail systems, a personalised TV recommendation system, and a social networking-based search engine.

Information Visualisation researchers at CSI are developing novel representations for capturing relationships, patterns, trends, clusters and outliers that are hidden from the human eye in massive datasets. Visualisation tools are therefore evaluated on their ability to improve human problem solving by helping to accelerate human thinking. The development of such tools often involves combining knowledge discovery techniques, such as data mining, with information visualisation methodologies. Information Visualisation techniques have been applied to many diverse datasets in CSI, such as genomic and geospatial data.