Jano's Homepage
Personal
Publications
 · Constraint Satisfaction
 · Data Mining
 · Dynamic Problems
 · e-Science
 · Evolutionary Art
 · Problem Evolving  
Presentations
Projects
Photos
Past

Jano van Hemert


 
Publications


Federated Enactment of Workflow Patterns
inproceedings Gagarine Yaikhom and Liew, Chee Sun and Liangxiu Han and van Hemert, Jano and Malcolm Atkinson and Amy Krause @ 2010/07/12
Euro-Par, pages 317-328.

In this paper we address two research questions concerning workflows: 1) how do we abstract and catalogue recurring workflow patterns?; and 2) how do we facilitate optimisation of the mapping from workflow patterns to actual resources at runtime? Similar questions have been previously investigated in the context of optimising compilers: our aim here is to explore techniques that are applicable to large-scale workflow compositions, where the resources could change dynamically during the life-time of an application. We achieve this by introducing a mechanism for pattern abstraction and cataloguing, supported by runtime semantic bindings that are conditional to the execution parameters. Using a data mining application from the life sciences, we demonstrate this new approach experimentally.



Understanding TSP Dfficulty by Learning from Evolved Instances
inproceedings Smith-Miles, K. A. and van Hemert, J.I. and Lim, Y. @ 2010/04/25
Proceedings of Learning and Intelligent Optimization, pages In press.

Whether the goal is performance prediction, or insights into the relationships between algorithm performance and instance characteristics, a comprehensive set of meta-data from which relationships can be learned is needed. This paper provides a methodology to determine if the meta-data is sufficient, and demonstrates the critical role played by instance generation methods. Instances of the Travelling Salesman Problem (TSP) are evolved using an evolutionary algorithm to produce distinct classes of instances that are intentionally easy or hard for certain algorithms. A comprehensive set of features is used to characterise instances of the TSP, and the impact of these features on difficulty for each algorithm is analysed. Finally, performance predictions are achieved with high accuracy on unseen instances for predicting search effort as well as identifying the algorithm likely to perform best.



Correcting for intra-experiment variation in Illumina BeadChip data is necessary to generate robust gene-expression profiles
article Kitchen, Robert and Sabine, Vicky and Sims, Andrew and Macaskill, E Jane and Renshaw, Lorna and Thomas, Jeremy and van Hemert, Jano and Dixon, J Michael and Bartlett, John @ 2010/03/25
BMC Genomics, 11(1), 2010, pages 134.
[ url ]

BACKGROUND:Microarray technology is a popular means of producing whole genome transcriptional profiles, however high cost and scarcity of mRNA has led many studies to be conducted based on the analysis of single samples. We exploit the design of the Illumina platform, specifically multiple arrays on each chip, to evaluate intra-experiment technical variation using repeated hybridisations of universal human reference RNA (UHRR) and duplicate hybridisations of primary breast tumour samples from a clinical study.RESULTS:A clear batch-specific bias was detected in the measured expressions of both the UHRR and clinical samples. This bias was found to persist following standard microarray normalisation techniques. However, when mean-centering or empirical Bayes batch-correction methods (ComBat) were applied to the data, inter-batch variation in the UHRR and clinical samples were greatly reduced. Correlation between replicate UHRR samples improved by two orders of magnitude following batch-correction using ComBat (ranging from 0.9833-0.9991 to 0.9997-0.9999) and increased the consistency of the gene-lists from the duplicate clinical samples, from 11.6% in quantile normalised data to 66.4% in batch-corrected data. The use of UHRR as an inter-batch calibrator provided a small additional benefit when used in conjunction with ComBat, further increasing the agreement between the two gene-lists, up to 74.1%.CONCLUSION:In the interests of practicalities and cost, these results suggest that single samples can generate reliable data, but only after careful compensation for technical bias in the experiment. We recommend that investigators appreciate the propensity for such variation in the design stages of a microarray experiment and that the use of suitable correction methods become routine during the statistical analysis of the data.



Molecular Orbital Calculations of Inorganic Compounds
incollection Morrison, C.A. and Robertson, N. and Turner, A. and van Hemert, J. and Koetsier, J. @ 2010/03/25
Inorganic Experiments, pages 261-267.


An Open Source Toolkit for Medical Imaging De-Identification
article Rodr\'\iguez González, D. and T. Carpenter and van Hemert, J.I. and J. Wardlaw @ 2010/01/05
European Radiology, 20(8), 2010, pages 1896-1904.
[ url ]

Objective Medical imaging acquired for clinical purposes can have several legitimate secondary uses in research projects and teaching libraries. No commonly accepted solution for anonymising these images exists because the amount of personal data that should be preserved varies case by case. Our objective is to provide a flexible mechanism for anonymising Digital Imaging and Communications in Medicine (DICOM) data that meets the requirements for deployment in multicentre trials. Methods We reviewed our current de-identification practices and defined the relevant use cases to extract the requirements for the de-identification process. We then used these requirements in the design and implementation of the toolkit. Finally, we tested the toolkit taking as a reference those requirements, including a multicentre deployment. Results The toolkit successfully anonymised DICOM data from various sources. Furthermore, it was shown that it could forward anonymous data to remote destinations, remove burned-in annotations, and add tracking information to the header. The toolkit also implements the DICOM standard confidentiality mechanism. Conclusion A DICOM de-identification toolkit that facilitates the enforcement of privacy policies was developed. It is highly extensible, provides the necessary flexibility to account for different de-identification requirements and has a low adoption barrier for new users.



Rapid chemistry portals through engaging researchers
inproceedings Koetsier, J. and Turner, A. and Richardson, P. and van Hemert, J.I. @ 2009/12/08
Fifth IEEE International Conference on e-Science, pages 284-291.

In this study, we apply a methodology for rapid development of portlets for scientific computing to the domain of computational chemistry. We report results in terms of the portals delivered, the changes made to our methodology and the experience gained in terms of interaction with domain-specialists. Our major contributions are: several web portals for teaching and research in computational chemistry; a successful transition to having our development tool used by the domain specialist as opposed by us, the developers; and an updated version of our methodology and technology for rapid development of portlets for computational science, which is free for anyone to pick up and use.



Using the DCC Lifecycle Model to Curate a Gene Expression Database: A Case Study
article J. O'Donoghue and van Hemert, J.I. @ 2009/12/01
International Journal of Digital Curation, 4(3), 2009.

Developmental Gene Expression Map (DGEMap) is an EU-funded Design Study, which will accelerate an integrated European approach to gene expression in early human development. As part of this design study, we have had to address the challenges and issues raised by the long-term curation of such a resource. As this project is primarily one of data creators, learning about curation, we have been looking at some of the models and tools that are already available in the digital curation field in order to inform our thinking on how we should proceed with curating DGEMap. This has led us to uncover a wide range of resources for data creators and curators alike. Here we will discuss the future curation of DGEMap as a case study. We believe our experience could be instructive to other projects looking to improve the curation and management of their data



A model of social collaboration in Molecular Biology knowledge bases
inproceedings De Ferrari, L. and Aitken, S. and van Hemert, J.I. and Goryanin, I. @ 2009/10/01
Proceedings of the 6th Conference of the European Social Simulation Association (ESSA'09), pages In press.

Manual annotation of biological data cannot keep up with data production. Open annotation models using wikis have been proposed to address this problem. In this empirical study we analyse 36 years of knowledge collection by 738 authors in two Molecular Biology wikis (EcoliWiki and WikiPathways) and two knowledge bases (OMIM and Reactome). We first investigate authorship metrics (authors per entry and edits per author) which are power-law distributed in Wikipedia and we find they are heavy-tailed in these four systems too. We also find surprising similarities between the open (editing open to everyone) and the closed systems (expert curators only). Secondly, to discriminate between driving forces in the measured distributions, we simulate the curation process and find that knowledge overlap among authors can drive the number of authors per entry, while the time the users spend on the knowledge base can drive the number of contributions per author.



Giving Computational Science a Friendly Face
article van Hemert, J.I. and Koetsier, J. @ 2009/10/01
Zero-In, 1(3), 2009, pages 12-13.
[ url ]

Anyone who is purchasing a flight using a web browser expects to be guided through this task: from choosing the possible routes, to finding suitable dates and times, and to paying with a credit card. Today, most researchers from any discipline will successfully use these web-based e-commerce systems to book flights to attend their conferences. When these researchers are then confronted with solving compute-intensive problems, they need not expect such elaborate web-based systems to enable their domain-specific tasks. Instead, they will have to deal with archaic command-line tools and in the best case they may have access to generic portals that mimic the technical complexity of the underlying infrastructure. These interfaces are expensive to use as they require much investment from the researchers in terms of training. Moreover, the laborious and intricate processes involved often lead to errors.



Rapid development of computational science portals
inproceedings Koetsier, J. and van Hemert, J.I. @ 2009/09/09, Edinburgh
Proceedings of the IWPLS09 International Workshop on Portals for Life Sciences.
[ url ]

Motivation: Scientific web portals are seen as the way forward to improve upon the slow uptake in use of utility computing infrastructure and high-performance computing facilities. Currently, two types of portals exist: general-purpose portals and domain-specific portals. The first type closely resembles the underlying technical infrastructure of compute-job submission systems, thereby providing little appeal to a wide range of domain specialists. The second type is tailored to the application specifications and their end-users' requirements. Unfortunately, the technical complexity in domain-specific portals makes these expensive and time-consuming to develop and maintain. Clearly, an alternative to these two approaches is required. Results: We introduce an approach, Rapid, that facilitates rapid development of portlets. Its main aim is to reduce the time from development to the deployment from several months to a few weeks. Moreover, it facilitates an easy way to share and maintain these portlets by domain specialist themselves. Both these advantages considerably reduce the cost of developing portal solutions for computational science applications. We highlight several scientific domains where our approach is used or was used successfully. Availability: Rapid is developed under an Open Source model and is available freely through a Gnu General Public license. Main releases, documentation, tutorials and examples are available at http://research.nesc.ac.uk/rapid/. The development of Rapid uses an open read-only CVS repository, which is complemented by a developer community site at http://forge.nesc.ac.uk/projects/jos/.



Portals for Life Sciences-a Brief Introduction
inproceedings Gesing, S. and Kohlbacher, O. and van Hemert, J.I. @ 2009/09/09
Proceedings of the IWPLS09 International Workshop on Portals for Life Sciences.
[ url ]

The topic ''`Portals for Life Sciences''' includes various research fields, on the one hand many different topics out of life sciences, e.g. mass spectrometry, on the other hand portal technologies and diffe- rent aspects of computer science, such as usability of user interfaces and security of systems. The main aspect about portals is to simplify the user's interaction with computational resources which are concer- ted to a supported application domain.



Proceedings of the IWPLS09 International Workshop on Portals for Life Sciences
proceedings Gesing, S. and van Hemert, J.I. @ 2009/09/09, Edinburgh, UK
Proceedings of the International Workshop on Portals for Life Sciences.
[ url ]

IWPLS'09 focuses on research contributions for portals and tools in the field of life sciences. It brings together scientists from the fields of life science, bioinformatics and computer science. Its aim is to become the international platform to exchange experience, formulate ideas, and catch up on technological advances in molecular and systems biology in the context of portals. All papers published in these proceedings were accepted through a peer-reviewing process. Each paper had a 30-minute presentation and each accepted abstract had a lightning talk of 10 minutes. We would like to thank the authors for their contributions and our Program Committee for the effort put into reviewing. Nine papers were selected out of the excellent submissions. We owe much gratitude to the local organisers, for without their hard work the workshop would not have been such a success. We acknowledge both the e-Science Institute in Edinburgh and the Scottish Bioinformatics Forum for their financial contributions.



Using architectural simulation models to aid the design of data intensive application
inproceedings Fernández, J. and Han, L. and Nu\~nez, A. and Carretero, J. and van Hemert, J.I. @ 2009/09/01
Third International Conference on Advanced Engineering Computing and Applications in Sciences, pages 163-168.

Performance is an open issue in data intensive applications. Finding the best implementation and influential performance factors of hardware and software platforms for the data intensive applications requires trial and error. However, it is very difficult and costly to perform these trials in a real large-scale environment. In this paper we use a generic simulation framework SIMCAN to simulate hardware and software platforms of data intensive applications for investigating the influential performance factors, and thereby making decisions on the design of data intensive application architectures. We have employed a typical use case of a data mining application, in which the architecture has been proposed using a pipeline model. We have simulated various scenarios to investigate factors that affect the system performance to assist the architecture design and the simulation results provide useful information for this decision- making.



A distributed architecture for data mining and integration
inproceedings Atkinson, M.P. and van Hemert, J.I. and Han, L. and Hume, A. and Liew, C.S. @ 2009/06/07, New York, NY, USA
DADC '09: Proceedings of the second international workshop on Data-aware distributed computing, pages 11-20.

This paper presents the rationale for a new architecture to support a significant increase in the scale of data integration and data mining. It proposes the composition into one framework of (1) data mining and (2) data access and integration. We name the combined activity DMI. It supports enactment of DMI processes across heterogeneous and distributed data resources and data mining services. It posits that a useful division can be made between the facilities established to support the definition of DMI processes and the computational infrastructure provided to enact DMI processes. Communication between those two divisions is restricted to requests submitted to gateway services in a canonical DMI language. Larger-scale processes are enabled by incremental refinement of DMI-process definitions often by recomposition of lower-level definitions. Autonomous evolution of data resources and services is supported by types and descriptions which will support detection of inconsistencies and semi-automatic insertion of adaptations. These architectural ideas are being evaluated in a feasibility study that involves an application scenario and representatives of the community.



An E-infrastructure to Support Collaborative Embryo Research
inproceedings A. Barker and van Hemert, J.I. and R.A. Baldock and M.P. Atkinson @ 2009/05/22
Cluster Computing and the Grid, pages 520-525.

Within the context of the EU Design Study Developmental Gene Expression Map, we identify a set of challenges when facilitating collaborative research on early human embryo development. These challenges bring forth requirements, for which we have identified solutions and technology. We summarise our solutions and demonstrate how they integrate to form an e-infrastructure to support collaborative research in this area of developmental biology.



The Circulate Architecture: Avoiding Workflow Bottlenecks Caused By Centralised Orchestration
article Barker, A. and Weissman, J. and van Hemert, J.I. @ 2009/03/01
Cluster Computing, 12(2), 2009, pages 221-235.
[ url ]

As the number of services and the size of data involved in workflows increases, centralised orchestration techniques are reaching the limits of scalability. In the classic orchestration model, all data passes through a centralised engine, which results in unnecessary data transfer, wasted bandwidth and the engine to become a bottleneck to the execution of a workflow. This paper presents and evaluates the Circulate architecture which maintains the robustness and simplicity of centralised orchestration, but facilitates choreography by allowing services to exchange data directly with one another. Circulate could be realised within any existing workflow framework, in this paper, we focus on WS-Circulate, a Web services based implementation. Taking inspiration from the Montage workflow, a number of common workflow patterns (sequence, fan-in and fanout), input to output data size relationships and network configurations are identified and evaluated. The performance analysis concludes that a substantial reduction in communication overhead results in a 2-4 fold performance benefit across all patterns. An end-to-end pattern through the Montage workflow results in an 8 fold performance benefit and demonstrates how the advantage of using the Circulate architecture increases as the complexity of a workflow grows.



Towards a Virtual Fly Brain
article Armstrong, J.D. and van Hemert, J.I. @ 2009/03/01
Philosophical Transactions A, 367(1896), 2009, pages 2387-2397.
[ pdf | url ]

Models of the brain that simulate sensory input, behavioural output and information processing in a biologically plausible manner pose significant challenges to both Computer Science and Biology. Here we investigated strategies that could be used to create a model of the insect brain, specifically that of Drosophila melanogaster which is very widely used in laboratory research. The scale of the problem is an order of magnitude above the most complex of the current simulation projects and it is further constrained by the relative sparsity of available electrophysiological recordings from the fly nervous system. However, fly brain research at the anatomical and behavioural level offers some interesting opportunities that could be exploited to create a functional simulation. We propose to exploit these strengths of Drosophila CNS research to focus on a functional model that maps biologically plausible network architecture onto phenotypic data from neuronal inhibition and stimulation studies, leaving aside biophysical modelling of individual neuronal activity for future models until more data is available.



A Novel Visual Discriminator for Network Traffic Patterns
inproceedings Han, L. and van Hemert, J.I. @ 2008/09/01
Proceedings of the International Conference on Advanced Engineering Computing and Applications in Sciences, pages 141-146.

Wavelet transform has been proved to be a powerful tool for characterizing network traffic. However, the resulting decomposition of wavelet transform typically forms the high-dimension space. It is obviously problematic on compact representation, visualization, and modeling based on these high-dimensional data. In this study, we employ data projection techniques to represent the high dimensional wavelet decomposition of network traffic data in a low dimensional space to facilitate the visual analysis of network traffic pattern. A low-dimensional representation can significantly reduce the model complexity, and the features of traffic pattern can be presented with a small number of parameters. The experimental results show that the proposed method could effectively discriminate the different application flows, such as FTP and P2P data flows.



Matching Spatial Regions with Combinations of Interacting Gene Expression Patterns
inproceedings van Hemert, J.I. and R.A. Baldock @ 2008/07/07
Proceedings of the 2nd International Conference on BioInformatics Research and Development, pages 347-361.

The Edinburgh Mouse Atlas aims to capture in-situ gene expression patterns in a common spatial framework. In this study, we construct a grammar to define spatial regions by combinations of these patterns. Combinations are formed by applying operators to curated gene expression patterns from the atlas, thereby resembling gene interactions in a spatial context. The space of combinations is searched using an evolutionary algorithm with the objective of finding the best match to a given target pattern. We evaluate the method by testing its robustness and the statistical significance of the results it finds.



Scientific Workflow: A Survey and Research Directions
inproceedings Barker, A. and van Hemert, J.I. @ 2008/05/29
Parallel Processing and Applied Mathematics, pages 746-753.
[ url ]

Workflow technologies are emerging as the dominant approach to coordinate groups of distributed services. However with a space filled with competing specifications, standards and frameworks from multiple domains, choosing the right tool for the job is not always a straightforward task. Researchers are often unaware of the range of technology that already exists and focus on implementing yet another proprietary workflow system. As an antidote to this common problem, this paper presents a concise survey of existing workflow technology from the business and scientific domain and makes a number of key suggestions towards the future development of scientific workflow systems.



European Graduate Student Workshop on Evolutionary Computation
proceedings Di Chio, C. and Giacobini, M. and van Hemert, J.I. @ 2008/04/27
[ pdf ]

Evolutionary computation involves the study of problem-solving and optimization techniques inspired by principles of evolution and genetics. As any other scientific field, its success relies on the continuity provided by new researchers joining the field to help it progress. One of the most important sources for new researchers is the next generation of PhD students that are actively studying a topic relevant to this field. It is from this main observation the idea arose of providing a platform exclusively for PhD students.



Evolutionary Computation in Combinatorial Optimization, 8th European Conference
proceedings van Hemert, J.I. and Cotta, C. @ 2008/03/26
Evolutionary Computation in Combinatorial Optimization.
[ url ]

Metaheuristics have shown to be effective for difficult combinatorial optimization problems appearing in various industrial, economical, and scientific domains. Prominent examples of metaheuristics are evolutionary algorithms, tabu search, simulated annealing, scatter search, memetic algorithms, variable neighborhood search, iterated local search, greedy randomized adaptive search procedures, ant colony optimization and estimation of distribution algorithms. Problems solved successfully include scheduling, timetabling, network design, transportation and distribution, vehicle routing, the travelling salesman problem, packing and cutting, satisfiability and general mixed integer programming. EvoCOP began in 2001 and has been held annually since then. It is the first event specifically dedicated to the application of evolutionary computation and related methods to combinatorial optimization problems. Originally held as a workshop, EvoCOP became a conference in 2004. The events gave researchers an excellent opportunity to present their latest research and to discuss current developments and applications. Following the general trend of hybrid metaheuristics and diminishing boundaries between the different classes of metaheuristics, EvoCOP has broadened its scope over the last years and invited submissions on any kind of metaheuristic for combinatorial optimization.



Graph Colouring Heuristics Guided by Higher Order Graph Properties
inproceedings Juhos, I. and van Hemert, J.I. @ 2008/03/26
Evolutionary Computation in Combinatorial Optimization, pages 97-108.

Graph vertex colouring can be defined in such a way where colour assignments are substituted by vertex contractions. We present various hyper-graph representations for the graph colouring problem all based on the approach where vertices are merged into groups. In this paper, we show this provides a uniform and compact way to define algorithms, both of a complete or a heuristic nature. Moreover, the representation provides information useful to guide algorithms during their search. In this paper we focus on the quality of solutions obtained by graph colouring heuristics that make use of higher order properties derived during the search. An evolutionary algorithm is used to search permutations of possible merge orderings.



Data Integration in eHealth: A Domain/Disease Specific Roadmap
inproceedings J. Ure and R. Proctor and M. Martone and D. Porteous and S. Lloyd and S. Lawrie and D. Job and R. Baldock and A. Philp and D. Liewald and F. Rakebrand and A. Blaikie and C. McKay and S. Anderson and J. Ainsworth and van Hemert, J. and I. Blanquer and R. Sinnott and C. Barillot and F. Bernard Gibaud and A. Williams and M. Hartswood and P. Watson and L. Smith and A. Burger and J. Kennedy and H. Gonzalez-Velez and R. Stevens and O. Coecho and R. Morton and P. Linksted and M. Deschenne and M. McGilchrist and P Johnson and A. Voss and R. Gertz and J. Wardlaw @ 2007/04/27
Studies in Health Technology and Informatics, pages 144-153.
[ pdf ]

The paper documents a series of data integration workshops held in 2006 at the UK National e-Science Centre, summarizing a range of the problem/solution scenarios in multi-site and multi-scale data integration with six HealthGrid projects using schizophrenia as a domain-specific test case. It outlines emerging strategies, recommendations and objectives for collaboration on shared ontology-building and harmonization of data for multi-site trials in this domain.



Evolutionary Computation in Combinatorial Optimization, 7th European Conference
proceedings Cotta, C. and van Hemert, J.I. @ 2007/04/11
Evolutionary Computation in Combinatorial Optimization.
[ url ]

Metaheuristics have often been shown to be effective for difficult combinatorial optimization problems appearing in various industrial, economical, and scientific domains. Prominent examples of metaheuristics are evolutionary algorithms, simulated annealing, tabu search, scatter search, memetic algorithms, variable neighborhood search, iterated local search, greedy randomized adaptive search procedures, estimation of distribution algorithms, and ant colony optimization. Successfully solved problems include scheduling, timetabling, network design, transportation and distribution, vehicle routing, the traveling salesman problem, satisfiability, packing and cutting, and general mixed integer programming. EvoCOP began in 2001 and has been held annually since then. It was the first event specifically dedicated to the application of evolutionary computation and related methods to combinatorial optimization problems. Originally held as a workshop, EvoCOP became a conference in 2004. The events gave researchers an excellent opportunity to present their latest research and to discuss current developments and applications as well as providing for improved interaction between members of this scientific community. Following the general trend of hybrid metaheuristics and diminishing boundaries between the different classes of metaheuristics, EvoCOP has broadened its scope over the last years and invited submissions on any kind of metaheuristic for combinatorial optimization.



European Graduate Student Workshop on Evolutionary Computation
proceedings Giacobini, M. and van Hemert, J.I. @ 2007/03/27
[ pdf ]

Evolutionary computation involves the study of problem-solving and optimization techniques inspired by principles of evolution and genetics. As any other scientific field, its success relies on the continuity provided by new researchers joining the field to help it progress. One of the most important sources for new researchers is the next generation of PhD students that are actively studying a topic relevant to this field. It is from this main observation the idea arose of providing a platform exclusively for PhD students.



Mining spatial gene expression data for association rules
inproceedings van Hemert, J.I. and R.A. Baldock @ 2007/03/12
Proceedings of the 1st International Conference on BioInformatics Research and Development, pages 66-76.
[ pdf | url ]

We analyse data from the Edinburgh Mouse Atlas Gene-Expression Database (EMAGE) which is a high quality data source for spatio-temporal gene expression patterns. Using a novel process whereby generated patterns are used to probe spatially-mapped gene expression domains, we are able to get unbiased results as opposed to using annotations based predefined anatomy regions. We describe two processes to form association rules based on spatial configurations, one that associates spatial regions, the other associates genes.



Evolving combinatorial problem instances that are difficult to solve
article van Hemert, J.I. @ 2006/12/01
Evolutionary Computation, 14(4), 2006, pages 433-462.
[ url ]

In this paper we demonstrate how evolutionary computation can be used to acquire difficult to solve combinatorial problem instances, thereby stress-testing the corresponding algorithms used to solve these instances. The technique is applied in three important domains of combinatorial optimisation, binary constraint satisfaction, Boolean satisfiability, and the travelling salesman problem. Problem instances acquired through this technique are more difficult than ones found in popular benchmarks. We analyse these evolved instances with the aim to explain their difficulty in terms of structural properties, thereby exposing the weaknesses of corresponding algorithms.



Increasing the efficiency of graph colouring algorithms with a representation based on vector operations
article I. Juhos and van Hemert, J.I. @ 2006/08/01
Journal of Software, 1(2), 2006, pages 24-33.
[ pdf ]

We introduce a novel representation for the graph colouring problem, called the Integer Merge Model, which aims to reduce the time complexity of graph colouring algorithms. Moreover, this model provides useful information to aid in the creation of heuristics that can make the colouring process even faster. It also serves as a compact definition for the description of graph colouring algorithms. To verify the potential of the model, we use it in the complete algorithm DSATUR, and in two version of an incomplete approximation algorithm; an evolutionary algorithm and the same evolutionary algorithm extended with guiding heuristics. Both theoretical and empirical results are provided investigation is performed to show an increase in the efficiency of solving graph colouring problems. Two problem suites were used for the empirical evidence: a set of practical problem instances and a set of hard problem instances from the phase transition.



Neighborhood Searches for the Bounded Diameter Minimum Spanning Tree Problem Embedded in a VNS, EA, and ACO
inproceedings M. Gruber and van Hemert, J.I. and G.R. Raidl @ 2006/07/08, Seattle, USA
Proceedings of the Genetic and Evolutionary Computation Conference, pages 1187-1194.
[ pdf ]

We consider the Bounded Diameter Minimum Spanning Tree problem and describe four neighbourhood searches for it. They are used as local improvement strategies within a variable neighbourhood search (VNS), an evolutionary algorithm (EA) utilising a new encoding of solutions, and an ant colony optimisation (ACO).We compare the performance in terms of effectiveness between these three hybrid methods on a suite f popular benchmark instances, which contains instances too large to solve by current exact methods. Our results show that the EA and the ACO outperform the VNS on almost all used benchmark instances. Furthermore, the ACO yields most of the time better solutions than the EA in long-term runs, whereas the EA dominates when the computation time is strongly restricted.



European Graduate Student Workshop on Evolutionary Computation
proceedings M. Giacobini and van Hemert, J.I. @ 2006/04/10
[ pdf ]

Evolutionary computation involves the study of problem-solving and optimization techniques inspired by principles of evolution and genetics. As any other scientific field, its success relies on the continuity provided by new researchers joining the field to help it progress. One of the most important sources for new researchers is the next generation of PhD students that are actively studying a topic relevant to this field. It is from this main observation the idea arose of providing a platform exclusively for PhD students.



Improving Graph Colouring Algorithms and Heuristics Using a Novel Representation
inproceedings I. Juhos and van Hemert, J.I. @ 2006/04/10
Evolutionary Computation in Combinatorial Optimization, pages 123-134.
[ pdf ]

We introduce a novel representation for the graph colouring problem, called the Integer Merge Model, which aims to reduce the time complexity of an algorithm. Moreover, our model provides useful information for guiding heuristics as well as a compact description for algorithms. To verify the potential of the model, we use it in dsatur, in an evolutionary algorithm, and in the same evolutionary algorithm extended with heuristics. An empiricial investigation is performed to show an increase in efficiency on two problem suites , a set of practical problem instances and a set of hard problem instances from the phase transition.



Evolutionary Transitions as a Metaphor for Evolutionary Optimization
inproceedings A. Defaweux and T. Lenaerts and van Hemert, J.I. @ 2005/09/09
Advances in Artificial Life, pages 342-352.
[ pdf ]

This paper proposes a computational model for solving optimisation problems that mimics the principle of evolutionary transitions in individual complexity. More specifically it incorporates mechanisms for the emergence of increasingly complex individuals from the interaction of more simple ones. The biological principles for transition are outlined and mapped onto an evolutionary computation context. The class of binary constraint satisfaction problems is used to illustrate the transition mechanism.



Complexity Transitions in Evolutionary Algorithms: Evaluating the impact of the initial population
inproceedings A. Defaweux and T. Lenaerts and van Hemert, J.I. and J. Parent @ 2005/08/04
Proceedings of the Congress on Evolutionary Computation, pages 196-203.
[ pdf ]

This paper proposes an evolutionary approach for the composition of solutions in an incremental way. The approach is based on the metaphor of transitions in complexity discussed in the context of evolutionary biology. Partially defined solutions interact and evolve into aggregations until a full solution for the problem at hand is found. The impact of the initial population on the outcome and the dynamics of the process is evaluated using the domain of binary constraint satisfaction problems.



Transition Models as an incremental approach for problem solving in Evolutionary Algorithms
inproceedings A. Defaweux and T. Lenaerts and van Hemert, J.I. and J. Parent @ 2005/07/25
Proceedings of the Genetic and Evolutionary Computation Conference, pages 599-606.
[ pdf ]

This paper proposes an incremental approach for building solutions using evolutionary computation. It presents a simple evolutionary model called a Transition model. It lets building units of a solution interact and then uses an evolutionary process to merge these units toward a full solution for the problem at hand. The paper provides a preliminary study on the evolutionary dynamics of this model as well as an empirical comparison with other evolutionary techniques on binary constraint satisfaction.



Property analysis of symmetric travelling salesman problem instances acquired through evolution
inproceedings van Hemert, J.I. @ 2005/03/30
Evolutionary Computation in Combinatorial Optimization, pages 122-131.
[ pdf ]

We show how an evolutionary algorithm can successfully be used to evolve a set of difficult to solve symmetric travelling salesman problem instances for two variants of the Lin-Kernighan algorithm. Then we analyse the instances in those sets to guide us towards deferring general knowledge about the efficiency of the two variants in relation to structural properties of the symmetric travelling salesman problem.



Heuristic Colour Assignment Strategies for Merge Models in Graph Colouring
inproceedings I. Juhos and A. Tóth and van Hemert, J.I. @ 2005/03/30
Evolutionary Computation in Combinatorial Optimization, pages 132-143.
[ pdf ]

In this paper, we combine a powerful representation for graph colouring problems with different heuristic strategies for colour assignment. Our novel strategies employ heuristics that exploit information about the partial colouring in an aim to improve performance. An evolutionary algorithm is used to drive the search. We compare the different strategies to each other on several very hard benchmarks and on generated problem instances, and show where the novel strategies improve the efficiency.



Genetic Programming, Proceedings of the 8th European Conference
proceedings M. Keijzer and A. Tettamanzi and P. Collet and J. van Hemert and M. Tomassini @ 2005/03/14
Genetic Programming, Proceedings of the 8th European Conference.
[ url ]


Robust parameter settings for variation operators by measuring the resampling ratio: A study on binary constraint satisfaction problems
article van Hemert, J.I. and T. Bäck @ 2004/12/17
Journal of Heuristics, 10(6), 2004, pages 629-640.

In this article, we try to provide insight into the consequence of mutation and crossover rates when solving binary constraint satisfaction problems. This insight is based on a measurement of the space searched by an evolutionary algorithm. From data empirically acquired we describe the relation between the success ratio and the searched space. This is achieved using the resampling ratio, which is a measure for the amount of points revisited by a search algorithm. This relation is based on combinations of parameter settings for the variation operators. We then show that the resampling ratio is useful for identifying the quality of parameter settings, and provide a range that corresponds to robust parameter settings.



Phase transition properties of clustered travelling salesman problem instances generated with evolutionary computation
inproceedings van Hemert, J.I. and N.B. Urquhart @ 2004/09/18, Birmingham, UK
Parallel Problem Solving from Nature, pages 150-159.
[ pdf | url ]

This paper introduces a generator that creates problem instances for the Euclidean symmetric travelling salesman problem. To fit real world problems, we look at maps consisting of clustered nodes. Uniform random sampling methods do not result in maps where the nodes are spread out to form identifiable clusters. To improve upon this, we propose an evolutionary algorithm that uses the layout of nodes on a map as its genotype. By optimising the spread until a set of constraints is satisfied, we are able to produce better clustered maps, in a more robust way. When varying the number of clusters in these maps and, when solving the Euclidean symmetric travelling salesman person using Chained Lin-Kernighan, we observe a phase transition in the form of an easy-hard-easy pattern.



Dynamic Routing Problems with Fruitful Regions: Models and Evolutionary Computation
inproceedings van Hemert, J.I. and la Poutré, J.A. @ 2004/09/18, Birmingham, UK
Parallel Problem Solving from Nature, pages 690-699.
[ pdf ]

We introduce the concept of fruitful regions in a dynamic routing context: regions that have a high potential of generating loads to be transported. The objective is to maximise the number of loads transported, while keeping to capacity and time constraints. Loads arrive while the problem is being solved, which makes it a real-time routing problem. The solver is a self-adaptive evolutionary algorithm that ensures feasible solutions at all times. We investigate under what conditions the exploration of fruitful regions improves the effectiveness of the evolutionary algorithm.



A Study into Ant Colony Optimization, Evolutionary Computation and Constraint Programming on Binary Constraint Satisfaction Problems
inproceedings van Hemert, J.I. and C. Solnon @ 2004/04/01
Evolutionary Computation in Combinatorial Optimization, pages 114-123.
[ ps | pdf ]

We compare two heuristic approaches, evolutionary computation and ant colony optimisation, and a complete tree-search approach, constraint programming, for solving binary constraint satisfaction problems. We experimentally show that, if evolutionary computation is far from being able to compete with the two other approaches, ant colony optimisation nearly always succeeds in finding a solution, so that it can actually compete with constraint programming. The resampling ratio is used to provide insight into heuristic algorithms performances. Regarding efficiency, we show that if constraint programming is the fastest when instances have a low number of variables, ant colony optimisation becomes faster when increasing the number of variables.



Binary Merge Model Representation of the Graph Colouring Problem
inproceedings I. Juhos and A. Tóth and van Hemert, J.I. @ 2004/04/01
Evolutionary Computation in Combinatorial Optimization, pages 124-134.
[ ps | pdf ]

This paper describes a novel representation and ordering model that aided by an evolutionary algorithm, is used in solving the graph k-colouring problem. Its strength lies in reducing the search space by breaking symmetry. An empirical comparison is made with two other algorithms on a standard suit of problem instances and on a suit of instances in the phase transition where it shows promising results.



Evolving binary constraint satisfaction problem instances that are difficult to solve
inproceedings van Hemert, J.I. @ 2003/12/08
Proceedings of the IEEE 2003 Congress on Evolutionary Computation, pages 1267-1273.
[ pdf | ps.gz ]

We present a study on the difficulty of solving binary constraint satisfaction problems where an evolutionary algorithm is used to explore the space of problem instances. By directly altering the structure of problem instances and by evaluating the effort it takes to solve them using a complete algorithm we show that the evolutionary algorithm is able to detect problem instances that are harder to solve than those produced with conventional methods. Results from the search of the evolutionary algorithm confirm conjectures about where the most difficult to solve problem instances can be found with respect to the tightness.



Comparing Evolutionary Algorithms on Binary Constraint Satisfaction Problems
article B.G.W. Craenen and A.E. Eiben and van Hemert, J.I. @ 2003/10/01
IEEE Transactions on Evolutionary Computation, 7(5), 2003, pages 424-444.
[ pdf | url ]

Constraint handling is not straightforward in evolutionary algorithms (EA) since the usual search operators, mutation and recombination, are `blind' to constraints. Nevertheless, the issue is highly relevant, for many challenging problems involve constraints. Over the last decade numerous EAs for solving constraint satisfaction problems (CSP) have been introduced and studied on various problems. The diversity of approaches and the variety of problems used to study the resulting algorithms prevents a fair and accurate comparison of these algorithms. This paper aligns related work by presenting a concise overview and an extensive performance comparison of all these EAs on a systematically generated test suite of random binary CSPs. The random problem instance generator is based on a theoretical model that fixes deficiencies of models and respective generators that have been formerly used in the Evolutionary Computing (EC) field.



A new permutation model for solving the graph k-coloring problem
inproceedings I. Juhos and A. Tóth and M. Tezuka and P. Tann and van Hemert, J.I. @ 2003/10/01
Kalmàr Workshop on Logic and Computer Science, pages 189-199.
[ pdf ]

This paper describes a novel representation and ordering model, that is aided by an evolutionary algorithm, is used in solving the graph k-coloring. A comparison is made between the new representation and an improved version of the traditional graph coloring technique DSATUR on an extensive list of graph k-coloring problem instances with different properties. The results show that our model outperforms the improved DSATUR on most of the problem instances.



Measuring the Searched Space to Guide Efficiency: The Principle and Evidence on Constraint Satisfaction
inproceedings van Hemert, J.I. and T. Bäck @ 2002/09/12
Parallel Problem Solving from Nature, pages 23-32.
[ pdf | ps.gz ]

In this paper we present a new tool to measure the efficiency of evolutionary algorithms by storing the whole searched space of a run, a process whereby we gain insight into the number of distinct points in the state space an algorithm has visited as opposed to the number of function evaluations done within the run. This investigation demonstrates a certain inefficiency of the classical mutation operator with mutation-rate 1/l, where l is the dimension of the state space. Furthermore we present a model for predicting this inefficiency and verify it empirically using the new tool on binary constraint satisfaction problems.



Comparing Classical Methods for Solving Binary Constraint Satisfaction Problems with State of the Art Evolutionary Computation
inproceedings van Hemert, J.I. @ 2002/04/04
Applications of Evolutionary Computing, pages 81-90.
[ pdf | ps.gz ]

Constraint Satisfaction Problems form a class of problems that are generally computationally difficult and have been addressed with many complete and heuristic algorithms. We present two complete algorithms, as well as two evolutionary algorithms, and compare them on randomly generated instances of binary constraint satisfaction prob-lems. We find that the evolutionary algorithms are less effective than the classical techniques.



Use of Evolutionary Algorithms for Telescope Scheduling
inproceedings R. Grim and M.L.M. Jansen and A. Baan and van Hemert, J.I. and H. de Wolf @ 2002/02/05
Integrated Modeling of Telescopes, pages 51-61.
[ pdf | ps.gz ]

LOFAR, a new radio telescope, will be designed to observe with up to 8 independent beams, thus allowing several simultaneous observations. Scheduling of multiple observations parallel in time, each having their own constraints, requires a more intelligent and flexible scheduling function then operated before. In support of the LOFAR radio telescope project, and in co-operation with Leiden University, Fokker Space has started a study to investigate the suitability of the use of evolutionary algorithms applied to complex scheduling problems. After a positive familiarisation phase, we now examine the potential use of evolutionary algorithms via a demonstration project. Results of the familiarisation phase, and the first results of the demonstration project are presented in this paper.



A ``Futurist'' approach to dynamic environments
inproceedings van Hemert, J.I. and Van Hoyweghen, C. and E. Lukschandl and K. Verbeeck @ 2001/07/17
Proceedings of the Workshops at the Genetic and Evolutionary Computation Conference, Dynamic Optimization Problems, pages 35-38.
[ pdf | ps.gz ]

The optimization of dynamic environments has proved to be a difficult area for Evolutionary Algorithms. As standard haploid populations find it difficult to track a moving target, diffKerent schemes have been suggested to improve the situation. We study a novel approach by making use of a meta learner which tries to predict the next state of the environment, i.e. the next value of the goal the individuals have to achieve, by making use of the accumulated knowledge from past performance.



An Engineering Approach to Evolutionary Art
inproceedings van Hemert, J.I. and M.L.M. Jansen @ 2001/07/15
Proceedings of the Genetic and Evolutionary Computation Conference, pages 177.
[ pdf | ps.gz ]

We present a general system that evolves art on the Internet. The system runs on a server which enables it to collect information about its usage world wide; its core uses operators and representations from genetic program-ming. We show two types of art that can be evolved using this general system.



Adaptive Genetic Programming Applied to New and Existing Simple Regression Problems
inproceedings J. Eggermont and van Hemert, J.I. @ 2001/05/01
Genetic Programming, pages 23-35.
[ pdf | ps.gz ]

In this paper we continue our study on adaptive genetic pro-gramming. We use Stepwise Adaptation of Weights to boost performance of a genetic programming algorithm on simple symbolic regression problems. We measure the performance of a standard GP and two variants of SAW extensions on two different symbolic regression prob-lems from literature. Also, we propose a model for randomly generating polynomials which we then use to further test all three GP variants.



Evolutionary Computation in Constraint Satisfaction and Machine Learning - An abstract of my PhD.
inproceedings van Hemert, J.I. @ 2001/05/01
Proceedings of the Brussels Evolutionary Algorithms Day (BEAD-2001).
[ pdf | ps.gz ]


De Creatieve Computer
article van Hemert, J.I. @ 2000/11/03
AIgg Kennisgeving, 13(3), 2000, pages 10-18.
[ pdf | ps.gz ]

Here we show an application that generates images resembling art as it was produced by Mondriaan, a Dutch artist, well known for his minimalistic and pure abstract pieces of art. The current version generates images using a linear chromosome and a recursive function as a decoder.



Constraint Satisfaction Problems and Evolutionary Algorithms: A Reality Check
inproceedings van Hemert, J.I. @ 2000/11/01
Proceedings of the Twelfth Belgium/Netherlands Conference on Artificial Intelligence, pages 267-274.
[ pdf | ps.gz ]

Constraint satisfaction has been the subject of many studies. Different areas of research have tried to solve all kind of constraint problems. Here we will look at a general model for constraint satisfaction problems in the form of binary constraint satisfaction. The problems generated from this model are studied in the research area of constraint programming and in the research area of evolutionary computation. This paper provides an empirical comparison of two techniques from each area. Basically, this is a check on how well both areas are doing. It turns out that, although evolutionary algorithms are doing well, classic approaches are still more successful.



Stepwise Adaptation of Weights for Symbolic Regression with Genetic Programming
inproceedings J. Eggermont and van Hemert, J.I. @ 2000/11/01
Proceedings of the Twelfth Belgium/Netherlands Conference on Artificial Intelligence, pages 259-266.
[ pdf | ps.gz ]

In this paper we continue study on the Stepwise Adaptation of Weights (SAW) technique. Previous studies on constraint satisfaction and data clas-sification have indicated that SAW is a promising technique to boost the performance of evolutionary algorithms. Here we use SAW to boost per-formance of a genetic programming algorithm on simple symbolic regression problems. We measure the performance of a standard GP and two variants of SAW extensions on two different symbolic regression problems.



SAW-ing EAs: adapting the fitness function for solving constrained problems
incollection A.E. Eiben and van Hemert, J.I. @ 1999/12/01
New ideas in optimization, pages 389-402.
[ pdf | ps.gz ]

In this chapter we describe a problem independent method for treating constrain ts in an evolutionary algorithm. Technically, this method amounts to changing the defini tion of the fitness function during a run of an EA, based on feedback from the search pr ocess. Obviously, redefining the fitness function means redefining the problem to be sol ved. On the short term this deceives the algorithm making the fitness values deteriorate , but as experiments clearly indicate, on the long run it is beneficial. We illustrate t he power of the method on different constraint satisfaction problems and point out other application areas of this technique.



Mondriaan Art by Evolution
inproceedings van Hemert, J.I. and A.E. Eiben @ 1999/11/03
Proceedings of the Eleventh Belgium/Netherlands Conference on Artificial Intelligence, pages 291-292.
[ ps.gz ]

Here we show an application that generates images resembling art as it was produced by Mondriaan, a Dutch artist, well known for his minimalistic and pure abstract pieces of art. The current version generates images using a linear chromosome and a recursive function as a decoder.



Comparing genetic programming variants for data classification
inproceedings J. Eggermont and A.E. Eiben and van Hemert, J.I. @ 1999/11/03
Proceedings of the Eleventh Belgium/Netherlands Conference on Artificial Intelligence, pages 253-254.
[ ps.gz ]

This article is a combined summary of two papers written by the authors. Binary data classification problems (with exactly two disjoint classes) form an important application area of machine learning techniques, in particular genetic programming (GP). In this study we compare a number of different variants of GP applied to such problems whereby we investigate the effect of two significant changes in a fixed GP setup in combination with two different evolutionary models



A comparison of genetic programming variants for data classification
inproceedings J. Eggermont and A.E. Eiben and van Hemert, J.I. @ 1999/08/09
Advances in Intelligent Data Analysis, pages 281-290.
[ ps.gz ]

In this paper we report the results of a comparative study on different variations of genetic programming applied on binary data classification problems. The first genetic programming variant is weighting data records for calculating the classification error and modifying the weights during the run. Hereby the algorithm is defining its own fitness function in an on-line fashion giving higher weights to `hard' records. Another novel feature we study is the atomic representation, where `Booleanization' of data is not performed at the root, but at the leafs of the trees and only Boolean functions are used in the trees' body. As a third aspect we look at generational and steady-state models in combination of both features.



Population dynamics and emerging features in AEGIS
inproceedings A.E. Eiben and D. Elia and van Hemert, J.I. @ 1999/07/15
Proceedings of the Genetic and Evolutionary Computation Conference, pages 1257-1264.
[ pdf | ps.gz ]

We describe an empirical investigation within an artificial world, aegis, where a population of animals and plants is evolving. We compare different system setups in search of an `ideal' world that allows a constantly high number of inhabitants for a long period of time. We observe that high responsiveness at individual level (speed of movement) or population level (high fertility) are `ideal'. Furthermore, we investigate the emergence of the so-called mental features of animals determining their social, consumptional and aggressive behaviour. The tests show that being socially oriented is generally advantageous, while agressive behaviour only emerges under specific circumstances.



Adapting the Fitness Function in GP for Data Mining
inproceedings J. Eggermont and A.E. Eiben and van Hemert, J.I. @ 1999/05/26
Genetic Programming, pages 195-204.
[ ps.gz ]

In this paper we describe how the Stepwise Adaptation of Weights (SAW) technique can be applied in genetic programming. The SAW-ing mechanism has been originally developed for and successfully used in EAs for constraint satisfaction problems. Here we identify the very basic underlying ideas behind SAW-ing and point out how it can be used for different types of problems. In particular, SAW-ing is well suited for data mining tasks where the fitness of a candidate solution is composed by `local scores' on data records. We evaluate the power of the SAW-ing mechanism on a number of benchmark classification data sets. The results indicate that extending the GP with the SAW-ing feature increases its performance when different types of misclassifications are not weighted differently, but leads to worse results when they are.



Extended abstract: Solving Binary Constraint Satisfaction Problems using Evolutionary Algorithms with an Adaptive Fitness Function
inproceedings A.E. Eiben and van Hemert, J.I. and E. Marchiori and A.G. Steenbeek @ 1998/11/18
Proceedings of the Xth Netherlands/Belgium Conference on Artificial Intelligence (NAIC'98), pages 299-301.


Solving Binary Constraint Satisfaction Problems using Evolutionary Algorithms with an Adaptive Fitness Function
inproceedings A.E. Eiben and van Hemert, J.I. and E. Marchiori and A.G. Steenbeek @ 1998/09/20
Parallel Problem Solving from Nature, pages 196-205.
[ ps.gz ]

This paper presents a comparative study of Evolutionary Algorithms (EAs) for Constraint Satisfaction Problems (CSPs). We focus on EAs where fitness is based on penalization of constraint violations and the penalties are adapted during the execution. Three different EAs based on this approach are implemented. For highly connected constraint networks, the results provide further empirical support to the theoretical prediction of the phase transition in binary CSPs.



Graph Coloring with Adaptive Evolutionary Algorithms
article A.E. Eiben and van der Hauw, J.K. and van Hemert, J.I. @ 1998/03/01
Journal of Heuristics, 4(1), 1998, pages 25-46.
[ ps.gz ]

This paper presents the results of an experimental investigation on solving graph coloring problems with Evolutionary Algorithms (EA). After testing different algorithm variants we conclude that the best option is an asexual EA using order-based representation and an adaptation mechanism that periodically changes the fitness function during the evolution. This adaptive EA is general, using no domain specific knowledge, except, of course, from the decoder (fitness function). We compare this adaptive EA to a powerful traditional graph coloring technique DSatur and the Grouping GA on a wide range of problem instances with different size, topology and edge density. The results show that the adaptive EA is superior to the Grouping GA and outperforms DSatur on the hardest problem instances. Furthermore, it scales up better with the problem size than the other two algorithms and indicates a linear computational complexity.