Jano's Homepage

Jano van Hemert


I lead the imaging group and am the academic liaison at Optos. I am also an honorary fellow of the University of Edinburgh associated with the Data-Intensive Research Group in the School of Informatics. In 2011, I became a member of the Young Academy of Scotland of the Royal Society of Edinburgh. Below you can catch up with my latest mischief. On LinkedIn I keep track of my professional activities. I have added a database for browsing my academic papers.

Latest news

Source for extracting features from TSP instances
file J.I. van Hemert and Kate Smith-Miles and Lin Xu and Kevin Leyton-Brown and Frank Hutter and Holger Hoos @ 2012/02/12

Contains a collection of source files, authored by different people, that together allow the extraction of features from travelling salesman problem instances.

Software Licenses
info J.I. van Hemert @ 2012/02/12, Edinburgh, UK

Software published before 2012 on this website is licensed under the GNU General Public License. Please use the above link to read the full text of this license. Since 2012 I have moved on to the Academic Free License (

Managing dynamic enterprise and urgent workloads on clouds using layered queuing and historical performance models
article David A. Bacigalupo and Jano I. van Hemert and Xiaoyu Chen and Asif Usmani and Adam P. Chester and Ligang He and Donna N. Dillenberger and Gary B. Wills and Lester Gilbert and Stephen A. Jarvis @ 2011/09/17
Simulation Modelling Practice and Theory, 19(6), 2011, pages 1479-1495.

The automatic allocation of enterprise workload to resources can be enhanced by being able to make what-if response time predictions whilst different allocations are being considered. We experimentally investigate an historical and a layered queuing performance model and show how they can provide a good level of support for a dynamic-urgent cloud environment. Using this we define, implement and experimentally investigate the effectiveness of a prediction-based cloud workload and resource management algorithm. Based on these experimental analyses we: (i) comparatively evaluate the layered queuing and historical techniques; (ii) evaluate the effectiveness of the management algorithm in different operating scenarios; and (iii) provide guidance on using prediction-based workload and resource management.

Performance database: capturing data for optimizing distributed streaming workflows
article Liew, C.S. and Atkinson, M.P. and Ostrowski, R. and Cole, M. and van Hemert, J.I. and Han, L. @ 2011/07/19
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 369(1949), 2011, pages 3268-3284.

The performance database (PDB) stores performance-related data gathered during workflow enactment. We argue that, by carefully understanding and manipulating these data, we can improve efficiency when enacting workflows. This paper describes the rationale behind the PDB, and proposes a systematic way to implement it. The prototype is built as part of the Advanced Data Mining and Integration Research for Europe project. We use workflows from real-world experiments to demonstrate the usage of PDB.

Validation and mismatch repair of workflows through typed data streams
article Yaikhom, G. and Atkinson, M.P. and van Hemert, J.I. and Corcho, O. and Krause, A. @ 2011/07/19
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 369(1949), 2011, pages 3285-3299.

The type system of a language guarantees that all of the operations on a set of data comply with the rules and conditions set by the language. While language typing is a fundamental requirement for any programming language, the typing of data that flow between processing elements within a workflow is currently treated as optional. In this paper, we introduce a three-level type system for typing workflow data streams. These types are part of the Data Intensive System Process Engineering Language, which lets users validate the connections inside a workflow composition and apply appropriate data type conversions when necessary. Furthermore, this system enables the enactment engine to carry out type-directed workflow optimizations.
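
The idea of validating connections and repairing mismatches through conversions can be illustrated with a minimal sketch. This is not the paper's three-level type system or the language's actual syntax; the `Port` class, the type labels and the `CONVERTERS` registry are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """A connection point on a processing element (hypothetical model)."""
    name: str
    stream_type: str  # hypothetical type label, e.g. "Integer"


# Hypothetical registry of known conversions: (from_type, to_type) -> function
CONVERTERS = {
    ("Integer", "Real"): float,
    ("Real", "String"): str,
}


def check_connection(out_port, in_port):
    """Classify a proposed connection as 'ok', 'convert' or 'mismatch'."""
    if out_port.stream_type == in_port.stream_type:
        return "ok"
    if (out_port.stream_type, in_port.stream_type) in CONVERTERS:
        return "convert"
    return "mismatch"


def connect(out_port, in_port, stream):
    """Return the stream as seen by in_port, converting elements if needed."""
    status = check_connection(out_port, in_port)
    if status == "mismatch":
        raise TypeError(
            f"cannot connect {out_port.name} ({out_port.stream_type}) "
            f"to {in_port.name} ({in_port.stream_type})"
        )
    if status == "convert":
        convert = CONVERTERS[(out_port.stream_type, in_port.stream_type)]
        return (convert(item) for item in stream)
    return iter(stream)


counts = Port("counts", "Integer")
weights = Port("weights", "Real")
converted = list(connect(counts, weights, [1, 2, 3]))
```

Here the mismatch between an `Integer` output and a `Real` input is repaired by inserting a conversion, while a connection with no registered conversion is rejected before any data flow.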

A user-friendly web portal for T-Coffee on supercomputers
article J. Rius and F. Cores and F. Solsona and van Hemert, J.I. and J. Koetsier and C. Notredame @ 2011/07/01
BMC Bioinformatics, 12(150), 2011.

Background: Parallel T-Coffee (PTC) was the first parallel implementation of the T-Coffee multiple sequence alignment tool. It is based on MPI and RMA mechanisms. Its purpose is to reduce the execution time of large-scale sequence alignments. It can be run on distributed memory clusters, allowing users to align data sets consisting of hundreds of proteins within a reasonable time. However, most of the potential users of this tool are not familiar with the use of grids or supercomputers. Results: In this paper we show how PTC can be easily deployed and controlled on a supercomputer architecture using a web portal developed using Rapid. Rapid is a tool for efficiently generating standardized portlets for a wide range of applications, and the approach described here is generic enough to be applied to other applications, or to deploy PTC on different HPC environments. Conclusions: The PTC portal allows users to upload a large number of sequences to be aligned by the parallel version of TC that cannot be aligned by a single machine due to memory and execution time constraints. The web portal provides a user-friendly solution.

Automatically Identifying and Annotating Mouse Embryo Gene Expression Patterns
article Han, L. and van Hemert, J.I. and Baldock, R.A. @ 2011/06/01
Bioinformatics, 27(8), 2011, pages 1101-1107.

Motivation: Deciphering the regulatory and developmental mechanisms for multicellular organisms requires detailed knowledge of gene interactions and gene expressions. The availability of large datasets with both spatial and ontological annotation of the spatio-temporal patterns of gene-expression in mouse embryo provides a powerful resource to discover the biological function of embryo organisation. Ontological annotation of gene expressions consists of labelling images with terms from the anatomy ontology for mouse development. If the spatial genes of an anatomical component are expressed in an image, the image is then tagged with a term of that anatomical component. The current annotation is done manually by domain experts, which is both time consuming and costly. In addition, the level of detail is variable and, inevitably, errors arise from the tedious nature of the task. In this paper, we present a new method to automatically identify and annotate gene expression patterns in the mouse embryo with anatomical terms. Results: The method takes images from in situ hybridisation studies and the ontology for the developing mouse embryo; it then combines machine learning and image processing techniques to produce classifiers that automatically identify and annotate gene expression patterns in these images. We evaluate our method on image data from the EURExpress-II study where we use it to automatically classify nine anatomical terms: humerus, handplate, fibula, tibia, femur, ribs, petrous part, scapula and head mesenchyme. The accuracy of our method lies between 70-80% with few exceptions. We show that other known methods have lower classification performance than ours. We have investigated the images misclassified by our method and found several cases where the original annotation was not correct. This shows our method is robust against this kind of noise. Availability: The annotation result and the experimental dataset in the paper can be freely accessed at http://www2.docm.mmu.ac.uk/STAFF/L.Han/geneannotation/. Contact: l.han@mmu.ac.uk, j.vanhemert@ed.ac.uk and Richard.Baldock@hgu.mrc.ac.uk

Discovering the suitability of optimisation algorithms by learning from evolved instances
article K. Smith-Miles and van Hemert, J.I. @ 2011/05/03
Annals of Mathematics and Artificial Intelligence, Online First, 2011.

Generating web-based user interfaces for computational science
article van Hemert, J.I. and Koetsier, J. and Torterolo, L. and Porro, I. and Melato, M. and Barbera, R. @ 2011/05/01
Concurrency and Computation: Practice and Experience, 23, 2011, pages 256-268.

Scientific gateways in the form of web portals are becoming a popular approach to share knowledge and resources around a topic in a community of researchers. Unfortunately, the development of web portals is expensive and requires specialist skills. Commercial and more generic web portals have a much larger user base and can afford this kind of development. Here we present two solutions that address this problem in the area of portals for scientific computing; both take the same approach. The whole process of designing, delivering and maintaining a portal can be made more cost-effective by generating a portal from a description rather than programming in the traditional sense. We present four successful use cases to show how this process works and the results it can deliver.

A Generic Parallel Processing Model for Facilitating Data Mining and Integration
article L. Han and C.S. Liew and M.P. Atkinson and van Hemert, J.I. @ 2011/04/19
Parallel Computing, 2011, in press, available online.

To facilitate Data Mining and Integration (DMI) processes in a generic way, we investigate a parallel pipeline streaming model. We model a DMI task as a streaming data-flow graph: a directed acyclic graph (DAG) of Processing Elements (PEs). The composition mechanism links PEs via data streams, which may be in memory, buffered via disks or inter-computer data-flows. This makes it possible to build arbitrary DAGs with pipelining and both data and task parallelism, which provides room for performance enhancement. We have applied this approach to a real DMI case in the Life Sciences and implemented a prototype. To demonstrate feasibility of the modelled DMI task and assess the efficiency of the prototype, we have also built a performance evaluation model. The experimental evaluation results show that a linear speedup has been achieved with the increase of the number of distributed computing nodes in this case study.
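
As a rough illustration of linking processing elements via data streams, the sketch below chains PEs using Python generators, so each element is passed downstream as soon as it is produced. It is not the paper's prototype: it covers only a linear in-memory chain rather than an arbitrary DAG, runs in a single process, and all names (`source`, `map_pe`, `filter_pe`, `compose`) are hypothetical.

```python
def source(items):
    """Emit the items of a collection as a stream."""
    for item in items:
        yield item


def map_pe(fn):
    """Build a PE that applies fn to each element as it arrives."""
    def pe(stream):
        for item in stream:
            yield fn(item)
    return pe


def filter_pe(pred):
    """Build a PE that forwards only the elements satisfying pred."""
    def pe(stream):
        for item in stream:
            if pred(item):
                yield item
    return pe


def compose(*pes):
    """Link PEs in sequence; each consumes the previous PE's output stream."""
    def pipeline(stream):
        for pe in pes:
            stream = pe(stream)
        return stream
    return pipeline


# Square each number, then keep only the even squares.
pipeline = compose(
    map_pe(lambda x: x * x),
    filter_pe(lambda x: x % 2 == 0),
)
result = list(pipeline(source(range(6))))
```

Because generators are lazy, no PE materialises its whole input before the next one starts, which mirrors the pipelining aspect of the model; data and task parallelism across computing nodes would need a real distributed runtime.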

Special Issue: Portals for life sciences-Providing intuitive access to bioinformatic tools
article Gesing, S. and van Hemert, J.I. and Kacsuk, P. and Kohlbacher, O. @ 2011/02/18
Concurrency and Computation: Practice and Experience, 23, 2011, pages 223-234.

The topic `Portals for life sciences' includes various research fields: on the one hand, many different topics from the life sciences, e.g. mass spectrometry; on the other hand, portal technologies and different aspects of computer science, such as usability of user interfaces and security of systems. The main purpose of portals is to simplify the user's interaction with computational resources that are concerted to a supported application domain.

Resource management of enterprise cloud systems using layered queuing and historical performance models
inproceedings Bacigalupo, D.A. and van Hemert, J. and Usmani, A. and Dillenberger, D.N. and Wills, G.B. and Jarvis, S.A. @ 2010/09/23
IEEE International Symposium on Parallel and Distributed Processing, pages 1-8.

The automatic allocation of enterprise workload to resources can be enhanced by being able to make `what-if' response time predictions, whilst different allocations are being considered. It is important to quantitatively compare the effectiveness of different prediction techniques for use in cloud infrastructures. To help make the comparison of relevance to a wide range of possible cloud environments it is useful to consider the following: (1) urgent cloud customers, such as the emergency services, that can demand cloud resources at short notice (e.g. for our FireGrid emergency response software); (2) dynamic enterprise systems that must rapidly adapt to frequent changes in workload, system configuration and/or available cloud servers; (3) the use of the predictions in a coordinated manner by both the cloud infrastructure and cloud customer management systems; and (4) a broad range of criteria for evaluating each technique. However, there have been no previous comparisons meeting these requirements. This paper, meeting the above requirements, quantitatively compares the layered queuing and (`HYDRA') historical techniques, including our initial thoughts on how they could be combined. Supporting results and experiments include the following: (i) defining, investigating and hence providing guidelines on the use of a historical and layered queuing model; (ii) using these guidelines, showing that both techniques can make low overhead and typically over 70% accurate predictions for new server architectures for which only a small number of benchmarks have been run; and (iii) defining and investigating tuning a prediction-based cloud workload and resource management algorithm.