Jano's Homepage

Jano van Hemert


 
Personal


You have stumbled upon my hyperhome, welcome! As I am a scientist, this page focuses on my work. I currently lead a research group at the National e-Science Centre in the School of Informatics at the University of Edinburgh. If you are not into science, you might enjoy some of my photographs, or you could take a look at some of my current or past projects.


Latest news

Rapid chemistry portals through engaging researchers
inproceedings Koetsier, J. and Turner, A. and Richardson, P. and van Hemert, J.I. @ 2009/12/08
Fifth IEEE International Conference on e-Science, pages 284-291.

In this study, we apply a methodology for rapid development of portlets for scientific computing to the domain of computational chemistry. We report results in terms of the portals delivered, the changes made to our methodology and the experience gained in terms of interaction with domain specialists. Our major contributions are: several web portals for teaching and research in computational chemistry; a successful transition to having our development tool used by the domain specialists as opposed to by us, the developers; and an updated version of our methodology and technology for rapid development of portlets for computational science, which is free for anyone to pick up and use.



Using the DCC Lifecycle Model to Curate a Gene Expression Database: A Case Study
article J. O'Donoghue and van Hemert, J.I. @ 2009/12/01
International Journal of Digital Curation, 4(3), 2009.

Developmental Gene Expression Map (DGEMap) is an EU-funded Design Study, which will accelerate an integrated European approach to gene expression in early human development. As part of this design study, we have had to address the challenges and issues raised by the long-term curation of such a resource. As this project is primarily one of data creators learning about curation, we have been looking at some of the models and tools that are already available in the digital curation field in order to inform our thinking on how we should proceed with curating DGEMap. This has led us to uncover a wide range of resources for data creators and curators alike. Here we will discuss the future curation of DGEMap as a case study. We believe our experience could be instructive to other projects looking to improve the curation and management of their data.



A model of social collaboration in Molecular Biology knowledge bases
inproceedings De Ferrari, L. and Aitken, S. and van Hemert, J.I. and Goryanin, I. @ 2009/10/01
Proceedings of the 6th Conference of the European Social Simulation Association (ESSA'09), in press.

Manual annotation of biological data cannot keep up with data production. Open annotation models using wikis have been proposed to address this problem. In this empirical study we analyse 36 years of knowledge collection by 738 authors in two Molecular Biology wikis (EcoliWiki and WikiPathways) and two knowledge bases (OMIM and Reactome). We first investigate authorship metrics (authors per entry and edits per author) which are power-law distributed in Wikipedia and we find they are heavy-tailed in these four systems too. We also find surprising similarities between the open (editing open to everyone) and the closed systems (expert curators only). Secondly, to discriminate between driving forces in the measured distributions, we simulate the curation process and find that knowledge overlap among authors can drive the number of authors per entry, while the time the users spend on the knowledge base can drive the number of contributions per author.



Giving Computational Science a Friendly Face
article van Hemert, J.I. and Koetsier, J. @ 2009/10/01
Zero-In, 1(3), 2009, pages 12-13.

Anyone who purchases a flight using a web browser expects to be guided through this task: from choosing the possible routes, to finding suitable dates and times, and to paying with a credit card. Today, most researchers from any discipline will successfully use these web-based e-commerce systems to book flights to attend their conferences. When these researchers are then confronted with solving compute-intensive problems, they cannot expect such elaborate web-based systems to enable their domain-specific tasks. Instead, they will have to deal with archaic command-line tools, and in the best case they may have access to generic portals that mimic the technical complexity of the underlying infrastructure. These interfaces are expensive to use, as they require much investment from the researchers in terms of training. Moreover, the laborious and intricate processes involved often lead to errors.



Rapid development of computational science portals
inproceedings Koetsier, J. and van Hemert, J.I. @ 2009/09/09, Edinburgh
Proceedings of the IWPLS09 International Workshop on Portals for Life Sciences.

Motivation: Scientific web portals are seen as the way forward to improve upon the slow uptake in use of utility computing infrastructure and high-performance computing facilities. Currently, two types of portals exist: general-purpose portals and domain-specific portals. The first type closely resembles the underlying technical infrastructure of compute-job submission systems, thereby providing little appeal to a wide range of domain specialists. The second type is tailored to the application specifications and their end-users' requirements. Unfortunately, the technical complexity in domain-specific portals makes these expensive and time-consuming to develop and maintain. Clearly, an alternative to these two approaches is required. Results: We introduce an approach, Rapid, that facilitates rapid development of portlets. Its main aim is to reduce the time from development to deployment from several months to a few weeks. Moreover, it makes it easy for domain specialists themselves to share and maintain these portlets. Both these advantages considerably reduce the cost of developing portal solutions for computational science applications. We highlight several scientific domains where our approach is used or was used successfully. Availability: Rapid is developed under an Open Source model and is freely available under a GNU General Public License. Main releases, documentation, tutorials and examples are available at http://research.nesc.ac.uk/rapid/. The development of Rapid uses an open read-only CVS repository, which is complemented by a developer community site at http://forge.nesc.ac.uk/projects/jos/.



Portals for Life Sciences - a Brief Introduction
inproceedings Gesing, S. and Kohlbacher, O. and van Hemert, J.I. @ 2009/09/09
Proceedings of the IWPLS09 International Workshop on Portals for Life Sciences.

The topic "Portals for Life Sciences" spans various research fields: on the one hand, many different topics from the life sciences, e.g. mass spectrometry; on the other hand, portal technologies and different aspects of computer science, such as usability of user interfaces and security of systems. The main purpose of portals is to simplify the user's interaction with computational resources which are concerted to a supported application domain.



Proceedings of the IWPLS09 International Workshop on Portals for Life Sciences
proceedings Gesing, S. and van Hemert, J.I. @ 2009/09/09, Edinburgh, UK
Proceedings of the International Workshop on Portals for Life Sciences.

IWPLS'09 focuses on research contributions for portals and tools in the field of life sciences. It brings together scientists from the fields of life science, bioinformatics and computer science. Its aim is to become the international platform to exchange experience, formulate ideas, and catch up on technological advances in molecular and systems biology in the context of portals. All papers published in these proceedings were accepted through a peer-reviewing process. Each paper had a 30-minute presentation and each accepted abstract had a "lightning talk" of 10 minutes. We would like to thank the authors for their contributions and our Program Committee for the effort put into reviewing. Nine papers were selected out of the excellent submissions. We owe much gratitude to the local organisers, for without their hard work the workshop would not have been such a success. We acknowledge both the e-Science Institute in Edinburgh and the Scottish Bioinformatics Forum for their financial contributions.



Using architectural simulation models to aid the design of data intensive application
inproceedings Fernández, J. and Han, L. and Nuñez, A. and Carretero, J. and van Hemert, J.I. @ 2009/09/01
Advanced Engineering Computing and Applications in Sciences, in press.

Performance is an open issue in data intensive applications. Finding the best implementation and the influential performance factors of hardware and software platforms for data intensive applications requires trial and error. However, it is very difficult and costly to perform these trials in a real large-scale environment. In this paper we use a generic simulation framework, SIMCAN, to simulate hardware and software platforms of data intensive applications in order to investigate the influential performance factors and thereby make decisions on the design of data intensive application architectures. We have employed a typical use case of a data mining application, in which the architecture has been proposed using a pipeline model. We have simulated various scenarios to investigate factors that affect the system performance to assist the architecture design, and the simulation results provide useful information for this decision-making.



A distributed architecture for data mining and integration
inproceedings Atkinson, M.P. and van Hemert, J.I. and Han, L. and Hume, A. and Liew, C.S. @ 2009/06/07, New York, NY, USA
DADC '09: Proceedings of the second international workshop on Data-aware distributed computing, pages 11-20.

This paper presents the rationale for a new architecture to support a significant increase in the scale of data integration and data mining. It proposes the composition into one framework of (1) data mining and (2) data access and integration. We name the combined activity DMI. It supports enactment of DMI processes across heterogeneous and distributed data resources and data mining services. It posits that a useful division can be made between the facilities established to support the definition of DMI processes and the computational infrastructure provided to enact DMI processes. Communication between those two divisions is restricted to requests submitted to gateway services in a canonical DMI language. Larger-scale processes are enabled by incremental refinement of DMI-process definitions, often by recomposition of lower-level definitions. Autonomous evolution of data resources and services is supported by types and descriptions, which will support detection of inconsistencies and semi-automatic insertion of adaptations. These architectural ideas are being evaluated in a feasibility study that involves an application scenario and representatives of the community.



An E-infrastructure to Support Collaborative Embryo Research
inproceedings A. Barker and van Hemert, J.I. and R.A. Baldock and M.P. Atkinson @ 2009/05/22
Cluster Computing and the Grid, pages 520-525.

Within the context of the EU Design Study Developmental Gene Expression Map, we identify a set of challenges when facilitating collaborative research on early human embryo development. These challenges bring forth requirements, for which we have identified solutions and technology. We summarise our solutions and demonstrate how they integrate to form an e-infrastructure to support collaborative research in this area of developmental biology.



The Circulate Architecture: Avoiding Workflow Bottlenecks Caused By Centralised Orchestration
article Barker, A. and Weissman, J. and van Hemert, J.I. @ 2009/03/01
Cluster Computing, 12(2), 2009, pages 221-235.

As the number of services and the size of data involved in workflows increase, centralised orchestration techniques are reaching the limits of scalability. In the classic orchestration model, all data passes through a centralised engine, which results in unnecessary data transfer, wasted bandwidth and the engine becoming a bottleneck to the execution of a workflow. This paper presents and evaluates the Circulate architecture, which maintains the robustness and simplicity of centralised orchestration but facilitates choreography by allowing services to exchange data directly with one another. Circulate could be realised within any existing workflow framework; in this paper, we focus on WS-Circulate, a Web services based implementation. Taking inspiration from the Montage workflow, a number of common workflow patterns (sequence, fan-in and fan-out), input to output data size relationships and network configurations are identified and evaluated. The performance analysis concludes that a substantial reduction in communication overhead results in a 2-4 fold performance benefit across all patterns. An end-to-end pattern through the Montage workflow results in an 8 fold performance benefit and demonstrates how the advantage of using the Circulate architecture increases as the complexity of a workflow grows.



Towards a Virtual Fly Brain
article Armstrong, J.D. and van Hemert, J.I. @ 2009/03/01
Philosophical Transactions A, 367(1896), 2009, pages 2387-2397.

Models of the brain that simulate sensory input, behavioural output and information processing in a biologically plausible manner pose significant challenges to both Computer Science and Biology. Here we investigated strategies that could be used to create a model of the insect brain, specifically that of Drosophila melanogaster which is very widely used in laboratory research. The scale of the problem is an order of magnitude above the most complex of the current simulation projects and it is further constrained by the relative sparsity of available electrophysiological recordings from the fly nervous system. However, fly brain research at the anatomical and behavioural level offers some interesting opportunities that could be exploited to create a functional simulation. We propose to exploit these strengths of Drosophila CNS research to focus on a functional model that maps biologically plausible network architecture onto phenotypic data from neuronal inhibition and stimulation studies, leaving aside biophysical modelling of individual neuronal activity for future models until more data is available.