Published by
Knobook Pub



Similarity-based workflow clustering

doi: 10.6062/jcis.2011.02.01.0029


Vítor Silva, Fernando Chirigati, Kely Maia, Eduardo Ogasawara, Daniel de Oliveira, Vanessa Braganholo, Leonardo Murta and Marta Mattoso


Scientists have been using scientific workflow management systems (SWfMS) to support scientific experiments. However, a SWfMS can execute only a workflow that has already been modeled in its own workflow language, and scientists receive no assistance or guidance in obtaining that modeled workflow. Experiment lines, a novel approach to dealing with these limitations, allow for the abstract representation and systematic composition of experiments. Since many scientific workflows have already been modeled and successfully executed, they can be used to leverage the construction of new abstract representations. These previous experiments help by revealing clusters of scientific workflows, which are generated according to similarity criteria. This paper proposes SimiFlow, an architecture for similarity-based workflow comparison and clustering that is used to build experiment lines following a bottom-up approach.
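As a rough illustration of what clustering workflows by similarity criteria can look like, the sketch below groups toy workflows with a simple overlap measure. It is not the SimiFlow architecture: the dictionary representation of a workflow, the Jaccard-based `workflow_similarity`, the single-linkage grouping in `cluster_workflows`, the `activity_weight` and `threshold` parameters, and the sample workflows are all assumptions introduced here for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def workflow_similarity(w1, w2, activity_weight=0.5):
    """Weighted mix of activity overlap and dependency (edge) overlap.

    Hypothetical criterion: each workflow is a dict with an "activities"
    set and an "edges" set of (source, target) pairs.
    """
    acts = jaccard(set(w1["activities"]), set(w2["activities"]))
    edges = jaccard(set(w1["edges"]), set(w2["edges"]))
    return activity_weight * acts + (1 - activity_weight) * edges


def cluster_workflows(workflows, threshold=0.5):
    """Single-linkage grouping: any pair at or above the threshold is merged."""
    parent = {name: name for name in workflows}

    def find(x):
        # Follow parent links to the cluster representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(workflows, 2):
        if workflow_similarity(workflows[a], workflows[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in workflows:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Hypothetical sample data, not taken from the paper.
workflows_example = {
    "wf_a": {"activities": {"read", "align", "plot"},
             "edges": {("read", "align"), ("align", "plot")}},
    "wf_b": {"activities": {"read", "align", "export"},
             "edges": {("read", "align"), ("align", "export")}},
    "wf_c": {"activities": {"fetch", "simulate"},
             "edges": {("fetch", "simulate")}},
}

if __name__ == "__main__":
    for cluster in cluster_workflows(workflows_example, threshold=0.4):
        print(sorted(cluster))
```

With a threshold of 0.4, the two workflows that share the read and align steps fall into one cluster, and the unrelated workflow stays alone. In a bottom-up setting, each resulting cluster would be a candidate starting point for one abstract representation.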


Scientific workflow, clustering, similarity.



