Elisângela S.C. Rodrigues, Fabrício A. Rodrigues, Ricardo L.A. da Rocha and Pedro L.P. Corrêa
Environmental issues are attracting attention all over the world, particularly in Brazil, which has one of the richest faunas and floras on Earth. Species geographical distribution modeling is a technique that has been applied to many tasks related to biodiversity conservation. One of its problems is selecting an adequate set of environmental layers. The frequency distribution of each environmental layer can be represented by a histogram, and the cut points of the histogram can be viewed as a model. A classical problem in model selection is overfitting, that is, the excessive adjustment of the model to the observed data. The Minimum Description Length (MDL) principle has the property of avoiding overfitting when learning the parameters of a model, which makes it a promising strategy for selecting any kind of model. The MDL principle searches for the model that gives the shortest description of the observed data, which is done by finding regularities in the data and using them to compress it. The principle has already been applied successfully to probability density estimation by regular histograms. Nevertheless, when the data are non-uniformly distributed, a regular histogram wastes part of the model representation: the high bin count needed to capture the details of high-density regions is also spent on low-density regions. The aim of this work is therefore to present how the MDL principle with irregular histograms can be used to select a good set of environmental layers. This strategy avoids that waste when representing the low-density parts of the data.
Applied computing in space and environmental sciences, scientific computing in multidisciplinary topics, niche-based modeling, Minimum Description Length principle.
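The idea of choosing histogram cut points by description length can be illustrated with a minimal sketch. It is not the criterion used by the authors: it assumes a simple two-part code in which the cut points are picked from a fixed grid of candidate positions, the model cost is the number of bits needed to say which candidates were picked plus a (K - 1)/2 * log2(n) parameter term, and the data cost is the negative log-likelihood under the resulting histogram density. The function names, the grid size `cells` and the cost formula are all assumptions made for illustration.

```python
import math


def bin_cost(count, width, n):
    """Bits to encode `count` points under the ML density count / (n * width)."""
    if count == 0:
        return 0.0
    return -count * math.log2(count / (n * width))


def mdl_irregular_histogram(data, lo, hi, cells=20):
    """Return (cut points, relative code length) of the best irregular histogram.

    Assumes every value lies in [lo, hi]. Cut points are restricted to a grid
    of `cells` equal-width cells (a hypothetical choice). The constant term for
    the data precision is the same for every histogram and is omitted, so the
    returned code length is relative and can be negative.
    """
    n = len(data)
    # Count the points in each cell of the candidate grid.
    counts = [0] * cells
    for x in data:
        idx = min(int((x - lo) / (hi - lo) * cells), cells - 1)
        counts[idx] += 1
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)
    cell_w = (hi - lo) / cells

    def cost(i, j):
        # Data cost of one bin covering grid cells i .. j-1.
        return bin_cost(prefix[j] - prefix[i], (j - i) * cell_w, n)

    inf = float("inf")
    # best[k][j]: minimal data cost of covering cells 0 .. j-1 with k bins.
    best = [[inf] * (cells + 1) for _ in range(cells + 1)]
    back = [[0] * (cells + 1) for _ in range(cells + 1)]
    best[0][0] = 0.0
    for k in range(1, cells + 1):
        for j in range(k, cells + 1):
            for i in range(k - 1, j):
                cand = best[k - 1][i] + cost(i, j)
                if cand < best[k][j]:
                    best[k][j] = cand
                    back[k][j] = i

    # Add the model cost and keep the bin count with the shortest total.
    best_total, best_k = inf, 0
    for k in range(1, cells + 1):
        model_bits = (math.log2(math.comb(cells - 1, k - 1))
                      + 0.5 * (k - 1) * math.log2(n))
        total = best[k][cells] + model_bits
        if total < best_total:
            best_total, best_k = total, k

    # Walk the back-pointers to recover the chosen cut points.
    edges = [cells]
    j = cells
    for k in range(best_k, 0, -1):
        j = back[k][j]
        edges.append(j)
    edges.reverse()
    return [lo + (hi - lo) * e / cells for e in edges], best_total
```

Under these assumptions, evenly spread values are described most cheaply by a single bin, while values with a narrow dense region receive cut points around that region only, so no extra bins are spent on the sparse parts. In a layer-selection setting, `data` would stand for the values of one environmental layer at the observed points, which is again an assumption about how such a sketch might be used.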