Introduction

Welcome to the VAT Documentation

The VAT documentation is the first point of contact for VAT-related questions. Therefore, this documentation first covers the basics of the system in the Background section and then the functionality with examples in the User Guide section. This documentation also provides relevant links to find the resources to answer your questions.

vat-introduction

If you are curious or want to try the VAT yourself, you can follow this link: https://vat.gfbio.org/

The VAT System

Service        Status
vat.gfbio.org  Productive

Overview

The VAT system allows users to visualize geospatial data on a map in their browser and work with it interactively. Data and processing are provided by the Geo Engine backend service running in the Semantic Layer. The VAT system lists all data products available at the Geo Engine backend service as layers, which can be selected for visualization on the map. Layers can be combined and transformed interactively by constructing arbitrarily complex workflows, which themselves are visualized as new layers on the map or can be plotted, e.g., as a bar chart, right next to the map. This facilitates an interactive approach to constructing new data products and analyzing them.

VAT Screenshot

RDC Integration

The VAT system is served as a fully containerized application from the de.NBI cloud and is connected to the Geo Engine Backend service running in the Semantic Layer.

RDC Integration

Getting started

To become familiar with the VAT system, take a look at the publicly accessible instance, which has several datasets related to biodiversity research available.

You can run through the following example to get a first impression on what is possible with the VAT system. In the example, you will take elephant occurrence datasets of two distinct species and combine them with a vegetation index dataset to visualize the difference in their habitats.

  • Go to vat.gfbio.org. Click on Add Data (+) -> Layers -> Elephant example. There you can find three layers: Loxodonta africana, Loxodonta cyclotis and MOD13C2 NDVI. The first two are point datasets of occurrences of two elephant species. NDVI is a vegetation index raster dataset.

  • Add all three layers to the map by clicking on them once. (Optional: Remove the Loxodonta cyclotis occurrences outside of Africa by first clicking on Add Data -> Draw Features, set type to "Polygon" and draw a polygon around Africa by clicking on the map. Then, select Operators -> Point In Polygon and select the Loxodonta cyclotis point layer and the drawn polygon. Apply the operator.)

  • Click on Operators -> Raster Vector Join to configure a raster vector join operator. The raster vector join operator attaches raster values to points.

  • Select as point input one of the two elephant occurrence datasets and as raster input the NDVI dataset. Give the result a descriptive name, e.g., "Loxodonta africana with NDVI". Click on "Create" to add the new layer to the map. Repeat for the second elephant occurrence dataset.

  • Click on Operators -> Histogram. Set as input one of the two new layers created by the raster vector join. Select the "MOD13C2 NDVI" attribute. Click "Create". Repeat for the other layer.

  • Now compare the two histograms you created. You should clearly see that the forest elephant occurs more often in more densely vegetated areas than the bush elephant (as expected). You can also move around/zoom in/out on the map to compare the two histograms for different regions.
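The same analysis can also be scripted with the Geo Engine Python Library described later in this documentation. The following is a minimal sketch, assuming the geoengine package is installed; the dataset names used for the elephant occurrences and the NDVI raster are hypothetical placeholders, and the operator JSON mirrors the RasterVectorJoin and Histogram examples at the end of this documentation.

import geoengine as ge

ge.initialize("https://vat.gfbio.org/api")

# Hypothetical dataset names -- look up the actual names in the data catalog.
occurrence_dataset = "loxodonta_africana_occurrences"
ndvi_dataset = "MOD13C2_NDVI"

# Attach NDVI raster values to the occurrence points, then plot a histogram of
# the new "NDVI" attribute (Raster Vector Join + Histogram, as in the UI steps above).
histogram_workflow = ge.register_workflow({
    "type": "Plot",
    "operator": {
        "type": "Histogram",
        "params": {
            "attributeName": "NDVI",
            "bounds": "data",
            "buckets": {"type": "number", "value": 20},
        },
        "sources": {
            "source": {
                "type": "RasterVectorJoin",
                "params": {
                    "names": {"type": "names", "values": ["NDVI"]},
                    "temporalAggregation": "none",
                    "featureAggregation": "mean",
                },
                "sources": {
                    "vector": {"type": "OgrSource", "params": {"data": occurrence_dataset}},
                    "rasters": [{"type": "GdalSource", "params": {"data": ndvi_dataset}}],
                },
            }
        },
    },
})

print(histogram_workflow)  # the resulting workflow id can also be used in VAT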

User Guide

The VAT system aims at being as intuitive as possible. Whenever a deeper understanding is required though, e.g., about the specific settings of operators, links to the documentation are provided where those are explained in depth.

References

  • Authmann, C., Beilschmidt, C., Drönner, J., Mattig, M., & Seeger, B. (2015). VAT: a system for visualizing, analyzing and transforming spatial data in science. Datenbank-Spektrum, 15, 175-184.

  • Beilschmidt, C., Drönner, J., Mattig, M., & Seeger, B. (2023). Geo Engine: Workflow-driven Geospatial Portals for Data Science. Datenbank-Spektrum, 1-9.

The Geo Engine

Geo Engine

What is the Geo Engine?

Geodata, i.e. data relating to location and time, is omnipresent. The amount of data is constantly increasing. Geodata portals play a key role in the dissemination and utilization of geodata. They typically run in the cloud and users only need a browser to be able to use them. Although portals are sometimes highly specialized, there are requirements for the underlying software that are common to all portals. Data access, data processing and visualization must always be implemented. The Geo Engine provides all the components required to build geodata portals. It consists of a back end for processing and a front end with components that can be freely combined in portals.

The Geo Engine is also a geographic information system (GIS) that makes it possible to process data. Experts can use it to create workflows that generate a result from source data and processing steps. One example is linking animal observations to a temperature layer and filtering by average temperature to find animals that can cope well with the cold. Once an interesting workflow has been found, a portal can be created that can be used intuitively without prior knowledge. The Geo Engine portals go far beyond static maps. They enable interactive analyses so that the data can be freely explored. Users can also contribute their own data and merge it with the portal data after uploading it. For example, a user can upload GPS positions of a route to the portal and visualize the development of portal data along this route.

What are its components?

The Geo Engine consists of a backend, which usually runs on a server and provides data and functions for various frontends. The two front ends that belong to the Geo Engine are the web UI and the Python library. In addition, external tools can also communicate directly with the backend via standard interfaces.

The web UI enables the Geo Engine to be used in the browser. The elements of the Web UI can be combined to create various applications. The Geo Engine GIS offers the greatest flexibility, but requires a training period and specialist knowledge due to the wide range of functions. Dashboards, on the other hand, are aimed at a broader user group. They are specialized portals that focus strongly on one application and are easier to use due to predefined analyses. The Geo Engine comes with ready-made dashboards and allows you to build new dashboards from existing components.

The Python library is aimed at users with programming skills who want to process data outside of the Geo Engine. For example, it is possible to create more complex diagrams or use machine learning. In addition, the Geo Engine can also be administered via Python by activating further functionalities for data and user administration via an admin token.

The Geo Engine is based on standard software. The backend uses GDAL, PROJ and Apache Arrow, among others. The front end is based on Angular and OpenLayers. Docker containers are available for the installation and operation of the Geo Engine. There is one container image each for the backend and frontend. Together with external components such as a PostgreSQL database, these can be bundled in a pod and provided as a separate instance.

Can everyone install the Geo Engine?

The Geo Engine can be used with a very low barrier to entry, as there are publicly accessible instances that run in the cloud and do not require installation. Examples include the GFBio VAT system at https://vat.gfbio.org and the EBV Analyzer at https://portal.geobon.org/map. In addition to these portals, which are based on the Geo Engine and offer different sets of functions, there will also be a demo of the Geo Engine GIS in the future, which will be available at https://www.geoengine.io.

The Geo Engine can also be installed on your own systems and hosted yourself. It is then provided via Docker and requires certain IT expertise. Geo Engine GmbH also offers hosting and support on request.

The Geo Engine is made available under an open-core license: the core is open source and freely usable, while certain additional functions are paid. All essential functions are available free of charge.

How does the Geo Engine differ from similar products such as MapServer, GeoServer or GeoNode?

In the world of geodata processing, there is a huge amount of software with very different focuses. MapServer and GeoServer are server software that provide geodata via web services for maps. GeoNode is a data management platform that is based on GeoServer, among other things. It enables users to create, share and publish interactive maps. The Geo Engine goes far beyond this functionality and makes it possible to create analyses in the platform itself using an operator toolbox and workflow engine. Based on these workflows, specialized dashboards and portals can then be created that are easy for users to operate.

How is the Geo Engine used in NFDI4Biodiversity?

NFDI4Biodiversity contains a great deal of geodata, i.e. data that has a spatial and temporal reference. One example is the locations of collections in a herbarium, which can have a time of discovery and GPS coordinates. It is important for the scientific community to be able to find and use this data as easily as possible. The Geo Engine can be seen as a toolbox for creating geo-applications within the framework of NFDI4Biodiversity.

In detail, there are two points of contact in NFDI4Biodiversity: the GFBio portal and user portals. GFBio is a sub-project that brings together data from German collections and data centers in the field of biodiversity and offers a point of contact for researchers. The Geo Engine can be accessed via the GFBio search, from which selected data can be visualized in a web GIS (Geographical Information System) in the browser. In addition, the Geo Engine can be used to perform GIS operations directly on the data without requiring expert knowledge or installing software. One example is linking environmental data, e.g. temperature models, which the Geo Engine offers in addition to the GFBio data, with plant locations. The otherwise complicated work of linking two time series of different geodata is handled automatically by the Geo Engine. The results can in turn be visualized as maps, tables or plots, or downloaded for further use.

In addition to the GFBio portal, there is also a proof-of-concept in which data portals based on the Geo Engine and some special data sets from NFDI4Biodiversity were created for specific specialist communities. Here, dashboards were built on the basis of the Geo Engine that are precisely tailored to the needs of individual user groups. These then offer selected functions with intuitive, coordinated usability.

Where else is the Geo Engine used?

The Geo Engine is used in very different scenarios. In the area of data portals, it implements the connection, visualization and analysis of geodata. Specifically, it is the technological basis of the Terranova portal, which is building a digital atlas of Europe. In the GEO BON EBV Data Portal, it enables the exploration of and access to Essential Biodiversity Variables, which provide indicators for the development of global biodiversity.

In research, the Geo Engine is used to connect complex data sets, implement special algorithms and implement analysis workflows. It is used in the RESPECT project, which is investigating environmental changes in tropical mountain forests in southern Ecuador. In CropHype, it provides the basis for improving the classification of agricultural fields using new types of satellite data.

One use case from industry is the enrichment of proprietary data with publicly available data that is difficult to obtain and process. A concrete example is the calculation of vegetation indicators, a measure of how densely overgrown an area is. Here, the Geo Engine procures the necessary satellite data, calculates the vegetation and links it to the company data. The results are made available via standard interfaces so that they can be integrated into company processes.

More information

FAQ

Here, we answer frequently asked questions about the VAT System.

The VAT System is built on top of the Geo Engine. The Geo Engine is a powerful geospatial processing engine that provides a wide range of geospatial processing capabilities. The VAT System is a user-friendly interface that uses the Geo Engine, designed to make it easy for users to access and use geo data from NFDI4Biodiversity.

Get in touch

VAT is developed by the Database Research Group of the University of Marburg (head: Prof. Bernhard Seeger). The design of VAT was a joint collaboration with the Senckenberg Biodiversity and Climate Research Centre (BiK-F) (head: Prof. Thomas Hickler).

VAT is hosted and operated by GFBio - Gesellschaft für Biologische Daten e.V. (Imprint).

VAT is built upon the Geo Engine, a cloud-ready geo-spatial data processing platform. Learn more about Geo Engine on GitHub or visit the Geo Engine website.

Contact

If you have any questions or feedback, please feel free to contact us.

Resources

Important Features

Here, we describe the most important features of the VAT system and the Geo Engine.

Operator Toolbox

VAT utilizes the Geo Engine Operator Toolbox to provide a wide range of geospatial processing capabilities. The Operator Toolbox is a powerful tool for processing and analyzing geospatial data. It provides a wide range of operators for processing raster and vector data, such as filtering, combining, and aggregating. The Operator Toolbox is designed to be user-friendly and intuitive, allowing users to easily create complex processing chains. The Operator Toolbox is also extensible, allowing users to create custom expressions to meet their specific needs.

More resources

Python Library

The Geo Engine Python Library allows users to interact programmatically with a Geo Engine backend, for instance the one offered in the Semantic Layer. It allows the management of a Geo Engine instance for administrators, for example to assign roles. Users can manage their datasets, layers and workflows and load data products into Python for further processing and analysis tasks. Having data products from the Geo Engine easily available directly in Python facilitates their use in external tools users are already working with. For example, the Geo Engine Python Library can be used in Jupyter Notebooks to construct and retrieve data products from a Geo Engine backend, taking advantage of Geo Engine's powerful geospatial processing capabilities. Then, with a data product loaded into Python, any suitable visualization tool can be used within the notebook. Furthermore, when connected to the same Geo Engine backend, a user can seamlessly switch between the Geo Engine web front end (VAT) and the Geo Engine Python Library, choosing the tool best suited for the task at hand at any time.
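As a minimal sketch of this interplay (assuming the geoengine package is installed and using the public VAT instance mentioned above; the workflow definition is a stripped-down variant of the examples later in this documentation, and the dataset name is a hypothetical placeholder):

from datetime import datetime

import geoengine as ge

# Connect to a Geo Engine backend -- here the public VAT instance.
ge.initialize("https://vat.gfbio.org/api")

# Register a simple vector workflow; the returned workflow id can also be
# pasted into the VAT web front end to visualize the very same workflow there.
workflow = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "OgrSource",
        "params": {"data": "germany_outline"},  # hypothetical dataset name
    },
})

# Load the result into a geopandas GeoDataFrame for further analysis in Python.
start = datetime.strptime("2010-01-01T12:00:00.000Z", "%Y-%m-%dT%H:%M:%S.%f%z")
end = datetime.strptime("2011-01-01T12:00:00.000Z", "%Y-%m-%dT%H:%M:%S.%f%z")
gdf = workflow.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(-180, -90, 180, 90),
        ge.TimeInterval(start, end),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326",
    )
)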

Getting started

To become familiar with the Geo Engine Python Library, take a look at the examples in the GitHub repository. You can connect to the Geo Engine backend running in the Semantic Layer.

User Guide

In addition to the examples, which offer a good starting point, there also exists documentation for all available functionality.

Developer Guide

The source code of the Geo Engine Python Library is publicly available on GitHub.

Search Integration

The VAT system provides a search integration with the GFBio search that allows users to transfer search results directly to the VAT system.

Searching for data

Visualizable in VAT

To search for data, users can enter a search term in the search bar and press the Enter key. This will show the search results. Users can filter the search results by selecting Visualizable in VAT in the menu on the left side. This will show only the datasets that can be visualized in the VAT system.

Search baskets

Add to Basket

Users can add search results to the basket by clicking on the Basket button. This will add the data to the basket.

Transferring data to VAT

Transfer to VAT

Users can transfer the data from the basket to the VAT system by opening the search basket. This will show all datasets that are in the basket. Users can select the datasets they want to transfer to the VAT system and click on the Visualize in VAT button.

Adding layers to the map

Dialog in VAT

This will open a dialog in the VAT system where users can select the layers they want to add to the map. Users can choose the layers they want to add and whether they should replace the current layers or be added on top of them.

Layers in VAT

The selected layers will be added to the map in the VAT system. Users can now work with the data as they would with any other data in the VAT system.

ABCD Archive Connection

GFBio's connected Data Centers provide access to a variety of data archives. The VAT system allows users to access these archives directly. This makes it easier for users to access data without having to download it themselves. In addition, users can map the data together with other data sources in the VAT system.

In the background, the VAT system harvests all ABCD data from the GFBio Search Index every night. Thus, updates to the ABCD data are available in the VAT system the next day.

Finding the archives

Finding the archives

To find the ABCD data, users can click on the + button in the data menu. This opens a dialog where users can select the GFBio ABCD Datasets menu item. This will show all ABCD datasets that are available in the VAT system.

Selecting data

Selecting data

Users can select the data they are interested in by clicking on the dataset. This will load all occurrences from the selected dataset into the VAT system as a new layer.

The data is displayed on the map as clustered points and can then be used like any other data in the VAT system. Zooming in will dissolve the clusters and show the individual occurrences. Users can also open the data table to see more attributes of the occurrences.

Multimedia items

Multimedia items

Some ABCD datasets contain links to multimedia items. These can be images, videos, or audio files. Users can click on the multimedia item in the data table to open it in a new dialog. For instance, when the item is an image, it will be displayed directly in the VAT system.

The data table will show at most three links for clustered occurrences. As a user, you can zoom in to see more items.

Citing the data

Citing the data

To cite the data, users can click on the Show Provenance icon in the context menu. This will open a table that shows the citation for the data.

The table has three columns: Citation, License, and URI. The Citation column contains the citation for the data. The License column contains the license under which the data is available. The URI column contains the URI to the license file.

GBIF Data & Search

Select GBIF

The VAT system provides access to a snapshot of the GBIF occurrence data. This allows users to easily access GBIF data without having to download the data themselves. In addition, they can map the data together with other data sources in the VAT system.

The GBIF snapshot contains all occurrences available at gbif.org at the time of the snapshot. The snapshot date is noted in the GBIF data provider description (see Add Data dialog above). Since data in VAT is spatio-temporal, we filter the occurrences by three conditions:

  1. They have a coordinate
  2. They have no geospatial issues
  3. They have an event time

All occurrences fulfilling these three conditions are imported into a PostgreSQL database and indexed by time and space (using the PostGIS extension), as well as family, genus and species names. To enable browsing along the taxonomic hierarchy, we additionally import GBIF's backbone taxonomy. We also retrieve the citations for all datasets through the registry API endpoint to be able to compile them according to GBIF's citation guidelines for a set of filtered occurrences.

GBIF groups

The GBIF data are made available as a data provider, which can be selected in the data menu (+). There, VAT groups the GBIF occurrences by different taxonomic ranks, e.g. family or species. Selecting such data will load all occurrence records from different datasets that fall under this taxonomic rank, e.g. all occurrences of the Genus Abedus (water bug).
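The same GBIF layers can also be loaded programmatically with the Geo Engine Python Library. A minimal sketch, following the provider-id lookup and the species/... naming pattern shown in the Python examples later in this documentation (other taxonomic ranks presumably follow an analogous naming scheme, but only the species form appears in those examples):

import geoengine as ge

ge.initialize("https://vat.gfbio.org/api")

# Find the GBIF data provider id in the root layer collection.
root_collection = ge.layer_collection()
gbif_provider_id = next(
    str(item.provider_id) for item in root_collection.items if item.name == "GBIF"
)

# Register a workflow that loads all occurrences of a given species.
workflow = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "OgrSource",
        "params": {"data": f"_:{gbif_provider_id}:`species/Aeshna affinis`"},
    },
})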

GBIF search

While users can browse lists of taxonomic ranks, they can also search for specific taxa. This makes it much easier to find the data of interest when specific taxa are known. At the top of the GBIF catalog, users can find the search icon on the right-hand side. Clicking on it brings up a search bar where users can enter the name of the taxon of interest. By typing in a few letters of the taxon name, VAT will suggest possible names that can be selected if they seem appropriate. Clicking the search icon again, or pressing ENTER, will display a list of search results.

Users can also change the default search settings by clicking the options icon next to the search icon. This opens a dialog that allows users to change the default search settings, such as the search type. The Fulltext search matches the term anywhere in the name, while the Prefix search matches only the beginning of the name. In addition, users can filter their results by taxonomic rank, e.g., show only results that are of the rank Species. This can be done by first selecting one of the collections, e.g., Species datasets, and then performing the search.

At all browsing levels, the currently selected filter is also respected during a search. This is an upgrade over a previous version of the search, where you could only search for family, genus, or species. Now you can, for example, filter for a specific kingdom and order beforehand and reduce the number of search results by combining hierarchical browsing and searching.

Video Tutorials

This chapter contains video tutorials on how to use the VAT system. These tutorials are designed to help users get started with the VAT system and to demonstrate how to use its features.

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

Introduction to VAT

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

Video

Summary

Welcome to the Introduction to VAT.

This first tutorial will introduce you to the VAT system, which can be used to easily load, transform and explore spatio-temporal datasets, such as in the context of ecological science. This tutorial will give you a tour, explain each menu and show the functionality in a simple first use case where we spatially join the minimum and maximum temperature with the GBIF occurrence data of Aeshna affinis.

Let the tour begin!

The most prominent area when opening the link https://vat.gfbio.org is the large map. Here you can visualise the spatio-temporal data. The extent of the map can be changed by dragging with the mouse or zooming with the scroll wheel.

Introduction image VAT overview

Next, in the top left-hand corner, is the layer selection menu, which allows you to view all the layers currently loaded, change the symbology or arrange the layers. You can also view the provenance, data table or download the layer.

Layer selection

In the top left-hand corner you will find the GFBio Portal button, which takes you back to the GFBio Search when you have finished your data exploration. Due to the deep integration between the VAT and the GFBio search, it is also possible to load data directly from the GFBio search.

GFBio Search

Next to the GFBio button is a zoom manipulation menu. In addition to the scroll wheel, the zoom level can be changed using the maximise and minimise buttons.

Zoom buttons

In the middle of the top bar is the time step selector. When viewing spatio-temporal data, you may wish to change the time by one time step. This menu can be used to move the current time step or to open the time selector, which we will see in a moment.

Time step selector

On the top bar you will find a series of icons, which we will visit next.

The first icon is the Account menu. Here you can log in with your GFBio account, which allows you to upload files or create, save or export a project. It also shows the session token, which can be used to access your session from Python, for example to work with uploaded files.

Account menu

The next menu is the data selection menu. Here you will find several data catalogues. The Data Catalogue contains datasets hosted by the Geo Engine, such as land use classification, climate information or orographic elevation maps. The Personal Catalogue contains all files and workflows, and the All Datasets Catalogue contains all hosted and uploaded datasets. Below these are the GBIF and GFBio ABCD data catalogues, which contain all datasets derived from the respective data providers. It is also possible to draw features or load a layer by inserting the workflow_id from a Python workflow.

Data selection

Behind the cogwheel icon is the operator selection menu. Here you will find a range of operators to manipulate, transform, merge or plot vector or raster data.

Operator selection

The plots are then displayed in the Plot Window. Here you can view the plot results and delete plots.

Plot window

The next menu is the time configuration menu. Here you can filter the spatio-temporal data. It is also possible to change the time step using the time step selector.

Time configurator

If you are logged in, the workspace settings allow you to save and load projects and change the spatial reference of your project.

Workspace settings

The last menu is the Help section. Here you will find initial information and links to the Geo Engine documentation, as well as further information about the VAT.

Help section including Provenance

After this brief tour, let us start with an example workflow to demonstrate the capabilities of the VAT.

First we go to the data selection menu and search for Aeshna affinis in the GBIF data catalogue. Clicking on the file loads the layer into the map.

GBIF Search

To link the occurrence data with temperature, we search for the Minimum Temperature dataset in the data catalogue.

Minimum temperature search

The Minimum Temperature dataset is a spatio-temporal dataset and therefore has a spatial and temporal extent. This can be found in the metadata of the dataset.

Minimum temperature spatiotemporal extent

To adjust the time range, change the time in the time configuration menu.

Minimum temperature time configuration

We also load the Maximum Temperature dataset.

Maximum temperature search

As the visual appearance of the temperature datasets is not appealing, we change the symbology of the raster layers.

Edit symbology button

Clicking on Edit Symbology takes us to the Edit Symbology menu. Here we scroll down, select a different colour map such as VIRIDIS or MAGMA and click on Create colour map. Finally, we confirm the change with the Apply button at the bottom of the menu.

Edit symbology menu

After loading the data, we want to spatially join the occurrence data of Aeshna affinis with the Minimum Temperature and Maximum Temperature datasets using the raster vector join operator. For better readability it is recommended to name the datasets.

Raster Vector Join

The result is that the vector data is spatially linked to the raster data by position. Therefore, new columns are added to the vector data table containing the information.

Data table Aeshna affinis oekosystematlas

The Histogram operator can be used to visualise the distribution of occurrence data as a function of temperature.

Histogram

The graphs then show the distribution of occurrences of Aeshna affinis as a function of the minimum and maximum temperatures on 1 January 1990.

Overview Aeshna affinis final

When you are finished manipulating the data, you can download the raster data as a .tif file and the vector data as a .shp file from the layer selection menu.

Download layer button

In the menu it is also possible to display the provenance, which will then appear in the data table area at the bottom of the VAT.

Show Provenance data table

This was the first introductory tour of the VAT system. If you want to learn more, you can do so by watching the videos or exploring the use cases in this documentation.

Warning: The VAT system is designed primarily for data exploration. Changing the extent of the visual map will recalculate the workflow and may change the results! This must be taken into account when working scientifically with the VAT system. There is also a new window in the bottom left corner. This window must be present when working scientifically with the VAT system, as it allows reproducibility!

Tip: The layers have several options. They can be downloaded to work with the data in other systems. The layers also always have a workflow tree and the workflow_id can be copied to import the workflow directly into Python.
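A minimal sketch of this round trip (assuming the geoengine package; the function name workflow_by_id and its UUID argument reflect current versions of the library and may differ in other versions):

from uuid import UUID

import geoengine as ge

ge.initialize("https://vat.gfbio.org/api")

# Paste the workflow id copied from the VAT layer menu (placeholder value here).
workflow = ge.workflow_by_id(UUID("00000000-0000-0000-0000-000000000000"))

# The layer behind that id can now be queried like any other workflow, e.g.
# loaded into a geopandas GeoDataFrame via workflow.get_dataframe(...).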

Canis lupus meets Felis silvestris

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

Video

Summary

Welcome to the Canis lupus meets Felis silvestris use case.

In this example the GBIF occurrence data of Canis lupus and Felis silvestris are cut to the extent of Germany and linked to the land use classification of the Ecosystematlas.

Introduction image VAT overview

To begin, we select the Data Catalogue in the top right-hand corner. Here we have several data catalogues to choose from.

Sidebar with data catalogs

In our case, we start by searching for the individual species in the GBIF data provider. The search function makes it easy to find the species, so we search for Canis lupus and load the dataset by selecting it.

Canis lupus search

For the spatial selection we also need the German borders, which we found by searching for Germany in the data catalogue.

Germany search

In order to link the occurrence data with the land use classification, it is also necessary to load the Oekosystematlas by searching for it in the personal data catalogue. The personal data catalogue contains all datasets uploaded by the user as well as a section with all datasets, which also contains datasets not listed.

Oekosystematlas search

The next step takes place in the Operators section, located in the top right-hand corner.

First we use a Point in Polygon Filter to restrict our occurrence data to Germany. For better readability it is recommended to name the datasets.

Point in Polygon Filter

Next, we join the raster data to the vector data using the Raster Vector Join Operator, which takes the occurrence data as a vector and the Ecosystem Atlas as raster data.

Raster Vector Join

The result is that the vector data is spatially linked to the raster data by position. Therefore, a new column is added to the vector data table containing the information.

Data table Canis lupus oekosystematlas

To visualise the classified data, it is recommended to use the Class Histogram operator, which translates the Ecosystem Atlas numbers into class names using the metadata.

Class histogram

The graph then shows the distribution of occurrences according to class.

Using the same procedure for Felis silvestris, it is possible to compare the occurrence of the two species.

Overview Canis lupus Felis sivestris final

Warning: The VAT system is mainly used for data exploration. Changing the extent of the visual map will recalculate the workflow and could change the results! This must be taken into account when working scientifically with the VAT system. There is also a new window in the bottom left corner. This window must be present when working scientifically with the VAT system, as it allows reproducibility!

Tip: The layers have several options. They can be downloaded to work with the data in other systems. The layers also always have a workflow tree and the workflow_id can be copied to import the workflow directly into Python.

On Dry Land

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

Video

Summary

Welcome to the Dry Land Use Case.

In this example, the GBIF occurrence data of Calopteryx splendens are clipped to the extent of Germany and merged with the land use classification from the Oekosystematlas as well as the average temperature provided by the WorldClim dataset.

Introduction image VAT overview

To begin, we select the Data Catalogue in the top right-hand corner. Here we have several data catalogues to choose from.

Sidebar with data catalogs

In our case we start by searching for Calopteryx splendens in the GBIF data provider. The search function makes it easy to find the species, so we can search for Calopteryx splendens and load the dataset by selecting it.

Calopteryx splendens search

For the spatial selection we also need the German border, which we found by searching for Germany in the data catalogue.

Germany search

Next, for the link between the occurrence data and the average temperature, we search for the Average Temperature dataset in the data catalogue.

Average temperature search

Caution: The Average Temperature is a spatio-temporal dataset. Always check the spatial and temporal extent in the metadata.

Average temperature spatiotemporal extent

The Average Temperature dataset covers the whole Earth and a time range from 1970/01/01 to 2000/12/31. To stay within this range, we need to change the time in the time menu at the top right.

Time menu

As the dataset does not look very attractive, we will change the colour palette of the raster data. This can be done by right-clicking on the layer and selecting Edit Symbology.

Edit symbology button

In the symbology menu, scroll down to Create colour table, select a colour map such as VIRIDIS or MAGMA, click the Create colour table button and confirm with the Apply button at the bottom of the symbology menu.

Edit symbology menu

In order to link the occurrence data with the land use classification, it is also necessary to load the Oekosystematlas by searching for it in the personal data catalogue. The personal data catalogue contains all datasets uploaded by the user as well as a section with all datasets, which also contains datasets not listed.

Oekosystematlas search

The next step takes place in the Operators section, located in the top right-hand corner.

First we use a Point in Polygon Filter to restrict our occurrence data to Germany. For better readability it is recommended to name the datasets.

Point in Polygon filter

Next, we join the raster data to the vector data using the Raster Vector Join Operator, which takes the occurrence data as a vector and the Ecosystem Atlas and Mean Temperature as raster data.

Raster Vector Join

The result is that the vector data is spatially linked to the raster data by position. Therefore, new columns are added to the vector data table containing the information.

Data table Calopteryx splendens oekosystematlas

The Histogram operator can be used to visualise the distribution of occurrence data as a function of average temperature.

Histogram

To visualise the classified data, it is recommended to use the Class Histogram operator, which translates the Ecosystem Atlas numbers into class names using the metadata.

Class histogram

The plots then show the distribution of occurrences of Calopteryx splendens as a function, firstly, of the average temperature on 1 January 2000 and, secondly, of the land-use classification of the Ecosystematlas.

Overview Calopteryx splendens final

Warning: The VAT system is designed primarily for data exploration. Changing the extent of the visual map will recalculate the workflow and could change the results! This must be taken into account when working scientifically with the VAT system. There is also a new window in the bottom left corner. This window must be present when working scientifically with the VAT system, as it allows reproducibility!

Tip: The layers have several options. They can be downloaded to work with the data in other systems. The layers also always have a workflow tree and the workflow_id can be copied to import the workflow directly into Python.

VAT 4 ML - Creating Training data for a species distribution model

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

This workflow is a contribution to the NFDI4Earth conference.

Video

The video for this use case is coming soon!

Summary

Welcome to the VAT 4 ML Use Case.

In this example we will label training data in VAT for Germany, transfer it to a Jupyter notebook using the unique workflow identifier, download the training data as a geodataframe and finally use a machine learning model to build a species distribution model.

For this use case, we will therefore use the frequency of Arnica montana occurrences from GBIF as the target variable together with weather data from CHELSA, land use classification from Ökosystematlas and topographic information as predictor variables.

Introduction image VAT overview

To begin, select the Data Catalogue in the top right-hand corner. Here we have several data catalogues to choose from.

In our case, we start by searching for the individual species in the GBIF data provider. The search function makes it easy to find the species, so we search for Arnica montana and load the dataset by selecting it.

Sidebar with data catalog. Currently the GBIF data provider is chosen and the search is opened with Arnica montana in the search field

For the weather data, we take information from CHELSA. Here we choose the Mean daily air temperature, the Monthly moisture index and the Monthly precipitation amount.

Sidebar with data catalog. Currently the data catalogue is opened with the CHELSA tab containing multiple weather layer

Caution: The weather data is a spatio-temporal data set. Always check the spatial and temporal extent in the metadata.

The weather datasets cover the whole earth and a time range from 01/01/1981 to 01/01/2011. We need to change the time in the time menu at the top right.

Sidebar with time configuration menu, where the time can be set to address temporal boundaries of spatio-temporal data

For the spatial selection we also need the German borders, which we found by searching for Germany in the data catalogue.

Sidebar with data catalog. Currently the data catalog is selected and the search function is used to search for the German boundaries

To add topographic information to the predictor variables, we include the SRTM elevation model.

Sidebar with data catalog. Currently the data catalog is selected with the SRTM tab

Finally, we add land use classification data, which in this case is the Oekosystematlas. It can be loaded by searching for it in the personal data catalogue. The personal data catalogue contains all the datasets that the user has uploaded, as well as a section with all datasets, which also contains datasets that are not listed.

Sidebar with data catalog. Currently Personal data catalogue is selected search function is used to find the Oekosystematlas

This gives us all the layers we need to create the training and prediction data.

An overview map is visible which contains all the added layers

We start to create the training data and prepare the prediction data by aggregating the spatio-temporal weather data. To do this, we use the Temporal Raster Aggregation operator. This allows us to aggregate temporal data by a moving window (e.g. 1 year). We use this operator for all weather data. While we choose the mean aggregation type for the temperature and the moisture index, we choose the sum aggregation type for the precipitation. For better readability it is recommended to name the datasets.
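For readers who prefer the JSON workflow notation used in the Python examples later in this documentation, a sketch of such an aggregation could look as follows; note that the exact parameter names of the TemporalRasterAggregation operator are assumptions here and should be checked against the operator documentation at https://docs.geoengine.io/operators/intro.html.

# Sketch only -- the parameter names ("aggregation", "window", "granularity", "step")
# and the dataset name are assumptions; consult the operator documentation before use.
temporal_mean_temperature = {
    "type": "TemporalRasterAggregation",
    "params": {
        "aggregation": {"type": "mean"},                 # "sum" for the precipitation layer
        "window": {"granularity": "years", "step": 1},   # moving window of one year
    },
    "sources": {
        "raster": {
            "type": "GdalSource",
            "params": {"data": "chelsa_mean_daily_air_temperature"},  # hypothetical name
        }
    },
}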

Temporal raster aggregation operator

In a second step, we spatially filter the GBIF occurrence data of Arnica montana using the Point in Polygon Filter to restrict our occurrence data to Germany.

point in polygon operator.

Finally, to create the training data, we join the prepared raster data to the vector data using the Raster Vector Join Operator, which takes the occurrence data as a vector and the other prepared raster data. This allows us to spatially join the occurrences with the value of the underlying raster cells.

Raster Vector Join operator

To create the prediction data, we then use the Raster Stacker operator to create a multi-layer raster containing all the raster data. This makes it easier to import it into Jupyter Notebook and work with it.

Raster Stacker operator

This brings us to the Arnica montana training data and the stacked prediction grid data.

Overview of the map with the training and prediction data as well as all other layer visually hidden

We now copy the Workflow ID for each layer to use in Jupyter Notebook.

Layer menu showing the options for i.e. copy the workflow id to clipboard

In Jupyter Notebook, we use the geoengine package to initialise the connection to the VAT API and import the training data workflow. We then round and group the data to obtain the frequency of Arnica montana occurrences for each combination of predictor values. The frequency is used as the target variable and the remaining columns as predictor variables. Finally, we split the dataset into training and test data and train a RandomForestRegressor model using a GridSearchCV strategy for better results. The best resulting model has an R² value of 0.07.

Jupyter Notebook code used to train the species distribution model
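The notebook itself is not reproduced in this documentation, so the following is a minimal sketch of the modelling step, assuming the training data has already been loaded into a pandas/geopandas DataFrame named training_data; the column names are assumptions and need to be adapted to the actual data.

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumed column names -- adapt them to the actual training DataFrame.
predictors = ["Mean_Temperature", "Moisture_Index", "Precipitation", "Elevation", "Land_Use"]
target = "frequency"

X_train, X_test, y_train, y_test = train_test_split(
    training_data[predictors], training_data[target], test_size=0.2, random_state=42
)

# Small grid search over a RandomForestRegressor, as described above.
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10, 20]},
    cv=5,
    scoring="r2",
)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("test R^2:", search.best_estimator_.score(X_test, y_test))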

After model training we can import the prediction data workflow. The best RandomForestRegressor model is used for the final prediction.

Jupyter Notebook code used to predict using the trained species distribution model

Finally, the result is plotted using the matplotlib package.

Jupyter Notebook code used to plot the result of the prediction The plot, showing three maps, two with the distribution and one with the distribution of the training data

Although the model did not show the best performance, it was possible to show how easy it is to create spatio-temporal training data for machine learning applications using the VAT and exporting the data directly into Python, where it can be used in typical formats such as geopandas GeoDataFrame or xarray DataArray.

Examples

This chapter contains examples of how to use the VAT system. The examples are written in Jupyter notebooks and are available in the examples directory. The notebooks are converted to markdown and included in the user documentation.

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

Introduction to VAT

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

Welcome to geoengine-python! This notebook is intended to show you around and explain the basics of how geoengine-python and VAT are related.

The purpose of this notebook is to demonstrate the capabilities of Geo Engine. Therefore some useful techniques will be shown:

  • Introduction to the geoengine-python package
  • Loading a dataset
  • Using operators
  • Plotting the results
  • First simple nested workflows
  • The connection between Python and VAT

When building your own nested workflow, it is recommended to build it in several steps as seen in this notebook.

Documentation about the operators and how to use them in Python can be found here: https://docs.geoengine.io/operators/intro.html

Preparation

The first thing to do is to import the geoengine-python package:

import geoengine as ge

For plotting it is currently also necessary to import Altair:

import altair as alt
#Other imports
from datetime import datetime
import matplotlib.pyplot as plt

To establish a connection with the VAT, ge.initialize can be used together with the API URL:

ge.initialize("https://vat.gfbio.org/api")

In the case of a locally hosted instance, the link would be http://localhost:4200/api.

For more comfortable work with the GBIF DataProvider, its provider id can be looked up by name in the root_collection:

root_collection = ge.layer_collection()
gbif_prov_id = ''
for elem in root_collection.items:
    if elem.name == 'GBIF':
        gbif_prov_id = str(elem.provider_id)
        
gbif_prov_id
'1c01dbb9-e3ab-f9a2-06f5-228ba4b6bf7a'

To load data, use operators, or plot vector data, 'workflows' need to be created, as shown below for loading the dragonfly species Aeshna affinis.

Load Aeshna affinis from the GBIF DataProvider

A workflow needs to be registered in the VAT or Geo Engine instance. To do so, ge.register_workflow can be called with the workflow definition in JSON:

workflow_aeshna_affinis = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "OgrSource",
        "params": {
            "data": f"_:{gbif_prov_id}:`species/Aeshna affinis`",
        }
    }
})

workflow_aeshna_affinis
c7b6b25a-714d-58d1-9f53-db7bf4995a5b

Alternatively the workflow_builder can be used as shown here: TODO

The result of each registration is the workflow_id, which can be used directly in VAT to trigger the workflow. To finally load the vector data from VAT, the .get_dataframe method can be used. The method takes as parameters the search extent, a time interval, the spatial resolution and a coordinate reference system.

#Set time
start_time = datetime.strptime(
    '2010-01-01T12:00:00.000Z', "%Y-%m-%dT%H:%M:%S.%f%z")
end_time = datetime.strptime(
    '2011-01-01T12:00:00.000Z', "%Y-%m-%dT%H:%M:%S.%f%z")

#Request the data from Geo Engine into a geopandas dataframe
data = workflow_aeshna_affinis.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(-180, -90, 180, 90),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
ax = data.plot(markersize=3)
ax.set_xlim([-180,180])
ax.set_ylim([-90,90])
(-90.0, 90.0)

output_21_1.png

The extent was chosen to make it clear that Aeshna affinis only occurs on the Eurasian continent. Without the x- and y-limiters the plot would look different:

data.plot()
<Axes: >

output_23_1.png

In addition to vector data, raster data can also be loaded from the VAT.

Loading Minimum and Maximum temperature from the temperature collection

To load raster data, again a workflow must be registered, but this time the 'GdalSource' is used instead of the 'OgrSource':

workflow_t_min = ge.register_workflow({ 
    "type": "Raster",
    "operator": {
        "type": "RasterScaling",
        "params": {
            "slope": {
                "type": "constant",
                "value": 0.1
            },
            "offset": {
                "type": "constant",
                "value": -273.15
            },
            "outputMeasurement": {
                "type": "continuous",
                "measurement": "temperature",
                "unit": "K/10"
            },
            "scalingMode": "mulSlopeAddOffset"
        },
        "sources": {
            "raster": {
                "type": "RasterTypeConversion",
                "params": {
                    "outputDataType": "F32"
                },
                "sources": {
                    "raster": {
                        "type": "GdalSource",
                        "params": {
                            "data": "mean_daily_minimum_2m_air_temperature"
                        }
                    }
                }
            }
        }
    }
})

workflow_t_min
a57efb5a-7256-58b9-b9f2-9f22d9724bab

The raster data can then be requested as an xarray.DataArray and plotted that way:

#Request the data from Geo Engine into a xarray dataarray
data = workflow_t_min.get_xarray(
    ge.QueryRectangle(
        ge.BoundingBox2D(-180, -90, 180, 90),
        ge.TimeInterval(start_time, start_time),
        resolution=ge.SpatialResolution(1., 1.),
        srs="EPSG:4326"
    )
)

#Plot the data TODO more description
data.plot(vmin=-50, vmax=50)
/home/duempelmann/geoengine_env/lib/python3.10/site-packages/owslib/coverage/wcs110.py:85: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.
  elem = self._capabilities.find(self.ns.OWS('ServiceProvider')) or self._capabilities.find(self.ns.OWS('ServiceProvider'))  # noqa





<matplotlib.collections.QuadMesh at 0x7fb654cef9a0>

output_29_2.png

The same can be done for the maximum temperature:

workflow_t_max = ge.register_workflow({ 
    "type": "Raster",
    "operator": {
        "type": "RasterScaling",
        "params": {
            "slope": {
                "type": "constant",
                "value": 0.1
            },
            "offset": {
                "type": "constant",
                "value": -273.15
            },
            "outputMeasurement": {
                "type": "continuous",
                "measurement": "temperature",
                "unit": "K/10"
            },
            "scalingMode": "mulSlopeAddOffset"
        },
        "sources": {
            "raster": {
                "type": "RasterTypeConversion",
                "params": {
                    "outputDataType": "F32"
                },
                "sources": {
                    "raster": {
                        "type": "GdalSource",
                        "params": {
                            "data": "mean_daily_maximum_2m_air_temperature"
                        }
                    }
                }
            }
        }        
    }
})

workflow_t_max
cdfe579d-b451-5b7e-b98d-bf0570489784
#Request the data from Geo Engine into a xarray dataarray
data = workflow_t_max.get_xarray(
    ge.QueryRectangle(
        ge.BoundingBox2D(-180, -90, 180, 90),
        ge.TimeInterval(start_time, start_time),
        resolution=ge.SpatialResolution(1.0, 1.0),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot(vmin=-50, vmax=50)
/home/duempelmann/geoengine_env/lib/python3.10/site-packages/owslib/coverage/wcs110.py:85: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.
  elem = self._capabilities.find(self.ns.OWS('ServiceProvider')) or self._capabilities.find(self.ns.OWS('ServiceProvider'))  # noqa





<matplotlib.collections.QuadMesh at 0x7fb652906ad0>

output_32_2.png

As well as loading data, the VAT has several operators for manipulating or transforming geodata. One example is the raster vector join.

Raster Vector Join between Aeshna affinis and the Minimum and Maximum Temperature

The raster vector join operator joins the vector data to one or more raster layers based on the position of the vector features. As shown in this example, the inputs are more or less the individual workflows seen before:

workflow_aeshna_affinis_join = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "RasterVectorJoin",
        "params": {
            "names": {
                "type": "names",
                "values": ["Min_Temperature", "Max_Temperature"]
            },
            "temporalAggregation": "none",
            "featureAggregation": "mean",
        },
        "sources": {
            "vector": { #Aeshna affinis ##########################################
                "type": "OgrSource",
                "params": {
                    "data": f"_:{gbif_prov_id}:`species/Aeshna affinis`",
                }
            }, ###################################################################
            "rasters": [{ #Minimum temperature ###################################
                    "type": "RasterScaling",
                    "params": {
                        "slope": {
                            "type": "constant",
                            "value": 0.1
                        },
                        "offset": {
                            "type": "constant",
                            "value": -273.15
                        },
                        "outputMeasurement": {
                            "type": "continuous",
                            "measurement": "temperature",
                            "unit": "K/10"
                        },
                        "scalingMode": "mulSlopeAddOffset"
                    },
                    "sources": {
                        "raster": {
                            "type": "RasterTypeConversion",
                            "params": {
                                "outputDataType": "F32"
                            },
                            "sources": {
                                "raster": {
                                    "type": "GdalSource",
                                    "params": {
                                        "data": "mean_daily_minimum_2m_air_temperature"
                                    }
                                }
                            }
                        }
                    }
                }, ################################################################ 
                { #Maximum temperature ############################################
                    "type": "RasterScaling",
                    "params": {
                        "slope": {
                            "type": "constant",
                            "value": 0.1
                        },
                        "offset": {
                            "type": "constant",
                            "value": -273.15
                        },
                        "outputMeasurement": {
                            "type": "continuous",
                            "measurement": "temperature",
                            "unit": "K/10"
                        },
                        "scalingMode": "mulSlopeAddOffset"
                    },
                    "sources": {
                        "raster": {
                            "type": "RasterTypeConversion",
                            "params": {
                                "outputDataType": "F32"
                            },
                            "sources": {
                                "raster": {
                                    "type": "GdalSource",
                                    "params": {
                                        "data": "mean_daily_maximum_2m_air_temperature"
                                    }
                                }
                            }
                        }
                    }
                } #################################################################
            ]
        }
    }
})
    

workflow_aeshna_affinis_join
8b26f457-4d52-5f35-b10a-aca7352f47d1

The input parameters required for each operator can be found in the documentation: https://docs.geoengine.io/operators/intro.html. In this example, the RasterVectorJoin operator takes two inputs: vector, which represents the vector layer to use, and rasters, which represents the one or more raster layers to join.

The resulting vector data can again be retrieved by requesting it as a GeoDataFrame:

#Request the data from Geo Engine into a geopandas dataframe
data_aeshna_affinis = workflow_aeshna_affinis_join.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(-180, -90, 180, 90),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Show the geopandas dataframe
data_aeshna_affinis
geometry Max_Temperature Min_Temperature basisofrecord gbifid scientificname start end
0 POINT (6.17690 52.27207) 13.250000 4.350006 HUMAN_OBSERVATION 699741184 Aeshna affinis Vander Linden, 1820 2010-04-28 00:00:00+00:00 2010-04-28 00:00:00+00:00
1 POINT (6.17690 52.27207) 13.250000 4.350006 HUMAN_OBSERVATION 699741183 Aeshna affinis Vander Linden, 1820 2010-04-28 00:00:00+00:00 2010-04-28 00:00:00+00:00
2 POINT (3.55448 43.39541) 18.550018 13.850006 HUMAN_OBSERVATION 3945130371 Aeshna affinis Vander Linden, 1820 2010-05-26 00:00:00+00:00 2010-05-26 00:00:00+00:00
3 POINT (3.76048 49.60182) 17.550018 8.750000 HUMAN_OBSERVATION 2485531094 Aeschna affinis Stephens, 1836 2010-05-25 00:00:00+00:00 2010-05-25 00:00:00+00:00
4 POINT (3.76048 49.60182) 17.550018 8.750000 HUMAN_OBSERVATION 2485629036 Aeschna affinis Stephens, 1836 2010-05-28 00:00:00+00:00 2010-05-28 00:00:00+00:00
... ... ... ... ... ... ... ... ...
973 POINT (5.94470 46.68733) NaN NaN HUMAN_OBSERVATION 3480458996 Aeshna affinis Vander Linden, 1820 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
974 POINT (2.73627 49.70645) NaN NaN HUMAN_OBSERVATION 3845267165 Aeshna affinis Vander Linden, 1820 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
975 POINT (2.62640 49.08975) NaN NaN HUMAN_OBSERVATION 3072870148 Aeshna affinis Vander Linden, 1820 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
976 POINT (3.79175 46.02396) NaN NaN HUMAN_OBSERVATION 3072950291 Aeshna affinis Vander Linden, 1820 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
977 POINT (6.04652 46.69048) NaN NaN HUMAN_OBSERVATION 3073536260 Aeshna affinis Vander Linden, 1820 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00

978 rows × 8 columns

The data can then be plotted directly in Python:

fig, ax = plt.subplots(1, 2, figsize=(20,10))

data_aeshna_affinis.plot(ax=ax[0], column='Min_Temperature', legend=True, legend_kwds={'label': 'Minimum Temperature'})
data_aeshna_affinis.plot(ax=ax[1], column='Max_Temperature', legend=True, legend_kwds={'label': 'Maximum Temperature'})

plt.show()

output_41_0.png

The VAT also offers some of its own plot types, such as histograms.

Plotting Aeshna affinis Minimum and Maximum Temperature as Histograms using VAT

Of course, a workflow must be registered in order to plot the data:

workflow_aeshna_affinis_join_plot_min = ge.register_workflow({
    "type": "Plot",
    "operator": {
       "type": "Histogram",
       "params": {
          "attributeName": "Min_Temperature",
           "bounds": "data",
           "buckets": {
               "type": "number",
               "value": 20
           }
       },
        "sources": {
            "source": { #Aeshna affinis Join #############################################
                "type": "RasterVectorJoin",
                "params": {
                    "names": {
                        "type": "names",
                        "values": ["Min_Temperature", "Max_Temperature"]
                    },
                    "temporalAggregation": "none",
                    "featureAggregation": "mean",
                },
                "sources": {
                    "vector": { 
                        "type": "OgrSource",
                        "params": {
                            "data": f"_:{gbif_prov_id}:`species/Aeshna affinis`",
                        }
                    }, 
                    "rasters": [{
                            "type": "RasterScaling",
                            "params": {
                                "slope": {
                                    "type": "constant",
                                    "value": 0.1
                                },
                                "offset": {
                                    "type": "constant",
                                    "value": -273.15
                                },
                                "outputMeasurement": {
                                    "type": "continuous",
                                    "measurement": "temperature",
                                    "unit": "K/10"
                                },
                                "scalingMode": "mulSlopeAddOffset"
                            },
                            "sources": {
                                "raster": {
                                    "type": "RasterTypeConversion",
                                    "params": {
                                        "outputDataType": "F32"
                                    },
                                    "sources": {
                                        "raster": {
                                            "type": "GdalSource",
                                            "params": {
                                                "data": "mean_daily_minimum_2m_air_temperature"
                                            }
                                        }
                                    }
                                }
                            }
                        }, 
                        {
                            "type": "RasterScaling",
                            "params": {
                                "slope": {
                                    "type": "constant",
                                    "value": 0.1
                                },
                                "offset": {
                                    "type": "constant",
                                    "value": -273.15
                                },
                                "outputMeasurement": {
                                    "type": "continuous",
                                    "measurement": "temperature",
                                    "unit": "K/10"
                                },
                                "scalingMode": "mulSlopeAddOffset"
                            },
                            "sources": {
                                "raster": {
                                    "type": "RasterTypeConversion",
                                    "params": {
                                        "outputDataType": "F32"
                                    },
                                    "sources": {
                                        "raster": {
                                            "type": "GdalSource",
                                            "params": {
                                                "data": "mean_daily_maximum_2m_air_temperature"
                                            }
                                        }
                                    }
                                }
                            }
                        } 
                    ]
                } ##########################################################################
            } 
       }
    }
})
    
workflow_aeshna_affinis_join_plot_min
8426078a-2940-5a76-8f16-afda4ed45b80

The .plot_chart method can be used to retrieve the plot, which can then be rendered with the altair package:

#Request the plot from Geo Engine
plot_aeshna_affinis_min = workflow_aeshna_affinis_join_plot_min.plot_chart(
    ge.QueryRectangle(
        ge.BoundingBox2D(-180, -90, 180, 90),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Show the plot
alt.Chart.from_dict(plot_aeshna_affinis_min.spec)
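
Besides rendering the chart inline, the Altair chart object can also be written to a standalone HTML file, for example (a minimal sketch; the file name is arbitrary):

#Save the histogram as a standalone HTML file (file name chosen freely)
chart_min = alt.Chart.from_dict(plot_aeshna_affinis_min.spec)
chart_min.save("aeshna_affinis_min_temperature_histogram.html")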

The same can be done for the maximum temperature:

workflow_aeshna_affinis_join_plot_max = ge.register_workflow({
    "type": "Plot",
    "operator": {
       "type": "Histogram",
       "params": {
          "attributeName": "Max_Temperature",
           "bounds": "data",
           "buckets": {
               "type": "number",
               "value": 20
           }
       },
        "sources": {
            "source": { #Aeshna affinis Join #############################################
                "type": "RasterVectorJoin",
                "params": {
                    "names": {
                        "type": "names",
                        "values": ["Min_Temperature", "Max_Temperature"]
                    },
                    "temporalAggregation": "none",
                    "featureAggregation": "mean",
                },
                "sources": {
                    "vector": { 
                        "type": "OgrSource",
                        "params": {
                            "data": f"_:{gbif_prov_id}:`species/Aeshna affinis`",
                        }
                    }, 
                    "rasters": [{
                            "type": "RasterScaling",
                            "params": {
                                "slope": {
                                    "type": "constant",
                                    "value": 0.1
                                },
                                "offset": {
                                    "type": "constant",
                                    "value": -273.15
                                },
                                "outputMeasurement": {
                                    "type": "continuous",
                                    "measurement": "temperature",
                                    "unit": "K/10"
                                },
                                "scalingMode": "mulSlopeAddOffset"
                            },
                            "sources": {
                                "raster": {
                                    "type": "RasterTypeConversion",
                                    "params": {
                                        "outputDataType": "F32"
                                    },
                                    "sources": {
                                        "raster": {
                                            "type": "GdalSource",
                                            "params": {
                                                "data": "mean_daily_minimum_2m_air_temperature"
                                            }
                                        }
                                    }
                                }
                            }
                        }, 
                        {
                            "type": "RasterScaling",
                            "params": {
                                "slope": {
                                    "type": "constant",
                                    "value": 0.1
                                },
                                "offset": {
                                    "type": "constant",
                                    "value": -273.15
                                },
                                "outputMeasurement": {
                                    "type": "continuous",
                                    "measurement": "temperature",
                                    "unit": "K/10"
                                },
                                "scalingMode": "mulSlopeAddOffset"
                            },
                            "sources": {
                                "raster": {
                                    "type": "RasterTypeConversion",
                                    "params": {
                                        "outputDataType": "F32"
                                    },
                                    "sources": {
                                        "raster": {
                                            "type": "GdalSource",
                                            "params": {
                                                "data": "mean_daily_maximum_2m_air_temperature"
                                            }
                                        }
                                    }
                                }
                            }
                        } 
                    ]
                } ##########################################################################
            } 
       }
    }
})

#Request the plot from Geo Engine
plot_aeshna_affinis_max = workflow_aeshna_affinis_join_plot_max.plot_chart(
    ge.QueryRectangle(
        ge.BoundingBox2D(-180, -90, 180, 90),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Show the plot
alt.Chart.from_dict(plot_aeshna_affinis_max.spec)

As you can see, the VAT offers a lot of functionality, which will be explored in more depth and extended in the following examples.

Further experiments

In this chapter, some other useful ways of combining Geo Engine and Python are shown.

#Overlay plot with context
import geopandas as gpd
import matplotlib.pyplot as plt

#Request the data from Geo Engine into a xarray dataarray
data_min = workflow_t_min.get_xarray(
    ge.QueryRectangle(
        ge.BoundingBox2D(-15.1189, 29.6655, 92.9116, 65.3164),
        ge.TimeInterval(start_time, start_time),
        resolution=ge.SpatialResolution(1.0, 1.0),
        srs="EPSG:4326"
    )
)

#Request the data from Geo Engine into a xarray dataarray
data_max = workflow_t_max.get_xarray(
    ge.QueryRectangle(
        ge.BoundingBox2D(-15.1189, 29.6655, 92.9116, 65.3164),
        ge.TimeInterval(start_time, start_time),
        resolution=ge.SpatialResolution(1.0, 1.0),
        srs="EPSG:4326"
    )
)


#Plot the data
fig, ax = plt.subplots(1, 2, figsize=(20,10))

data_min.plot(ax=ax[0], vmin=-30, vmax=20)
data_aeshna_affinis.plot(ax=ax[0], color='red', markersize=3)

data_max.plot(ax=ax[1], vmin=-30, vmax=20)
data_aeshna_affinis.plot(ax=ax[1], color='red', markersize=3)

plt.show()

output_53_1.png


Canis lupus meets Felis silvestris

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

This workflow uses the VAT to compare the occurrence of Canis lupus and Felis silvestris as a function of land use classification from the Ökosystematlas.

The purpose of this notebook is also to demonstrate the capabilities of Geo Engine. Therefore some useful techniques will be shown:

  • Using the GBIF data catalogue
  • Point in polygon selection
  • Raster vector join of occurrence data with land use classification
  • Plotting a class histogram
  • Nested workflows

When building your own nested workflow, it is recommended to build it in several steps as shown in this notebook.

Documentation about the operators and how to use them in Python can be found here: https://docs.geoengine.io/operators/intro.html

Preparation

#Import packages
import geoengine as ge
import geoengine_openapi_client
from datetime import datetime
from geoengine.types import RasterBandDescriptor
import altair as alt

alt.renderers.enable('default')
RendererRegistry.enable('default')
#Initialize Geo Engine in VAT
ge.initialize("https://vat.gfbio.org/api")
#Get the GBIF DataProvider id (Useful for translating the DataProvider name to its id)
root_collection = ge.layer_collection()
gbif_prov_id = ''
for elem in root_collection.items:
    if elem.name == 'GBIF':
        gbif_prov_id = str(elem.provider_id)
        
gbif_prov_id
'1c01dbb9-e3ab-f9a2-06f5-228ba4b6bf7a'
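
Note that the loop leaves gbif_prov_id empty if no collection named 'GBIF' is found. A small guard (a sketch, not part of the original notebook) makes this failure explicit:

#Optional safety check: fail early if the GBIF data provider was not found
if not gbif_prov_id:
    raise ValueError("GBIF data provider not found in the VAT layer collection")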

Load boundaries of Germany for later GBIF occurrence extraction (optional)

This chapter is not required and only shows that country borders are available.

#Create workflow to request German border
workflow_germany = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "OgrSource",
        "params": {
            "data": "germany",
        }
    }
})

workflow_germany
2429a993-385f-546f-b4f7-97b3ba4a5adb
#Set time
start_time = datetime.strptime(
    '2000-04-01T12:00:00.000Z', "%Y-%m-%dT%H:%M:%S.%f%z")
end_time = datetime.strptime(
    '2030-04-01T12:00:00.000Z', "%Y-%m-%dT%H:%M:%S.%f%z")

#Request the data from Geo Engine into a geopandas dataframe
data = workflow_germany.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot()
<Axes: >

output_10_1.png

Load Ökosystematlas for later raster vector join with occurrence data (optional)

This chapter is not needed and only shows that raster data is also available.

#Create a workflow to request the oekosystematlas raster data
workflow_oekosystematlas = ge.register_workflow({ 
    "type": "Raster",
    "operator": {
        "type": "GdalSource",
        "params": {
            "data": "oekosystematlas"
        }
    }
})

workflow_oekosystematlas
8a859eeb-0778-5190-a9d1-b1f787e4176d
#Request the data from Geo Engine into a xarray dataarray
data = workflow_oekosystematlas.get_xarray(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot(vmax=75)
<matplotlib.collections.QuadMesh at 0x7f67d1c4ada0>

output_14_2.png

Processing Canis lupus

None of the following steps is strictly necessary, as the entire workflow is reproduced as a single nested request at the end. However, the steps are intended to show the capabilities of Geo Engine.

Load Canis lupus (Optional)

#Create workflow to request Canis lupus incidents
workflow_canis_lupus = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "OgrSource",
        "params": {
            "data": f"_:{gbif_prov_id}:`species/Canis lupus`",
        }
    }
})

workflow_canis_lupus.get_result_descriptor()
Data type:         MultiPoint
Spatial Reference: EPSG:4326
Columns:
  gbifid:
    Column Type: int
    Measurement: unitless
  scientificname:
    Column Type: text
    Measurement: unitless
  basisofrecord:
    Column Type: text
    Measurement: unitless
#Request the data from Geo Engine into a geopandas dataframe
data = workflow_canis_lupus.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot()
<Axes: >

output_19_1.png

Point in Polygon Canis lupus

#Create workflow to request Canis lupus incidents filtered by German border
workflow_canis_lupus_cut = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "PointInPolygonFilter",
        "params": {},
        "sources": {
            "points": { #Canis lupus ###############################
                "type": "OgrSource",
                "params": {
                    "data": f"_:{gbif_prov_id}:`species/Canis lupus`",
                    "attributeProjection": []
                } 
            }, #####################################################
            "polygons": { #Germany #################################
                "type": "OgrSource",
                "params": {
                    "data": "germany"
                }
            } ######################################################
        } 
    }
})

workflow_canis_lupus_cut
f30ac841-81b0-5301-bac6-840dd914c1ba
#Request the data from Geo Engine into a geopandas dataframe
data_canis_lupus = workflow_canis_lupus_cut.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data_canis_lupus.plot()
<Axes: >

output_22_1.png

Nested Point in Polygon and Raster Vector Join Canis lupus

#Create a workflow to request Canis lupus occurrences filtered by the German border and linked to the Ökosystematlas data.
workflow_canis_lupus_cut_join = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "RasterVectorJoin",
        "params": {
            "names": {
                "type": "names",
                "values": ["Ökosystematlas"]
            },  
            "temporalAggregation": "none",
            "featureAggregation": "mean",
        },
        "sources": {
            "vector": { #Canis lupus cut ######################################
                "type": "PointInPolygonFilter", 
                "params": {},
                "sources": {
                    "points": {
                        "type": "OgrSource",
                        "params": {
                            "data": f"_:{gbif_prov_id}:`species/Canis lupus`",
                            "attributeProjection": []
                        }
                    },
                    "polygons": {
                        "type": "OgrSource",
                        "params": {
                            "data": "germany"
                        }
                    }
                }
            }, ##############################################################
            "rasters": [{ #Ökosystematlas ###################################
                "type": "GdalSource",
                "params": {
                    "data": "oekosystematlas"
                }
            }] ##############################################################
        },
    }
})

workflow_canis_lupus_cut_join
2c8ebbbc-b848-58e6-8f5c-f51976db3c8f
#Request the data from Geo Engine into a geopandas dataframe
data = workflow_canis_lupus_cut_join.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    ),
    resolve_classifications=True
)

#Show the geopandas dataframe
data
geometry basisofrecord gbifid scientificname Ökosystematlas start end
0 POINT (9.49776 52.08503) HUMAN_OBSERVATION 3447336010 Canis familiaris Linnaeus, 1758 Laubwälder 2022-01-06 00:00:00+00:00 2022-01-06 00:00:00+00:00
1 POINT (8.63148 50.01629) HUMAN_OBSERVATION 1579887520 Canis familiaris Linnaeus, 1758 Verkehrsinfrastruktur 2017-03-11 00:00:00+00:00 2017-03-11 00:00:00+00:00
2 POINT (9.55500 48.97333) HUMAN_OBSERVATION 1579896270 Canis familiaris Linnaeus, 1758 Mischwälder 2017-01-01 00:00:00+00:00 2017-01-01 00:00:00+00:00
3 POINT (6.14376 50.81583) HUMAN_OBSERVATION 1883797122 Canis familiaris Linnaeus, 1758 Grünland 2018-05-14 00:00:00+00:00 2018-05-14 00:00:00+00:00
4 POINT (10.29174 48.88160) HUMAN_OBSERVATION 1891284730 Canis familiaris Linnaeus, 1758 Laubwälder 2018-08-16 00:00:00+00:00 2018-08-16 00:00:00+00:00
... ... ... ... ... ... ... ...
1336 POINT (14.90000 51.35000) HUMAN_OBSERVATION 3725545490 Canis lupus Linnaeus, 1758 Nadelwälder 2019-01-13 00:00:00+00:00 2019-01-13 00:00:00+00:00
1337 POINT (12.42115 51.19143) HUMAN_OBSERVATION 3712440633 Canis lupus Linnaeus, 1758 Siedlungsfläche mit niedriger Baudichte 2022-03-05 17:27:07+00:00 2022-03-05 17:27:07+00:00
1338 POINT (14.20000 51.45000) HUMAN_OBSERVATION 2837851869 Canis lupus Linnaeus, 1758 Siedlungsfläche mit niedriger Baudichte 2019-04-26 00:00:00+00:00 2019-04-26 00:00:00+00:00
1339 POINT (14.85000 51.35000) HUMAN_OBSERVATION 2836478160 Canis lupus Linnaeus, 1758 Ackerland 2019-01-13 00:00:00+00:00 2019-01-13 00:00:00+00:00
1340 POINT (6.51747 49.46328) HUMAN_OBSERVATION 2511463696 Canis lupus Linnaeus, 1758 Laubwälder 2014-01-01 00:00:00+00:00 2014-01-01 00:00:00+00:00

1341 rows × 7 columns

Note that the underlying Ökosystematlas variable is numeric; the human-readable class names are encoded in the metadata of the files. A class histogram can present the data with these class names.
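
If you want to inspect the resolved class names directly in Python instead, the same information can be taken from the GeoDataFrame with plain pandas (a small sketch):

#Count Canis lupus occurrences per resolved Ökosystematlas class
data["Ökosystematlas"].value_counts()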

Nested Full Workflow Canis lupus

#Create a workflow to plot Canis lupus occurrences filtered by the German border and merged with Ökosystematlas data as a class histogram.
workflow_canis_lupus_full = ge.register_workflow({
    "type": "Plot",
    "operator": {
       "type": "ClassHistogram",
       "params": {
          "columnName": "Ökosystematlas"
       },
        "sources": {
            "source": { #Canis lupus cut join #####################################
                "type": "RasterVectorJoin",
                "params": {
                        "names": {
                            "type": "names",
                            "values": ["Ökosystematlas"]
                        }, 
                        "temporalAggregation": "none",
                        "featureAggregation": "mean",
                },
                "sources": {
                    "vector": {
                        "type": "PointInPolygonFilter",
                        "params": {},
                        "sources": {
                            "points": {
                                "type": "OgrSource",
                                "params": {
                                    "data": f"_:{gbif_prov_id}:`species/Canis lupus`",
                                    "attributeProjection": []
                                }
                            },
                            "polygons": {
                                "type": "OgrSource",
                                "params": {
                                    "data": "germany"
                                }
                            }
                        }
                    },
                    "rasters": [{
                        "type": "GdalSource",
                        "params": {
                            "data": "oekosystematlas"
                        }
                    }]
                }
            } ######################################################################
       }
    }
})
    
workflow_canis_lupus_full
b182c10b-59ce-5d5b-946f-fccc3ae04c88
#Request the plot from Geo Engine
plot_canis_lupus = workflow_canis_lupus_full.plot_chart(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Show the plot
alt.Chart.from_dict(plot_canis_lupus.spec)

Processing Felis silvestris

None of the following steps is strictly necessary, as the entire workflow is reproduced as a single nested request at the end. However, the steps are intended to show the capabilities of Geo Engine.

Load Felis silvestris (Optional)

#Create workflow to request Felis silvestris occurrences
workflow_felis_silvestris = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "OgrSource",
        "params": {
            "data": f"_:{gbif_prov_id}:`species/Felis silvestris`",
        }
    }
})

workflow_felis_silvestris
f8d5abd5-7d5f-567e-97a2-7830052d6cbf
#Request the data from Geo Engine into a geopandas dataframe
data = workflow_felis_silvestris.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot()
<Axes: >

output_34_1.png

Point in Polygon Felis silvestris

#Create workflow to request Felis silvestris occurrences filtered by German border
workflow_felis_silvestris_cut = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "PointInPolygonFilter",
        "params": {},
        "sources": {
            "points": { #Felis silvestris ################################
                "type": "OgrSource",
                "params": {
                    "data": f"_:{gbif_prov_id}:`species/Felis silvestris`",
                    "attributeProjection": []
                }
            }, ###########################################################
            "polygons": { #Germany #######################################
                "type": "OgrSource",
                "params": {
                    "data": "germany"
                }
            } ############################################################
        } 
    }
})

workflow_felis_silvestris_cut
518c27b3-0ce7-56ac-b826-5a72be463a73
#Request the data from Geo Engine into a geopandas dataframe
data_felis_silvestris = workflow_felis_silvestris_cut.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data_felis_silvestris.plot()
<Axes: >

output_37_1.png

Nested Point in Polygon and Raster Vector Join Felis silvestris

#Create a workflow to request Felis silvestris occurrences filtered by the German border and linked to the Ökosystematlas data.
workflow_felis_silvestris_cut_join = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "RasterVectorJoin",
        "params": {
                "names": {
                    "type": "names",
                    "values": ["Ökosystematlas"]
                }, 
                "temporalAggregation": "none",
                "featureAggregation": "mean",
        },
        "sources": {
            "vector": { #Felis silvestris cut #####################################
                "type": "PointInPolygonFilter", 
                "params": {},
                "sources": {
                    "points": {
                        "type": "OgrSource",
                        "params": {
                            "data": f"_:{gbif_prov_id}:`species/Felis silvestris`",
                            "attributeProjection": []
                        }
                    },
                    "polygons": {
                        "type": "OgrSource",
                        "params": {
                            "data": "germany"
                        }
                    }
                }
            }, ###################################################################
            "rasters": [{ #Ökosystematlas ########################################
                "type": "GdalSource",
                "params": {
                    "data": "oekosystematlas"
                }
            }] ###################################################################
        },
    }
})

workflow_felis_silvestris_cut_join
355b4e59-65cc-5cfe-a0b4-636f4d41beab
#Request the data from Geo Engine into a geopandas dataframe
data = workflow_felis_silvestris_cut_join.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    ),
    resolve_classifications=True
)

#Show the geopandas dataframe
data
geometry basisofrecord gbifid scientificname Ökosystematlas start end
0 POINT (8.08720 50.78140) MATERIAL_SAMPLE 3774757042 Felis silvestris Schreber, 1777 Laubwälder 2015-09-13 00:00:00+00:00 2015-09-13 00:00:00+00:00
1 POINT (6.74050 50.43160) PRESERVED_SPECIMEN 3774755207 Felis silvestris Schreber, 1777 Grünland 2017-10-11 00:00:00+00:00 2017-10-11 00:00:00+00:00
2 POINT (6.36984 50.50914) HUMAN_OBSERVATION 1828993691 Felis silvestris Schreber, 1777 Natürliche und extensiv genutzte Grünflächen 2018-02-24 00:00:00+00:00 2018-02-24 00:00:00+00:00
3 POINT (6.92310 50.62580) PRESERVED_SPECIMEN 3774754593 Felis silvestris Schreber, 1777 Ackerland 2017-11-08 00:00:00+00:00 2017-11-08 00:00:00+00:00
4 POINT (6.87770 50.42950) PRESERVED_SPECIMEN 3774753913 Felis silvestris Schreber, 1777 Nadelwälder 2003-10-14 00:00:00+00:00 2003-10-14 00:00:00+00:00
... ... ... ... ... ... ... ...
1116 POINT (6.13130 50.10320) HUMAN_OBSERVATION 3695923471 Felis silvestris Schreber, 1777 Laubwälder 2016-07-19 00:00:00+00:00 2016-07-19 00:00:00+00:00
1117 POINT (6.13130 50.10320) HUMAN_OBSERVATION 3695924066 Felis silvestris Schreber, 1777 Laubwälder 2019-01-09 00:00:00+00:00 2019-01-09 00:00:00+00:00
1118 POINT (6.13130 50.10320) HUMAN_OBSERVATION 3695924069 Felis silvestris Schreber, 1777 Laubwälder 2019-01-04 00:00:00+00:00 2019-01-04 00:00:00+00:00
1119 POINT (8.29065 50.12195) HUMAN_OBSERVATION 841588052 Felis silvestris Schreber, 1777 Nadelwälder 2013-08-06 17:51:20+00:00 2013-08-06 17:51:20+00:00
1120 POINT (6.13130 50.10320) HUMAN_OBSERVATION 3695923382 Felis silvestris Schreber, 1777 Laubwälder 2019-11-01 00:00:00+00:00 2019-11-01 00:00:00+00:00

1121 rows × 7 columns

Nested Full Workflow Felis silvestris

#Create a workflow to plot Felis silvestris occurrences filtered by the German border and merged with the Ökosystematlas data as a class histogram.
workflow_felis_silvestris_full = ge.register_workflow({
    "type": "Plot",
    "operator": {
       "type": "ClassHistogram",
       "params": {
          "columnName": "Ökosystematlas"
       },
        "sources": {
            "source": {
                "type": "RasterVectorJoin",
                "params": {
                        "names": {
                            "type": "names",
                            "values": ["Ökosystematlas"]
                        }, 
                        "temporalAggregation": "none",
                        "featureAggregation": "mean",
                },
                "sources": {
                    "vector": {
                        "type": "PointInPolygonFilter",
                        "params": {},
                        "sources": {
                            "points": {
                                "type": "OgrSource",
                                "params": {
                                    "data": f"_:{gbif_prov_id}:`species/Felis silvestris`",
                                    "attributeProjection": []
                                }
                            },
                            "polygons": {
                                "type": "OgrSource",
                                "params": {
                                    "data": "germany"
                                }
                            }
                        }
                    },
                    "rasters": [{
                        "type": "GdalSource",
                        "params": {
                            "data": "oekosystematlas"
                        }
                    }]
                }
            }
       }
    }
})
    
workflow_felis_silvestris_full
db03640c-cf0e-5fe0-978c-f45a55eb5da3
#Request the plot from Geo Engine
plot_felis_silvestris = workflow_felis_silvestris_full.plot_chart(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Show the plot
alt.Chart.from_dict(plot_felis_silvestris.spec)

Comparison Canis lupus and Felis silvestris

#Show the plot from Canis lupus
alt.Chart.from_dict(plot_canis_lupus.spec)
#Show the plot from Felis silvestris
alt.Chart.from_dict(plot_felis_silvestris.spec)

Further experiments

In this chapter, some other useful ways of combining Geo Engine and Python are shown.

#Comparison plots
import pandas as pd

# Convert the JSON data to pandas DataFrames
df1 = pd.DataFrame(plot_canis_lupus.spec['data']['values'])
df2 = pd.DataFrame(plot_felis_silvestris.spec['data']['values'])

df1['dataset'] = 'Canis lupus'
df2['dataset'] = 'Felis silvestris'

combined_df = pd.concat([df1, df2])

chart = alt.Chart(combined_df).mark_bar().encode(
    x=alt.X('Land Cover:N', title='Land Cover'),
    y=alt.Y('Frequency:Q', title='Frequency'),
    color=alt.Color('dataset:N', title='Dataset'),
    xOffset=alt.XOffset('dataset:N')
).properties(width=600)

# Display the grouped barplot
chart
#Plotting of multiple species
import geopandas as gpd

gdf1 = data_canis_lupus
gdf2 = data_felis_silvestris

gdf1['dataset'] = 'Canis lupus'
gdf2['dataset'] = 'Felis silvestris'

combined_gdf = pd.concat([gdf1, gdf2])

combined_gdf.plot(column='dataset', cmap='rainbow', markersize=5, legend=True)
<Axes: >

output_50_1.png


On dry land

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

This workflow uses the VAT to evaluate the distribution of Calopteryx splendens as a function of the land use classification from the Ökosystematlas and a temporal aggregation of the average air temperature.

The purpose of this notebook is to demonstrate the capabilities of Geo Engine. Therefore some useful techniques will be shown:

  • Using the GBIF data catalogue
  • Point in polygon selection
  • Raster vector join of occurrence data with land use classification
  • Plotting a class histogram
  • Nested workflows

When building your own nested workflow, it is recommended to build it in several steps as shown in this notebook.

Documentation about the operators and how to use them in Python can be found here: https://docs.geoengine.io/operators/intro.html

Preparation

#Import packages
import geoengine as ge
import geoengine_openapi_client
from datetime import datetime
from geoengine.types import RasterBandDescriptor
import altair as alt
import asyncio
import nest_asyncio

alt.renderers.enable('default')
RendererRegistry.enable('default')
#Initialize Geo Engine in VAT
ge.initialize("https://vat.gfbio.org/api")
#Get the GBIF DataProvider id (Useful for translating the DataProvider name to its id)
root_collection = ge.layer_collection()
gbif_prov_id = ''
for elem in root_collection.items:
    if elem.name == 'GBIF':
        gbif_prov_id = str(elem.provider_id)
        
gbif_prov_id
'1c01dbb9-e3ab-f9a2-06f5-228ba4b6bf7a'

Load boundaries of Germany for later GBIF occurrence extraction (optional)

This chapter is not needed and only shows that country boundaries are available.

#Create workflow to request the German border
workflow_germany = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "OgrSource",
        "params": {
            "data": "germany",
        }
    }
})

workflow_germany
2429a993-385f-546f-b4f7-97b3ba4a5adb
#Set time
start_time = datetime.strptime(
    '2010-01-01T12:00:00.000Z', "%Y-%m-%dT%H:%M:%S.%f%z")
end_time = datetime.strptime(
    '2011-01-01T12:00:00.000Z', "%Y-%m-%dT%H:%M:%S.%f%z")

#Request the data from Geo Engine into a geopandas dataframe
data = workflow_germany.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot()
<Axes: >

output_10_1.png

Load Ökosystematlas (detailed) for later raster vector join with occurrence data (optional)

This chapter is not needed and only shows that raster data is also available.

#Create workflow to request the oekosystematlas raster data
workflow_oekosystematlas = ge.register_workflow({ 
    "type": "Raster",
    "operator": {
        "type": "GdalSource",
        "params": {
            "data": "oekosystematlas_detail"
        }
    }
})

workflow_oekosystematlas
f447601c-0ba1-57c3-9127-b0622f982231
#Request the data from Geo Engine into a xarray dataarray
data = workflow_oekosystematlas.get_xarray(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot(vmax=75)
<matplotlib.collections.QuadMesh at 0x7f4d09e97a60>

output_14_2.png

Load Average temperature for later raster vector join with occurrence data (optional)

This chapter is not needed and only shows that raster data is also available.

#Create workflow to request the average temperature raster data
workflow_t_avg = ge.register_workflow({ 
    "type": "Raster",
    "operator": {
        "type": "RasterScaling",
            "params": {
                "slope": {
                    "type": "constant",
                    "value": 0.1
                },
                "offset": {
                    "type": "constant",
                    "value": -273.15
                },
                "outputMeasurement": {
                    "type": "continuous",
                    "measurement": "temperature",
                    "unit": "K/10"
                },
                "scalingMode": "mulSlopeAddOffset"
            },
            "sources": {
                "raster": {
                    "type": "RasterTypeConversion",
                    "params": {
                        "outputDataType": "F32"
                    },
                    "sources": {
                        "raster": {
                            "type": "GdalSource",
                            "params": {
                                "data": "mean_daily_air_temperature"
                            }
                        }
                    }
                }
            }
    }
})

workflow_t_avg
6393648d-6545-5435-a49e-015ba9dfa92e
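
The RasterScaling step applies output = value * slope + offset (scalingMode mulSlopeAddOffset), i.e. value * 0.1 - 273.15. Assuming the source raster stores temperature as tenths of Kelvin, this yields degrees Celsius; a minimal standalone sketch of the same conversion:

#Illustrative only: the same conversion the RasterScaling operator applies,
#assuming raw values are tenths of Kelvin
def scale_temperature(raw_value: float, slope: float = 0.1, offset: float = -273.15) -> float:
    return raw_value * slope + offset

scale_temperature(2881.5)  #288.15 K -> 15.0 °C
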
#Prepare the query rectangle for the workflow raster stream
bbox = ge.QueryRectangle(
    ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
    ge.TimeInterval(start_time, end_time),
    resolution=ge.SpatialResolution(0.1, 0.1),
    srs="EPSG:4326"
)
#Request the data from Geo Engine into a xarray dataarray
data = workflow_t_avg.get_xarray(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, start_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot(vmin=-3, vmax=3)
<matplotlib.collections.QuadMesh at 0x7f4d09db8820>

output_19_2.png

Processing Calopteryx splendens

None of the following steps is strictly necessary, as the entire workflow is reproduced as a single nested request at the end. However, the steps are intended to show the capabilities of Geo Engine and how to build nested workflows step by step.

Load Calopteryx splendens (Optional)

#Create workflow to request Calopteryx splendens occurrences
workflow_calopteryx_splendens = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "OgrSource",
        "params": {
            "data": f"_:{gbif_prov_id}:`species/Calopteryx splendens`",
        }
    }
})

workflow_calopteryx_splendens.get_result_descriptor()
Data type:         MultiPoint
Spatial Reference: EPSG:4326
Columns:
  scientificname:
    Column Type: text
    Measurement: unitless
  basisofrecord:
    Column Type: text
    Measurement: unitless
  gbifid:
    Column Type: int
    Measurement: unitless
#Request the data from Geo Engine into a geopandas dataframe
data = workflow_calopteryx_splendens.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot()
<Axes: >

output_24_1.png

Point in Polygon Calopteryx splendens

#Create workflow to request Calopteryx splendens occurrences filtered by German border
workflow_calopteryx_splendens_cut = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "PointInPolygonFilter",
        "params": {},
        "sources": {
            "points": { #Calopteryx splendens ###############################
                "type": "OgrSource",
                "params": {
                    "data": f"_:{gbif_prov_id}:`species/Calopteryx splendens`",
                    "attributeProjection": []
                } 
            }, #####################################################
            "polygons": { #Germany #################################
                "type": "OgrSource",
                "params": {
                    "data": "germany"
                }
            } ######################################################
        } 
    }
})

workflow_calopteryx_splendens_cut
6cf9ef88-8bd3-5904-bc74-f866165b18c3
#Request the data from Geo Engine into a geopandas dataframe
data_calopteryx_splendens = workflow_calopteryx_splendens_cut.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data_calopteryx_splendens.plot()
<Axes: >

output_27_1.png

Nested Point in Polygon and Raster Vector Join Calopteryx splendens

#Create a workflow to request Calopteryx splendens occurrences filtered by the German border and joined with the Ökosystematlas and average temperature data.
workflow_calopteryx_splendens_cut_join = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "RasterVectorJoin",
        "params": {
            "names": {
                "type": "names",
                "values": ["Ökosystematlas", "Avg_Temperature"]
            }, 
            "temporalAggregation": "none",
            "featureAggregation": "first",
        },
        "sources": {
            "vector": { #Calopteryx splendens cut ######################################
                "type": "PointInPolygonFilter", 
                "params": {},
                "sources": {
                    "points": {
                        "type": "OgrSource",
                        "params": {
                            "data": f"_:{gbif_prov_id}:`species/Calopteryx splendens`",
                            "attributeProjection": []
                        }
                    },
                    "polygons": {
                        "type": "OgrSource",
                        "params": {
                            "data": "germany"
                        }
                    }
                }
            }, ##############################################################
            "rasters": [{ #Ökosystematlas ###################################
                "type": "GdalSource",
                "params": {
                    "data": "oekosystematlas"
                }
            }, ##############################################################
            { #Average temperature
                "type": "RasterScaling",
                    "params": {
                        "slope": {
                            "type": "constant",
                            "value": 0.1
                        },
                        "offset": {
                            "type": "constant",
                            "value": -273.15
                        },
                        "outputMeasurement": {
                            "type": "continuous",
                            "measurement": "temperature",
                            "unit": "K/10"
                        },
                        "scalingMode": "mulSlopeAddOffset"
                    },
                    "sources": {
                        "raster": {
                            "type": "RasterTypeConversion",
                            "params": {
                                "outputDataType": "F32"
                            },
                            "sources": {
                                "raster": {
                                    "type": "GdalSource",
                                    "params": {
                                        "data": "mean_daily_air_temperature"
                                    }
                                }
                            }
                        }
                    }
            }] ##############################################################
        },
    }
})

workflow_calopteryx_splendens_cut_join
63c46ba9-3efd-5ddd-b446-c36fad6537e8
#Request the data from Geo Engine into a geopandas dataframe
data = workflow_calopteryx_splendens_cut_join.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    ),
    resolve_classifications=True
)

#Show the geopandas dataframe
data
geometry Avg_Temperature basisofrecord gbifid scientificname Ökosystematlas start end
0 POINT (7.57250 51.63501) 9.350006 HUMAN_OBSERVATION 920849630 Calopteryx splendens Harris, 1780 Grünland 2010-04-29 00:00:00+00:00 2010-04-29 00:00:00+00:00
1 POINT (6.75459 52.09421) 9.250000 HUMAN_OBSERVATION 700578315 Calopteryx splendens Harris, 1780 Ackerland 2010-04-12 00:00:00+00:00 2010-04-12 00:00:00+00:00
2 POINT (6.79395 51.93967) 9.350006 HUMAN_OBSERVATION 700582646 Calopteryx splendens Harris, 1780 No data 2010-04-07 00:00:00+00:00 2010-04-07 00:00:00+00:00
3 POINT (6.75459 52.09421) 9.250000 HUMAN_OBSERVATION 700578316 Calopteryx splendens Harris, 1780 Ackerland 2010-04-12 00:00:00+00:00 2010-04-12 00:00:00+00:00
4 POINT (6.79395 51.93967) 9.350006 HUMAN_OBSERVATION 700582645 Calopteryx splendens Harris, 1780 No data 2010-04-07 00:00:00+00:00 2010-04-07 00:00:00+00:00
... ... ... ... ... ... ... ... ...
535 POINT (7.62722 47.98439) NaN HUMAN_OBSERVATION 3845932111 Calopteryx splendens Harris, 1780 Ackerland 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
536 POINT (7.79354 48.33831) NaN HUMAN_OBSERVATION 3844974542 Calopteryx splendens Harris, 1780 Ackerland 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
537 POINT (7.80175 48.42811) NaN HUMAN_OBSERVATION 3845548749 Calopteryx splendens Harris, 1780 Laubwälder 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
538 POINT (8.10648 48.77593) NaN HUMAN_OBSERVATION 3845803099 Calopteryx splendens Harris, 1780 Grünland 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
539 POINT (7.81823 48.60769) NaN HUMAN_OBSERVATION 3845383562 Calopteryx splendens Harris, 1780 No data 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00

540 rows × 8 columns

#Create the same workflow again, this time using "mean" instead of "first" as featureAggregation for the raster vector join.
workflow_calopteryx_splendens_cut_join = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "RasterVectorJoin",
        "params": {
            "names": {
                "type": "names",
                "values": ["Ökosystematlas", "Avg_Temperature"]
            }, 
            "temporalAggregation": "none",
            "featureAggregation": "mean",
        },
        "sources": {
            "vector": { #Calopteryx splendens cut ######################################
                "type": "PointInPolygonFilter", 
                "params": {},
                "sources": {
                    "points": {
                        "type": "OgrSource",
                        "params": {
                            "data": f"_:{gbif_prov_id}:`species/Calopteryx splendens`",
                            "attributeProjection": []
                        }
                    },
                    "polygons": {
                        "type": "OgrSource",
                        "params": {
                            "data": "germany"
                        }
                    }
                }
            }, ##############################################################
            "rasters": [{ #Ökosystematlas ###################################
                "type": "GdalSource",
                "params": {
                    "data": "oekosystematlas"
                }
            }, ##############################################################
            { #Average temperature
                "type": "RasterScaling",
                    "params": {
                        "slope": {
                            "type": "constant",
                            "value": 0.1
                        },
                        "offset": {
                            "type": "constant",
                            "value": -273.15
                        },
                        "outputMeasurement": {
                            "type": "continuous",
                            "measurement": "temperature",
                            "unit": "K/10"
                        },
                        "scalingMode": "mulSlopeAddOffset"
                    },
                    "sources": {
                        "raster": {
                            "type": "RasterTypeConversion",
                            "params": {
                                "outputDataType": "F32"
                            },
                            "sources": {
                                "raster": {
                                    "type": "GdalSource",
                                    "params": {
                                        "data": "mean_daily_air_temperature"
                                    }
                                }
                            }
                        }
                    }
            }] ##############################################################
        },
    }
})

workflow_calopteryx_splendens_cut_join
4f2e830a-9570-5c8f-b2e1-bc433814df82
#Request the data from Geo Engine into a geopandas dataframe
data = workflow_calopteryx_splendens_cut_join.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    ),
    resolve_classifications=True
)

#Show the geopandas dataframe
data
geometry Avg_Temperature basisofrecord gbifid scientificname Ökosystematlas start end
0 POINT (7.57250 51.63501) 9.350006 HUMAN_OBSERVATION 920849630 Calopteryx splendens Harris, 1780 Grünland 2010-04-29 00:00:00+00:00 2010-04-29 00:00:00+00:00
1 POINT (6.75459 52.09421) 9.250000 HUMAN_OBSERVATION 700578315 Calopteryx splendens Harris, 1780 Ackerland 2010-04-12 00:00:00+00:00 2010-04-12 00:00:00+00:00
2 POINT (6.79395 51.93967) 9.350006 HUMAN_OBSERVATION 700582646 Calopteryx splendens Harris, 1780 No data 2010-04-07 00:00:00+00:00 2010-04-07 00:00:00+00:00
3 POINT (6.75459 52.09421) 9.250000 HUMAN_OBSERVATION 700578316 Calopteryx splendens Harris, 1780 Ackerland 2010-04-12 00:00:00+00:00 2010-04-12 00:00:00+00:00
4 POINT (6.79395 51.93967) 9.350006 HUMAN_OBSERVATION 700582645 Calopteryx splendens Harris, 1780 No data 2010-04-07 00:00:00+00:00 2010-04-07 00:00:00+00:00
... ... ... ... ... ... ... ... ...
535 POINT (7.62722 47.98439) NaN HUMAN_OBSERVATION 3845932111 Calopteryx splendens Harris, 1780 Ackerland 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
536 POINT (7.79354 48.33831) NaN HUMAN_OBSERVATION 3844974542 Calopteryx splendens Harris, 1780 Ackerland 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
537 POINT (7.80175 48.42811) NaN HUMAN_OBSERVATION 3845548749 Calopteryx splendens Harris, 1780 Laubwälder 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
538 POINT (8.10648 48.77593) NaN HUMAN_OBSERVATION 3845803099 Calopteryx splendens Harris, 1780 Grünland 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
539 POINT (7.81823 48.60769) NaN HUMAN_OBSERVATION 3845383562 Calopteryx splendens Harris, 1780 No data 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00

540 rows × 8 columns

Note that the underlying Ökosystematlas variable is numeric; the human-readable class names are encoded in the metadata of the files. A class histogram can present the data with these class names.

Nested Full Workflow Calopteryx splendens Plot Ökosystematlas

#Create a workflow to plot Calopteryx splendens occurrences filtered by the German border and merged with the Ökosystematlas data as a class histogram.
workflow_calopteryx_splendens_full_öko = ge.register_workflow({
    "type": "Plot",
    "operator": {
       "type": "ClassHistogram",
       "params": {
          "columnName": "Ökosystematlas"
       },
        "sources": {
            "source": { #Calopteryx splendens cut join #####################################
                "type": "RasterVectorJoin",
                "params": {
                        "names": {
                            "type": "names",
                            "values": ["Ökosystematlas", "Avg_Temperature"]
                        }, 
                        "temporalAggregation": "none",
                        "featureAggregation": "mean",
                },
                "sources": {
                    "vector": {
                        "type": "PointInPolygonFilter",
                        "params": {},
                        "sources": {
                            "points": {
                                "type": "OgrSource",
                                "params": {
                                    "data": f"_:{gbif_prov_id}:`species/Calopteryx splendens`",
                                    "attributeProjection": []
                                }
                            },
                            "polygons": {
                                "type": "OgrSource",
                                "params": {
                                    "data": "germany"
                                }
                            }
                        }
                    },
                    "rasters": [{
                        "type": "GdalSource",
                        "params": {
                            "data": "oekosystematlas"
                        }
                    },
                    {
                        "type": "RasterScaling",
                        "params": {
                            "slope": {
                                "type": "constant",
                                "value": 0.1
                            },
                            "offset": {
                                "type": "constant",
                                "value": -273.15
                            },
                            "outputMeasurement": {
                                "type": "continuous",
                                "measurement": "temperature",
                                "unit": "K/10"
                            },
                            "scalingMode": "mulSlopeAddOffset"
                        },
                        "sources": {
                            "raster": {
                                "type": "RasterTypeConversion",
                                "params": {
                                    "outputDataType": "F32"
                                },
                                "sources": {
                                    "raster": {
                                        "type": "GdalSource",
                                        "params": {
                                            "data": "mean_daily_air_temperature"
                                        }
                                    }
                                }
                            }
                        }
                    }]
                }
            } ######################################################################
       }
    }
})
    
workflow_calopteryx_splendens_full_öko
befec7cb-1b9a-5464-88b0-aa14b6be3077
#Request the plot from Geo Engine
plot_calopteryx_splendens = workflow_calopteryx_splendens_full_öko.plot_chart(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Show the plot
alt.Chart.from_dict(plot_calopteryx_splendens.spec)
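If the class histogram should be kept outside the notebook, the Altair chart can also be exported, for example as a standalone HTML file; a minimal sketch (the file name is just an example):

#Export the class histogram as a standalone HTML file (file name is an example)
chart = alt.Chart.from_dict(plot_calopteryx_splendens.spec)
chart.save("calopteryx_splendens_class_histogram.html")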

Nested Full Workflow Calopteryx splendens Plot Average Temperature

#Create a workflow to request Calopteryx splendens occurrences filtered by the German border and joined with the Ökosystematlas and average temperature data.
workflow_calopteryx_splendens_full_avg_temp = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "RasterVectorJoin",
        "params": {
                "names": {
                    "type": "names",
                    "values": ["Ökosystematlas", "Avg_Temperature"]
                }, 
                "temporalAggregation": "none",
                "featureAggregation": "mean",
        },
        "sources": {
            "vector": {
                "type": "PointInPolygonFilter", 
                "params": {},
                "sources": {
                    "points": {
                        "type": "OgrSource",
                        "params": {
                            "data": f"_:{gbif_prov_id}:`species/Calopteryx splendens`",
                            "attributeProjection": []
                        }
                    },
                    "polygons": {
                        "type": "OgrSource",
                        "params": {
                            "data": "germany"
                        }
                    }
                }
            },
            "rasters": [{
                "type": "GdalSource",
                "params": {
                    "data": "oekosystematlas"
                }
            },
            {
                "type": "RasterScaling",
                    "params": {
                        "slope": {
                            "type": "constant",
                            "value": 0.1
                        },
                        "offset": {
                            "type": "constant",
                            "value": -273.15
                        },
                        "outputMeasurement": {
                            "type": "continuous",
                            "measurement": "temperature",
                            "unit": "K/10"
                        },
                        "scalingMode": "mulSlopeAddOffset"
                    },
                    "sources": {
                        "raster": {
                            "type": "RasterTypeConversion",
                            "params": {
                                "outputDataType": "F32"
                            },
                            "sources": {
                                "raster": {
                                    "type": "GdalSource",
                                    "params": {
                                        "data": "mean_daily_air_temperature"
                                    }
                                }
                            }
                        }
                    }
            }]
        },
    }
})

workflow_calopteryx_splendens_full_avg_temp
4f2e830a-9570-5c8f-b2e1-bc433814df82
#Request the data from Geo Engine into a geopandas dataframe
data = workflow_calopteryx_splendens_full_avg_temp.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the geopandas dataframe colored by average temperature
data.plot(column='Avg_Temperature', legend=True, legend_kwds={'label': 'Average Temperature'})
<Axes: >

output_39_1.png

Further experiments

In this chapter, some further useful ways of combining Geo Engine and Python are shown.

#Overlay plot: raster data with Calopteryx splendens occurrence points for context
import geopandas as gpd
import matplotlib.pyplot as plt

#Request the data from Geo Engine into an xarray DataArray
data = workflow_t_avg.get_xarray(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, start_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot(vmin=-3, vmax=3)
data_calopteryx_splendens.plot(ax=plt.gca(), color='red', markersize=3)
plt.show()

output_42_1.png


VAT 4 Machine Learning - Creating training data for a species distribution model

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

This workflow is a contribution to the NFDI4Earth conference. It uses the frequency of Arnica montana occurrences from GBIF as the target variable, together with weather data from CHELSA, land use classification from the Ökosystematlas and topographic information as predictor variables, to create a species distribution model for Arnica montana across Germany.

Import

#Import Packages
import geoengine as ge
from datetime import datetime 
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
import matplotlib.pyplot as plt
import xarray as xr
import numpy as np
import asyncio
import nest_asyncio
#Initialize Geo Engine in VAT
ge.initialize("https://vat.gfbio.org/api")
#Get the GBIF DataProvider id (useful for translating the DataProvider name to its id)
root_collection = ge.layer_collection()
gbif_prov_id = ''
for elem in root_collection.items:
    if elem.name == 'GBIF':
        gbif_prov_id = str(elem.provider_id)
        
gbif_prov_id
'1c01dbb9-e3ab-f9a2-06f5-228ba4b6bf7a'
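The same lookup can be wrapped in a small helper when further data providers are needed; a minimal sketch (the helper name and error handling are our own):

def provider_id_by_name(name: str) -> str:
    #Search the root layer collection for a data provider with the given name (illustrative helper)
    for item in ge.layer_collection().items:
        if item.name == name:
            return str(item.provider_id)
    raise ValueError(f"Data provider '{name}' not found")

#Example: resolve the GBIF provider id as above
gbif_prov_id = provider_id_by_name('GBIF')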

Create Labelled Data

This chapter shows how to register the workflow that retrieves the labelled occurrence data and how to aggregate it into training data.

#Tuning parameters
start_time = datetime.strptime('2001-01-01T12:00:00.000Z', "%Y-%m-%dT%H:%M:%S.%f%z")
end_time = datetime.strptime('2011-01-01T12:00:00.000Z', "%Y-%m-%dT%H:%M:%S.%f%z")
resolution = ge.SpatialResolution(0.01, 0.01)
extent = ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334)

#Species selection
species = "species/Arnica montana" #Arnica
#Create a workflow to retrieve Arnica montana occurrences filtered by the German border and linked to weather, land use and topographic data.
workflow = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "RasterVectorJoin",
        "params": {
            "names": {
                "type": "names",
                "values": ["Ökosystematlas", "SRTM", "Mean Air Temperature", "Mean Climate Moisture Index", "Precipitation"]
            },  
            "temporalAggregation": "none",
            "featureAggregation": "first",
        },
        "sources": {
            "vector": { #Arnica montana #########################################
                "type": "PointInPolygonFilter", 
                "params": {},
                "sources": {
                    "points": {
                        "type": "OgrSource",
                        "params": {
                            "data": f"_:{gbif_prov_id}:`{species}`",
                            "attributeProjection": []
                        }
                    },
                    "polygons": {
                        "type": "OgrSource",
                        "params": {
                            "data": "germany"
                        }
                    }
                }
            }, 
            "rasters": [{ #Ökosystematlas ########################################
                "type": "RasterTypeConversion",
                "params": {
                    "outputDataType": "F32"
                },
                "sources": {
                    "raster": {
                        "type": "GdalSource",
                        "params": {
                            "data": "oekosystematlas"
                        },
                    }
                }
            },
            { #SRTM #########################################################
                "type": "RasterTypeConversion",
                "params": {
                    "outputDataType": "F32"
                },
                "sources": {
                    "raster": {
                        "type": "GdalSource",
                        "params": {
                            "data": "srtm"
                        },
                    }
                }
                
            },
            { #Mean Annual Air Temperature ##################################
                "type": "TemporalRasterAggregation",
                "params": {
                    "aggregation": {
                        "type": "mean",
                        "ignoreNoData": False
                    },
                    "window": {
                        "granularity": "years",
                        "step": 1
                    },
                    "windowReference": None,
                    "outputType": None,
                },
                "sources": {
                    "raster": {
                        "type": "RasterScaling",
                        "params": {
                            "slope": {
                                "type": "constant",
                                "value": 0.1
                            },
                            "offset": {
                                "type": "constant",
                                "value": -273.15
                            },
                            "outputMeasurement": {
                                "type": "continuous",
                                "measurement": "temperature",
                                "unit": "K/10"
                            },
                            "scalingMode": "mulSlopeAddOffset"
                        },
                        "sources": {
                            "raster": {
                                "type": "RasterTypeConversion",
                                "params": {
                                    "outputDataType": "F32"
                                },
                                "sources": {
                                    "raster": {
                                        "type": "GdalSource",
                                        "params": {
                                            "data": "mean_daily_air_temperature"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            },
            { #Mean Annual Climate moisture indices #########################
                "type": "TemporalRasterAggregation",
                "params": {
                    "aggregation": {
                        "type": "mean",
                        "ignoreNoData": False
                    },
                    "window": {
                        "granularity": "years",
                        "step": 1
                    },
                    "windowReference": None,
                    "outputType": None,
                },
                "sources": {
                    "raster": {
                        "type": "RasterScaling",
                        "params": {
                            "slope": {
                                "type": "constant",
                                "value": 0.1
                            },
                            "offset": {
                                "type": "constant",
                                "value": 0
                            },
                            "outputMeasurement": {
                                "type": "continuous",
                                "measurement": "climate moisture",
                                "unit": "kg m^-2 month^-1"
                            },
                            "scalingMode": "mulSlopeAddOffset"
                        },
                        "sources": {
                            "raster": {
                                "type": "RasterTypeConversion",
                                "params": {
                                    "outputDataType": "F32"
                                },
                                "sources": {
                                    "raster": {
                                        "type": "GdalSource",
                                        "params": {
                                            "data": "monthly_climate_moisture_indicies"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            },
            { #Sum Annual Precipitation ####################################
                "type": "TemporalRasterAggregation",
                "params": {
                    "aggregation": {
                        "type": "sum",
                        "ignoreNoData": False
                    },
                    "window": {
                        "granularity": "years",
                        "step": 1
                    },
                    "windowReference": None,
                    "outputType": None,
                },
                "sources": {
                    "raster": {
                        "type": "RasterScaling",
                        "params": {
                            "slope": {
                                "type": "constant",
                                "value": 0.1
                            },
                            "offset": {
                                "type": "constant",
                                "value": 0
                            },
                            "outputMeasurement": {
                                "type": "continuous",
                                "measurement": "precipitation",
                                "unit": "kg m-2 month^-1"
                            },
                            "scalingMode": "mulSlopeAddOffset"
                        },
                        "sources": {
                            "raster": {
                                "type": "RasterTypeConversion",
                                "params": {
                                    "outputDataType": "F32"
                                },
                                "sources": {
                                    "raster": {
                                        "type": "GdalSource",
                                        "params": {
                                            "data": "monthly_precipitation_amount"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }]
        },
    }
})
    
workflow
7582cfcb-3d36-5b86-bb72-e81cef584fae
#Request the data from Geo Engine into a geopandas dataframe
data = workflow.get_dataframe(
    ge.QueryRectangle(
        extent,
        ge.TimeInterval(start_time, end_time),
        resolution=resolution,
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot()
<Axes: >

output_11_1.png

data
|   | geometry | Mean Air Temperature | Mean Climate Moisture Index | Precipitation | SRTM | basisofrecord | gbifid | scientificname | Ökosystematlas | start | end |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | POINT (11.29000 50.47000) | 6.600011 | 42.033333 | 1186.700073 | 684.0 | HUMAN_OBSERVATION | 1922039098 | Arnica montana L. | 12.0 | 2001-09-24 00:00:00+00:00 | 2001-09-24 00:00:00+00:00 |
| 1 | POINT (10.04000 47.52000) | 6.658340 | 108.008339 | 2084.500244 | 845.0 | HUMAN_OBSERVATION | 1922860404 | Arnica montana L. | 12.0 | 2001-08-21 00:00:00+00:00 | 2001-08-21 00:00:00+00:00 |
| 2 | POINT (11.29000 50.42000) | 7.016680 | 41.541668 | 1193.100098 | 638.0 | HUMAN_OBSERVATION | 1922902358 | Arnica montana L. | 6.0 | 2001-10-01 00:00:00+00:00 | 2001-10-01 00:00:00+00:00 |
| 3 | POINT (10.04000 47.52000) | 6.658340 | 108.008339 | 2084.500244 | 845.0 | HUMAN_OBSERVATION | 1922858802 | Arnica montana L. | 12.0 | 2001-07-11 00:00:00+00:00 | 2001-07-11 00:00:00+00:00 |
| 4 | POINT (10.21000 47.37000) | 3.758347 | 102.083336 | 1912.100098 | 1649.0 | HUMAN_OBSERVATION | 1926238160 | Arnica montana L. | 255.0 | 2001-07-04 00:00:00+00:00 | 2001-07-04 00:00:00+00:00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1551 | POINT (11.18340 47.59551) | 8.066678 | 41.883335 | 1305.400024 | 638.0 | HUMAN_OBSERVATION | 920659766 | Arnica montana L. | 2.0 | 2010-06-10 00:00:00+00:00 | 2010-06-10 00:00:00+00:00 |
| 1552 | POINT (12.05000 50.11000) | 7.183350 | 5.225000 | 784.700012 | 594.0 | HUMAN_OBSERVATION | 1806720955 | Arnica montana L. | 11.0 | 2010-06-22 00:00:00+00:00 | 2010-06-22 00:00:00+00:00 |
| 1553 | POINT (13.04000 48.87000) | 7.075012 | 46.866669 | 1302.099976 | 706.0 | HUMAN_OBSERVATION | 1927043392 | Arnica montana L. | 8.0 | 2010-06-16 00:00:00+00:00 | 2010-06-16 00:00:00+00:00 |
| 1554 | POINT (12.04000 50.12000) | 7.425008 | 0.308333 | 768.400024 | 557.0 | HUMAN_OBSERVATION | 1806720970 | Arnica montana L. | 6.0 | 2010-06-22 00:00:00+00:00 | 2010-06-22 00:00:00+00:00 |
| 1555 | POINT (11.64000 50.09000) | NaN | NaN | NaN | 536.0 | HUMAN_OBSERVATION | 1946786537 | Arnica montana L. | 6.0 | 2011-01-01 00:00:00+00:00 | 2011-01-01 00:00:00+00:00 |

1556 rows × 11 columns

#Round and group the occurrences to obtain a frequency (count) for each combination of predictor variables
training_data = data.round(3)
training_data = training_data.groupby(['Mean Air Temperature', 'Mean Climate Moisture Index', 'Precipitation', 'SRTM', 'Ökosystematlas']).size().reset_index(name='counts')
training_data
|   | Mean Air Temperature | Mean Climate Moisture Index | Precipitation | SRTM | Ökosystematlas | counts |
|---|---|---|---|---|---|---|
| 0 | -0.842 | 126.342 | 2321.8 | 2036.0 | 14.0 | 13 |
| 1 | -0.717 | 178.900 | 2899.2 | 1938.0 | 16.0 | 3 |
| 2 | 0.275 | 153.200 | 2687.8 | 1811.0 | 255.0 | 20 |
| 3 | 0.858 | 123.850 | 2270.8 | 1798.0 | 14.0 | 1 |
| 4 | 0.900 | 109.475 | 2042.6 | 1822.0 | 11.0 | 1 |
| ... | ... | ... | ... | ... | ... | ... |
| 347 | 9.500 | -14.567 | 631.3 | 216.0 | 6.0 | 1 |
| 348 | 9.692 | 14.392 | 971.4 | 292.0 | 12.0 | 1 |
| 349 | 9.692 | 15.342 | 943.5 | 327.0 | 10.0 | 1 |
| 350 | 10.317 | 6.358 | 775.4 | 120.0 | 10.0 | 1 |
| 351 | 10.667 | 0.325 | 756.8 | 99.0 | 2.0 | 1 |

352 rows × 6 columns

training_data.sort_values('counts', ascending=False)
|   | Mean Air Temperature | Mean Climate Moisture Index | Precipitation | SRTM | Ökosystematlas | counts |
|---|---|---|---|---|---|---|
| 52 | 5.850 | 128.183 | 2370.3 | 1072.0 | 8.0 | 54 |
| 24 | 3.858 | 121.725 | 2325.1 | 1071.0 | 14.0 | 43 |
| 147 | 7.042 | 33.925 | 1123.1 | 565.0 | 13.0 | 43 |
| 28 | 4.258 | 102.975 | 2073.7 | 1435.0 | 11.0 | 38 |
| 25 | 3.875 | 112.683 | 2174.2 | 1414.0 | 12.0 | 36 |
| ... | ... | ... | ... | ... | ... | ... |
| 116 | 6.808 | 72.058 | 1677.0 | 875.0 | 8.0 | 1 |
| 114 | 6.808 | 18.958 | 948.8 | 668.0 | 8.0 | 1 |
| 113 | 6.800 | 68.783 | 1684.7 | 1018.0 | 8.0 | 1 |
| 232 | 7.617 | 39.742 | 1317.6 | 729.0 | 15.0 | 1 |
| 351 | 10.667 | 0.325 | 756.8 | 99.0 | 2.0 | 1 |

352 rows × 6 columns
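With the frequencies and predictor variables in one table, the training data can be handed to a regression model. The following is only a minimal sketch, assuming a random forest regressor as suggested by the imports above; feature selection, encoding of the categorical Ökosystematlas classes and hyperparameter tuning (e.g., with GridSearchCV) are left out here:

#Minimal sketch: fit a random forest on the aggregated occurrence counts (assumption, not the final model)
predictors = ['Mean Air Temperature', 'Mean Climate Moisture Index', 'Precipitation', 'SRTM', 'Ökosystematlas']
X = training_data[predictors]
y = training_data['counts']

#Hold out a test split to get a rough quality estimate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

#Coefficient of determination on the held-out data
print("R²:", r2_score(y_test, model.predict(X_test)))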

Create Prediction Data

This chapter shows how to register the workflow that provides the prediction data, i.e., a raster stack of the predictor variables.

#Create a workflow to request weather, land use and topographic data as a raster stack.
prediction_workflow = ge.register_workflow({
    "type": "Raster",
    "operator": {
          "type": "RasterStacker",
          "params": {
            "renameBands": {
              "type": "rename",
              "values": ["Ökosystematlas", "SRTM", "Mean Air Temperature", "Mean Climate Moisture Index", "Precipitation"]
            }
          },
          "sources": {
            "rasters": [{ #Ökosystematlas ########################################
                "type": "RasterTypeConversion",
                "params": {
                    "outputDataType": "F32"
                },
                "sources": {
                    "raster": {
                        "type": "GdalSource",
                        "params": {
                            "data": "oekosystematlas"
                        },
                    }
                }
            },
            { #SRTM #########################################################
                "type": "RasterTypeConversion",
                "params": {
                    "outputDataType": "F32"
                },
                "sources": {
                    "raster": {
                        "type": "GdalSource",
                        "params": {
                            "data": "srtm"
                        },
                    }
                }
                
            },
            { #Mean Annual Air Temperature ##################################
                "type": "TemporalRasterAggregation",
                "params": {
                    "aggregation": {
                        "type": "mean",
                        "ignoreNoData": False
                    },
                    "window": {
                        "granularity": "years",
                        "step": 1
                    },
                    "windowReference": None,
                    "outputType": None,
                },
                "sources": {
                    "raster": {
                        "type": "RasterScaling",
                        "params": {
                            "slope": {
                                "type": "constant",
                                "value": 0.1
                            },
                            "offset": {
                                "type": "constant",
                                "value": -273.15
                            },
                            "outputMeasurement": {
                                "type": "continuous",
                                "measurement": "temperature",
                                "unit": "K/10"
                            },
                            "scalingMode": "mulSlopeAddOffset"
                        },
                        "sources": {
                            "raster": {
                                "type": "RasterTypeConversion",
                                "params": {
                                    "outputDataType": "F32"
                                },
                                "sources": {
                                    "raster": {
                                        "type": "GdalSource",
                                        "params": {
                                            "data": "mean_daily_air_temperature"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            },
            { #Mean Annual Climate moisture indices #########################
                "type": "TemporalRasterAggregation",
                "params": {
                    "aggregation": {
                        "type": "mean",
                        "ignoreNoData": False
                    },
                    "window": {
                        "granularity": "years",
                        "step": 1
                    },
                    "windowReference": None,
                    "outputType": None,
                },
                "sources": {
                    "raster": {
                        "type": "RasterScaling",
                        "params": {
                            "slope": {
                                "type": "constant",
                                "value": 0.1
                            },
                            "offset": {
                                "type": "constant",
                                "value": 0
                            },
                            "outputMeasurement": {
                                "type": "continuous",
                                "measurement": "climate moisture",
                                "unit": "kg m^-2 month^-1"
                            },
                            "scalingMode": "mulSlopeAddOffset"
                        },
                        "sources": {
                            "raster": {
                                "type": "RasterTypeConversion",
                                "params": {
                                    "outputDataType": "F32"
                                },
                                "sources": {
                                    "raster": {
                                        "type": "GdalSource",
                                        "params": {
                                            "data": "monthly_climate_moisture_indicies"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            },
            { #Sum Annual Precipitation ####################################
                "type": "TemporalRasterAggregation",
                "params": {
                    "aggregation": {
                        "type": "sum",
                        "ignoreNoData": False
                    },
                    "window": {
                        "granularity": "years",
                        "step": 1
                    },
                    "windowReference": None,
                    "outputType": None,
                },
                "sources": {
                    "raster": {
                        "type": "RasterScaling",
                        "params": {
                            "slope": {
                                "type": "constant",
                                "value": 0.1
                            },
                            "offset": {
                                "type": "constant",
                                "value": 0
                            },
                            "outputMeasurement": {
                                "type": "continuous",
                                "measurement": "precipitation",
                                "unit": "kg m-2 month^-1"
                            },
                            "scalingMode": "mulSlopeAddOffset"
                        },
                        "sources": {
                            "raster": {
                                "type": "RasterTypeConversion",
                                "params": {
                                    "outputDataType": "F32"
                                },
                                "sources": {
                                    "raster": {
                                        "type": "GdalSource",
                                        "params": {
                                            "data": "monthly_precipitation_amount"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }]
          }
        }
    
})

prediction_workflow
370296a3-db66-599b-8e55-2a4bf362a09a
#Prepare the query rectangle for the workflow raster stream
bbox = ge.QueryRectangle(
    extent,
    ge.TimeInterval(start_time, start_time),
    resolution=resolution,
    srs="EPSG:4326"
)
nest_asyncio.apply()

async def get_prediction_data(workflow, bbox, bands=[0, 1, 2, 3, 4], clip=True):
    #Stream the raster stack from Geo Engine into an xarray DataArray
    data = await workflow.raster_stream_into_xarray(bbox, bands=bands, clip_to_query_rectangle=clip)
    return data

async def main(extent, time, resolution, workflow):
    #Build the query rectangle and fetch the prediction data
    bbox = ge.QueryRectangle(extent, ge.TimeInterval(time, time), resolution=resolution, srs="EPSG:4326")
    return await get_prediction_data(workflow, bbox)

try:
    loop = asyncio.get_event_loop()
except RuntimeError:
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)

prediction_data = loop.run_until_complete(main(extent, start_time, resolution, prediction_workflow))
prediction_data.to_dataset(name="prediction")
<xarray.Dataset> Size: 14MB
Dimensions:      (x: 918, y: 780, time: 1, band: 5)
Coordinates:
  * x            (x) float64 7kB 5.855 5.865 5.875 5.885 ... 15.0 15.01 15.02
  * y            (y) float64 6kB 55.07 55.06 55.05 55.04 ... 47.3 47.29 47.28
  * time         (time) datetime64[ns] 8B 2001-01-01
  * band         (band) int64 40B 0 1 2 3 4
    spatial_ref  int64 8B 0
Data variables:
    prediction   (time, band, y, x) float32 14MB 21.0 21.0 ... 1.082e+03
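To apply a fitted model to this raster stack, the xarray DataArray can be reshaped into a (pixel, band) feature matrix and the predictions can be reshaped back onto the grid. This is a minimal sketch under our own assumptions: it reuses the model variable from the sketch above, reorders the bands so that they match the training column order, and skips pixels with missing values:

#Reorder bands to the training feature order:
#[Mean Air Temperature, Mean Climate Moisture Index, Precipitation, SRTM, Ökosystematlas] -> stack bands [2, 3, 4, 1, 0]
stack = prediction_data.isel(time=0, band=[2, 3, 4, 1, 0])

#Flatten the grid into a (pixel, band) feature matrix
features = stack.stack(pixel=("y", "x")).transpose("pixel", "band").values

#Predict only where all predictor bands are valid
valid = ~np.isnan(features).any(axis=1)
predicted = np.full(features.shape[0], np.nan, dtype=np.float32)
predicted[valid] = model.predict(features[valid])

#Reshape back onto the raster grid and plot
prediction_map = predicted.reshape(stack.sizes["y"], stack.sizes["x"])
plt.imshow(prediction_map, extent=(5.852490, 15.022059, 47.271121, 55.065334))
plt.colorbar(label="Predicted Arnica montana occurrence frequency")
plt.show()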