Introduction

Welcome to the VAT Documentation

The VAT documentation is the first point of contact for VAT-related questions. Therefore, this documentation first covers the basics of the system in the Background section and then the functionality with examples in the User Guide section. This documentation also provides relevant links to find the resources to answer your questions.

vat-introduction

If you are curious or want to try the VAT yourself, you can follow this link: https://vat.gfbio.org/

The VAT System

Service        Status
vat.gfbio.org  Productive

Overview

The VAT system allows users to visualize geospatial data on a map in their browser and work with it interactively. Data and processing are provided by the Geo Engine backend service running in the Semantic Layer. The VAT system lists all data products available at the Geo Engine backend service as layers, which can be selected for visualization on the map. Layers can be combined and transformed interactively by constructing arbitrarily complex workflows, which themselves are visualized as new layers on the map or can be plotted, e.g., as a bar chart, right next to the map. This facilitates an interactive approach to constructing new data products and analyzing them.

VAT Screenshot

RDC Integration

The VAT system is served as a fully containerized application from the de.NBI cloud and is connected to the Geo Engine Backend service running in the Semantic Layer.

RDC Integration

Getting started

To become familiar with the VAT system, take a look at the publicly accessible instance, which has several datasets related to biodiversity research available.

You can run through the following example to get a first impression on what is possible with the VAT system. In the example, you will take elephant occurrence datasets of two distinct species and combine them with a vegetation index dataset to visualize the difference in their habitats.

  • Go to vat.gfbio.org. Click on Add Data (+) -> Layers -> Elephant example. There you can find three layers: Loxodonta africana, Loxodonta cyclotis and MOD13C2 NDVI. The first two are point datasets of occurrences of two elephant species. NDVI is a vegetation index raster dataset.

  • Add all three layers to the map by clicking on them once. (Optional: Remove the Loxodonta cyclotis occurrences outside of Africa by first clicking on Add Data -> Draw Features, set type to "Polygon" and draw a polygon around Africa by clicking on the map. Then, select Operators -> Point In Polygon and select the Loxodonta cyclotis point layer and the drawn polygon. Apply the operator.)

  • Click on Operators -> Raster Vector Join to configure a raster vector join operator. The raster vector join operator attaches raster values to points.

  • Select as point input one of the two elephant occurrence datasets and as raster input the NDVI dataset. Give the result a descriptive name, e.g., "Loxodonta africana with NDVI". Click on "Create" to add the new layer to the map. Repeat for the second elephant occurrence dataset.

  • Click on Operators -> Histogram. Set as input one of the two new layers created by the raster vector join. Select the "MOD13C2 NDVI" attribute. Click "Create". Repeat for the other layer.

  • Now compare the two histograms you created. You should clearly see that the forest elephant occurs more often in more densely vegetated areas than the bush elephant (as expected). You can also move around/zoom in/out on the map to compare the two histograms for different regions.
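The same analysis can also be scripted with the Geo Engine Python Library described later in this documentation. The following is a minimal sketch, assuming the geoengine package is installed; the dataset names used for the elephant occurrences and the NDVI raster are hypothetical placeholders, and the operator JSON mirrors the RasterVectorJoin and Histogram examples at the end of this documentation.

import geoengine as ge

ge.initialize("https://vat.gfbio.org/api")

# Hypothetical dataset names -- look up the actual names in the data catalog.
occurrence_dataset = "loxodonta_africana_occurrences"
ndvi_dataset = "MOD13C2_NDVI"

# Attach NDVI raster values to the occurrence points, then plot a histogram of
# the new "NDVI" attribute (Raster Vector Join + Histogram, as in the UI steps above).
histogram_workflow = ge.register_workflow({
    "type": "Plot",
    "operator": {
        "type": "Histogram",
        "params": {
            "attributeName": "NDVI",
            "bounds": "data",
            "buckets": {"type": "number", "value": 20},
        },
        "sources": {
            "source": {
                "type": "RasterVectorJoin",
                "params": {
                    "names": {"type": "names", "values": ["NDVI"]},
                    "temporalAggregation": "none",
                    "featureAggregation": "mean",
                },
                "sources": {
                    "vector": {"type": "OgrSource", "params": {"data": occurrence_dataset}},
                    "rasters": [{"type": "GdalSource", "params": {"data": ndvi_dataset}}],
                },
            }
        },
    },
})

print(histogram_workflow)  # the resulting workflow id can also be used in VAT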

User Guide

The VAT system aims at being as intuitive as possible. Whenever a deeper understanding is required though, e.g., about the specific settings of operators, links to the documentation are provided where those are explained in depth.

References

  • Authmann, C., Beilschmidt, C., Drönner, J., Mattig, M., & Seeger, B. (2015). VAT: a system for visualizing, analyzing and transforming spatial data in science. Datenbank-Spektrum, 15, 175-184.

  • Beilschmidt, C., Drönner, J., Mattig, M., & Seeger, B. (2023). Geo Engine: Workflow-driven Geospatial Portals for Data Science. Datenbank-Spektrum, 1-9.

The Geo Engine

Geo Engine

What is the Geo Engine?

Geodata, i.e. data relating to location and time, is omnipresent. The amount of data is constantly increasing. Geodata portals play a key role in the dissemination and utilization of geodata. They typically run in the cloud and users only need a browser to be able to use them. Although portals are sometimes highly specialized, there are requirements for the underlying software that are common to all portals. Data access, data processing and visualization must always be implemented. The Geo Engine provides all the components required to build geodata portals. It consists of a back end for processing and a front end with components that can be freely combined in portals.

The Geo Engine is also a geographic information system (GIS) that makes it possible to process data. Experts can use it to create workflows that generate a result from source data and processing steps. One example is linking animal observations to a temperature layer and filtering by average temperature to find animals that can cope well with the cold. Once an interesting workflow has been found, a portal can be created that can be used intuitively without prior knowledge. The Geo Engine portals go far beyond static maps. They enable interactive analyses so that the data can be freely explored. Users can also contribute their own data and merge it with the portal data after uploading it. For example, a user can upload GPS positions of a route to the portal and visualize the development of portal data along this route.

What are its components?

The Geo Engine consists of a backend, which usually runs on a server and provides data and functions for various frontends. The two front ends that belong to the Geo Engine are the web UI and the Python library. In addition, external tools can also communicate directly with the backend via standard interfaces.

The web UI enables the Geo Engine to be used in the browser. The elements of the Web UI can be combined to create various applications. The Geo Engine GIS offers the greatest flexibility, but requires a training period and specialist knowledge due to the wide range of functions. Dashboards, on the other hand, are aimed at a broader user group. They are specialized portals that focus strongly on one application and are easier to use due to predefined analyses. The Geo Engine comes with ready-made dashboards and allows you to build new dashboards from existing components.

The Python library is aimed at users with programming skills who want to process data outside of the Geo Engine. For example, it is possible to create more complex diagrams or use machine learning. In addition, the Geo Engine can also be administered via Python by activating further functionalities for data and user administration via an admin token.

The Geo Engine is based on standard software. The backend uses GDAL, PROJ and Apache Arrow, among others. The front end is based on Angular and OpenLayers. Docker containers are available for the installation and operation of the Geo Engine. There is one container image each for the backend and frontend. Together with external components such as a PostgreSQL database, these can be bundled in a pod and provided as a separate instance.

Can everyone install the Geo Engine?

The Geo Engine can be used with a very low barrier to entry, as there are publicly accessible instances that run in the cloud and do not require installation. Examples include the GFBio VAT system at https://vat.gfbio.org and the EBV Analyzer at https://portal.geobon.org/map. In addition to these portals, which are based on the Geo Engine and offer different sets of functions, there will also be a demo of the Geo Engine GIS in the future, which will be available at https://www.geoengine.io.

The Geo Engine can also be installed on your own systems and hosted yourself. It is then provided via Docker and requires certain IT expertise. Geo Engine GmbH also offers hosting and support on request.

The Geo Engine is made available under an open-core license: the core is open source and freely usable, while certain additional functions are paid. All essential functions are available free of charge.

How does the Geo Engine differ from similar products such as MapServer, GeoServer or GeoNode?

In the world of geodata processing, there is a huge amount of software with very different focuses. MapServer and GeoServer are server software that provide geodata via web services for maps. GeoNode is a data management platform that is based on GeoServer, among other things. It enables users to create, share and publish interactive maps. The Geo Engine goes far beyond this functionality and makes it possible to create analyses in the platform itself using an operator toolbox and workflow engine. Based on these workflows, specialized dashboards and portals can then be created that are easy for users to operate.

How is the Geo Engine used in NFDI4Biodiversity?

NFDI4Biodiversity contains a great deal of geodata, i.e. data that has a spatial and temporal reference. One example is the locations of collections in a herbarium, which can have a time of discovery and GPS coordinates. It is important for the scientific community to be able to find and use this data as easily as possible. The Geo Engine can be seen as a toolbox for creating geo-applications within the framework of NFDI4Biodiversity.

In detail, there are two points of contact in NFDI4Biodiversity: the GFBio portal and user portals. GFBio is a sub-project that brings together data from German collections and data centers in the field of biodiversity and offers a point of contact for researchers. The Geo Engine can be accessed via the GFBio search, from which selected data can be visualized in a web GIS (Geographical Information System) in the browser. In addition, the Geo Engine can be used to perform GIS operations directly on the data without requiring expert knowledge or installing software. One example is linking environmental data, e.g. temperature models, which the Geo Engine offers in addition to the GFBio data, with plant locations. The otherwise complicated work of linking two time series of different geodata is handled automatically by the Geo Engine. The results can in turn be visualized as maps, tables or plots, or downloaded for further use.

In addition to the GFBio portal, there is also a proof-of-concept in which data portals based on the Geo Engine and some special data sets from NFDI4Biodiversity were created for specific specialist communities. Here, dashboards were built on the basis of the Geo Engine that are precisely tailored to the needs of individual user groups. These then offer selected functions with intuitive, coordinated usability.

Where else is the Geo Engine used?

The Geo Engine is used in very different scenarios. In the area of data portals, it implements the connection, visualization and analysis of geodata. Specifically, it is the technological basis of the Terranova portal, which is building a digital atlas of Europe. In the GEO BON EBV Data Portal, it enables the exploration of and access to Essential Biodiversity Variables, which provide indicators for the development of global biodiversity.

In research, the Geo Engine is used to connect complex data sets, implement special algorithms and implement analysis workflows. It is used in the RESPECT project, which is investigating environmental changes in tropical mountain forests in southern Ecuador. In CropHype, it provides the basis for improving the classification of agricultural fields using new types of satellite data.

One use case from industry is the enrichment of proprietary data with publicly available data that is difficult to obtain and process. A concrete example is the calculation of vegetation indicators, a measure of how densely overgrown an area is. Here, the Geo Engine procures the necessary satellite data, calculates the vegetation and links it to the company data. The results are made available via standard interfaces so that they can be integrated into company processes.

More information

FAQ

Here, we answer frequently asked questions about the VAT System.

The VAT System is built on top of the Geo Engine. The Geo Engine is a powerful geospatial processing engine that provides a wide range of geospatial processing capabilities. The VAT System is a user-friendly interface that uses the Geo Engine, designed to make it easy for users to access and use geo data from NFDI4Biodiversity.

Get in touch

VAT is developed by the Database Research Group of the University of Marburg (head: Prof. Bernhard Seeger). The design of VAT was a joint collaboration with the Senckenberg Biodiversity and Climate Research Centre (BiK-F) (head: Prof. Thomas Hickler).

VAT is hosted and operated by GFBio - Gesellschaft für Biologische Daten e.V. (Imprint).

VAT is built upon the Geo Engine, a cloud-ready geo-spatial data processing platform. Learn more about Geo Engine on GitHub or visit the Geo Engine website.

Contact

If you have any questions or feedback, please feel free to contact us.

Resources

Important Features

Here, we describe the most important features of the VAT system and the Geo Engine.

Operator Toolbox

VAT utilizes the Geo Engine Operator Toolbox to provide a wide range of geospatial processing capabilities. The Operator Toolbox is a powerful tool for processing and analyzing geospatial data. It provides a wide range of operators for processing raster and vector data, such as filtering, combining, and aggregating. The Operator Toolbox is designed to be user-friendly and intuitive, allowing users to easily create complex processing chains. The Operator Toolbox is also extensible, allowing users to create custom expressions to meet their specific needs.

More resources

Python Library

The Geo Engine Python Library allows users to interact programmatically with a Geo Engine backend, for instance the one offered in the Semantic Layer. It allows the management of a Geo Engine instance for administrators, for example to assign roles. Users can manage their datasets, layers and workflows and load data products into Python for further processing and analysis tasks. Having data products from the Geo Engine easily available directly in Python facilitates their use in external tools users are already working with. For example, the Geo Engine Python Library can be used in Jupyter Notebooks to construct and retrieve data products from a Geo Engine backend, taking advantage of Geo Engine's powerful geospatial processing capabilities. Then, with a data product loaded into Python, any suitable visualization tool can be used within the notebook. Furthermore, when connected to the same Geo Engine backend, a user can seamlessly switch between the Geo Engine web front end (VAT) and the Geo Engine Python Library, choosing the tool best suited for the task at hand at any time.
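As a minimal sketch of this interplay (assuming the geoengine package is installed and using the public VAT instance mentioned above; the workflow definition is a stripped-down variant of the examples later in this documentation, and the dataset name is a hypothetical placeholder):

from datetime import datetime

import geoengine as ge

# Connect to a Geo Engine backend -- here the public VAT instance.
ge.initialize("https://vat.gfbio.org/api")

# Register a simple vector workflow; the returned workflow id can also be
# pasted into the VAT web front end to visualize the very same workflow there.
workflow = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "OgrSource",
        "params": {"data": "germany_outline"},  # hypothetical dataset name
    },
})

# Load the result into a geopandas GeoDataFrame for further analysis in Python.
start = datetime.strptime("2010-01-01T12:00:00.000Z", "%Y-%m-%dT%H:%M:%S.%f%z")
end = datetime.strptime("2011-01-01T12:00:00.000Z", "%Y-%m-%dT%H:%M:%S.%f%z")
gdf = workflow.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(-180, -90, 180, 90),
        ge.TimeInterval(start, end),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326",
    )
)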

Getting started

To become familiar with the Geo Engine Python Library, take a look at the examples in the GitHub repository. You can connect to the Geo Engine backend running in the Semantic Layer.

User Guide

In addition to the examples, which offer a good starting point, there also exists documentation for all available functionality.

Developer Guide

The source code of the Geo Engine Python Library is publicly available on GitHub.

Search Integration

The VAT system provides a search integration with the GFBio search that allows users to transfer search results directly to the VAT system.

Searching for data

Visualizable in VAT

To search for data, users can enter a search term in the search bar and press the Enter key. This will show the search results. Users can filter the search results by selecting Visualizable in VAT in the menu on the left side. This will show only the datasets that can be visualized in the VAT system.

Search baskets

Add to Basket

Users can add search results to the basket by clicking on the Basket button. This will add the data to the basket.

Transferring data to VAT

Transfer to VAT

Users can transfer the data from the basket to the VAT system by opening the search basket. This will show all datasets that are in the basket. Users can select the datasets they want to transfer to the VAT system and click on the Visualize in VAT button.

Adding layers to the map

Dialog in VAT

This will open a dialog in the VAT system where users can select the layers they want to add to the map. Users can choose the layers they want to add and whether they should replace the current layers or be added on top of them.

Layers in VAT

The selected layers will be added to the map in the VAT system. Users can now work with the data as they would with any other data in the VAT system.

ABCD Archive Connection

GFBio's connected Data Centers provide access to a variety of data archives. The VAT system allows users to access these archives directly. This makes it easier for users to access data without having to download it themselves. In addition, users can map the data together with other data sources in the VAT system.

In the background, the VAT system harvests all ABCD data from the GFBio Search Index every night. Thus, updates to the ABCD data are available in the VAT system the next day.

Finding the archives

Finding the archives

To find the ABCD data, users can click on the + button in the data menu. This opens a dialog where users can select the GFBio ABCD Datasets menu item. This will show all ABCD datasets that are available in the VAT system.

Selecting data

Selecting data

Users can select the data they are interested in by clicking on the dataset. This will load all occurrences from the selected dataset into the VAT system as a new layer.

The data is displayed on the map as clustered points and can then be used like any other data in the VAT system. Zooming in will dissolve the clusters and show the individual occurrences. Users can also open the data table to see more attributes of the occurrences.

Multimedia items

Multimedia items

Some ABCD datasets contain links to multimedia items. These can be images, videos, or audio files. Users can click on the multimedia item in the data table to open it in a new dialog. For instance, when the item is an image, it will be displayed directly in the VAT system.

The data table will show at most three links for clustered occurrences. As a user, you can zoom in to see more items.

Citing the data

Citing the data

To cite the data, users can click on the Show Provenance icon in the context menu. This will open a table that shows the citation for the data.

The table has three columns: Citation, License, and URI. The Citation column contains the citation for the data. The License column contains the license under which the data is available. The URI column contains the URI to the license file.

GBIF Data & Search

Select GBIF

The VAT system provides access to a snapshot of the GBIF occurrence data. This allows users to easily access GBIF data without having to download the data themselves. In addition, they can map the data together with other data sources in the VAT system.

The GBIF snapshot contains all occurrences available at gbif.org at the time of the snapshot. The snapshot date is noted in the GBIF data provider description (see Add Data dialog above). Since data in VAT is spatio-temporal, we filter the occurrences by three conditions:

  1. They have a coordinate
  2. They have no geospatial issues
  3. They have an event time

All occurrences fulfilling these three conditions are imported into a PostgreSQL database and indexed by time and space (using the PostGIS extension), as well as family, genus and species names. To enable browsing along the taxonomic hierarchy, we additionally import GBIF's backbone taxonomy. We also retrieve the citations for all datasets through the registry API endpoint to be able to compile them according to GBIF's citation guidelines for a set of filtered occurrences.

GBIF groups

The GBIF data are made available as a data provider, which can be selected in the data menu (+). There, VAT groups the GBIF occurrences by different taxonomic ranks, e.g. family or species. Selecting such data will load all occurrence records from different datasets that fall under this taxonomic rank, e.g. all occurrences of the Genus Abedus (water bug).
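The same GBIF layers can also be loaded programmatically with the Geo Engine Python Library. A minimal sketch, following the provider-id lookup and the species/... naming pattern shown in the Python examples later in this documentation (other taxonomic ranks presumably follow an analogous naming scheme, but only the species form appears in those examples):

import geoengine as ge

ge.initialize("https://vat.gfbio.org/api")

# Find the GBIF data provider id in the root layer collection.
root_collection = ge.layer_collection()
gbif_provider_id = next(
    str(item.provider_id) for item in root_collection.items if item.name == "GBIF"
)

# Register a workflow that loads all occurrences of a given species.
workflow = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "OgrSource",
        "params": {"data": f"_:{gbif_provider_id}:`species/Aeshna affinis`"},
    },
})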

GBIF search

While users can browse lists of taxonomic ranks, they can also search for specific taxa. This makes it much easier to find the data of interest when specific taxa are known. At the top of the GBIF catalog, users can find the search icon on the right-hand side. Clicking on it brings up a search bar where users can enter the name of the taxon of interest. By typing in a few letters of the taxon name, VAT will suggest possible names that can be selected if they seem appropriate. Clicking the search icon again, or pressing ENTER, will display a list of search results.

Users can also change the default search settings by clicking the options icon next to the search icon. This opens a dialog that allows users to change the default search settings, such as the search type. The Fulltext search matches the term anywhere in the name, while the Prefix search matches only the beginning of the name. In addition, users can filter their results by taxonomic rank, e.g., show only results that are of the rank Species. This can be done by first selecting one of the collections, e.g., Species datasets, and then performing the search.

At all browsing levels, the currently selected filter is also respected during a search. This is an upgrade over a previous version of the search, where you could only search for family, genus, or species. Now you can, for example, filter for a specific kingdom and order beforehand and reduce the number of search results by combining hierarchical browsing and searching.

Video Tutorials

This chapter contains video tutorials on how to use the VAT system. These tutorials are designed to help users get started with the VAT system and to demonstrate how to use its features.

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

Introduction to VAT

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

Video

Summary

Welcome to the Introduction to VAT.

This first tutorial will introduce you to the VAT system, which can be used to easily load, transform and explore spatio-temporal datasets, such as in the context of ecological science. This tutorial will give you a tour, explain each menu and show the functionality in a simple first use case where we spatially join the minimum and maximum temperature with the GBIF occurrence data of Aeshna affinis.

Let the tour begin!

The most prominent area when opening the link https://vat.gfbio.org is the large map. Here you can visualise the spatio-temporal data. The extent of the map can be changed by dragging with the mouse or zooming with the scroll wheel.

Introduction image VAT overview

Next, in the top left-hand corner, is the layer selection menu, which allows you to view all the layers currently loaded, change the symbology or arrange the layers. You can also view the provenance, data table or download the layer.

Layer selection

In the top left-hand corner you will find the GFBio Portal button, which takes you back to the GFBio Search when you have finished your data exploration. Due to the deep integration between the VAT and the GFBio search, it is also possible to load data directly from the GFBio search.

GFBio Search

Next to the GFBio button is a zoom manipulation menu. In addition to the scroll wheel, the zoom level can be changed using the maximise and minimise buttons.

Zoom buttons

In the middle of the top bar is the time step selector. When viewing spatio-temporal data, you may wish to change the time by one time step. This menu can be used to move the current time step or to open the time selector, which we will see in a moment.

Time step selector

On the top bar you will find a series of icons, which we will visit next.

The first icon is the Account menu. Here you can log in with your GFBio account, which allows you to upload files or create, save or export a project. It also shows the session token, which can be used to access your session from Python, for example to work with uploaded files.

Account menu

The next menu is the data selection menu. Here you will find several data catalogues. The Data Catalogue contains datasets hosted by the Geo Engine, such as land use classification, climate information or orographic elevation maps. The Personal Catalogue contains all files and workflows, and the All Datasets Catalogue contains all hosted and uploaded datasets. Below these are the GBIF and GFBio ABCD data catalogues, which contain all datasets derived from the respective data providers. It is also possible to draw features or load a layer by inserting the workflow_id from a Python workflow.

Data selection

Behind the cogwheel icon is the operator selection menu. Here you will find a range of operators to manipulate, transform, merge or plot vector or raster data.

Operator selection

The plots are then displayed in the Plot Window. Here you can view the plot results and delete plots.

Plot window

The next menu is the time configuration menu. Here you can filter the spatio-temporal data. It is also possible to change the time step using the time step selector.

Time configurator

If you are logged in, the workspace settings allow you to save and load projects and change the spatial reference of your project.

Workspace settings

The last menu is the Help section. Here you will find initial information and links to the Geo Engine documentation, as well as further information about the VAT.

Help section including Provenance

After this brief tour, let us start with an example workflow to demonstrate the capabilities of the VAT.

First we go to the data selection menu and search for Aeshna affinis in the GBIF data catalogue. Clicking on the file loads the layer into the map.

GBIF Search

To link the occurrence data with temperature, we search for the Minimum Temperature dataset in the data catalogue.

Minimum temperature search

The Minimum Temperature dataset is a spatio-temporal dataset and therefore has a spatial and temporal extent. This can be found in the metadata of the dataset.

Minimum temperature spatiotemporal extent

To adjust the time range, change the time in the time configuration menu.

Minimum temperature time configuration

We also load the Maximum Temperature dataset.

Maximum temperature search

As the visual appearance of the temperature datasets is not appealing, we change the symbology of the raster layers.

Edit symbology button

Clicking on Edit Symbology takes us to the Edit Symbology menu. Here we scroll down, select a different colour map such as VIRIDIS or MAGMA and click on Create colour map. Finally, we confirm the change with the Apply button at the bottom of the menu.

Edit symbology menu

After loading the data, we want to spatially join the occurrence data of Aeshna affinis with the Minimum Temperature and Maximum Temperature datasets using the raster vector join operator. For better readability it is recommended to name the datasets.

Raster Vector Join

The result is that the vector data is spatially linked to the raster data by position. Therefore, new columns are added to the vector data table containing the information.

Data table Aeshna affinis oekosystematlas

The Histogram operator can be used to visualise the distribution of occurrence data as a function of temperature.

Histogram

The graphs then show the distribution of occurrences of Aeshna affinis as a function of the minimum and maximum temperatures on 1 January 1990.

Overview Aeshna affinis final

When you are finished manipulating the data, you can download the raster data as a .tif file and the vector data as a .shp file from the layer selection menu.

Download layer button

In the menu it is also possible to display the provenance, which will then appear in the data table area at the bottom of the VAT.

Show Provenance data table

This was the first introductory tour of the VAT system. If you want to learn more, you can do so by watching the videos or exploring the use cases in this documentation.

Warning: The VAT system is designed primarily for data exploration. Changing the extent of the visual map will recalculate the workflow and may change the results! This must be taken into account when working scientifically with the VAT system. There is also a new window in the bottom left corner. This window must be present when working scientifically with the VAT system, as it allows reproducibility!

Tip: The layers have several options. They can be downloaded to work with the data in other systems. The layers also always have a workflow tree and the workflow_id can be copied to import the workflow directly into Python.
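A minimal sketch of this round trip (assuming the geoengine package; the function name workflow_by_id and its UUID argument reflect current versions of the library and may differ in other versions):

from uuid import UUID

import geoengine as ge

ge.initialize("https://vat.gfbio.org/api")

# Paste the workflow id copied from the VAT layer menu (placeholder value here).
workflow = ge.workflow_by_id(UUID("00000000-0000-0000-0000-000000000000"))

# The layer behind that id can now be queried like any other workflow, e.g.
# loaded into a geopandas GeoDataFrame via workflow.get_dataframe(...).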

Canis lupus meets Felis silvestris

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

Video

Summary

Welcome to the Canis lupus meets Felis silvestris use case.

In this example the GBIF occurrence data of Canis lupus and Felis silvestris are cut to the extent of Germany and linked to the land use classification of the Ecosystematlas.

Introduction image VAT overview

To begin, we select the Data Catalogue in the top right-hand corner. Here we have several data catalogues to choose from.

Sidebar with data catalogs

In our case, we start by searching for the individual species in the GBIF data provider. The search function makes it easy to find the species, so we search for Canis lupus and load the dataset by selecting it.

Canis lupus search

For the spatial selection we also need the German borders, which we found by searching for Germany in the data catalogue.

Germany search

In order to link the occurrence data with the land use classification, it is also necessary to load the Oekosystematlas by searching for it in the personal data catalogue. The personal data catalogue contains all datasets uploaded by the user as well as a section with all datasets, which also contains datasets not listed.

Oekosystematlas search

The next step takes place in the Operators section, located in the top right-hand corner.

First we use a Point in Polygon Filter to restrict our occurrence data to Germany. For better readability it is recommended to name the datasets.

Point in Polygon Filter

Next, we join the raster data to the vector data using the Raster Vector Join Operator, which takes the occurrence data as a vector and the Ecosystem Atlas as raster data.

Raster Vector Join

The result is that the vector data is spatially linked to the raster data by position. Therefore, a new column is added to the vector data table containing the information.

Data table Canis lupus oekosystematlas

To visualise the classified data, it is recommended to use the Class Histogram operator, which translates the Ecosystem Atlas numbers into class names using the metadata.

Class histogram

The graph then shows the distribution of occurrences according to class.

Using the same procedure for Felis silvestris, it is possible to compare the occurrence of the two species.

Overview Canis lupus Felis sivestris final

Warning: The VAT system is mainly used for data exploration. Changing the extent of the visual map will recalculate the workflow and could change the results! This must be taken into account when working scientifically with the VAT system. There is also a new window in the bottom left corner. This window must be present when working scientifically with the VAT system, as it allows reproducibility!

Tip: The layers have several options. They can be downloaded to work with the data in other systems. The layers also always have a workflow tree and the workflow_id can be copied to import the workflow directly into Python.

On Dry Land

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

Video

Summary

Welcome to the Dry Land Use Case.

In this example, the GBIF occurrence data of Calopteryx splendens are clipped to the extent of Germany and merged with the land use classification from the Oekosystematlas as well as the average temperature provided by the WorldClim dataset.

Introduction image VAT overview

To begin, we select the Data Catalogue in the top right-hand corner. Here we have several data catalogues to choose from.

Sidebar with data catalogs

In our case we start by searching for Calopteryx splendens in the GBIF data provider. The search function makes it easy to find the species, so we can search for Calopteryx splendens and load the dataset by selecting it.

Calopteryx splendens search

For the spatial selection we also need the German border, which we found by searching for Germany in the data catalogue.

Germany search

Next, for the link between the occurrence data and the average temperature, we search for the Average Temperature dataset in the data catalogue.

Average temperature search

Caution: The Average Temperature is a spatio-temporal dataset. Always check the spatial and temporal extent in the metadata.

Average temperature spatiotemporal extent

The Average Temperature dataset covers the whole Earth and a time range from 1970/01/01 to 2000/12/31. To stay within this range, we need to change the time in the time menu at the top right.

Time menu

As the dataset does not look very attractive, we will change the colour palette of the raster data. This can be done by right-clicking on the layer and selecting Edit Symbology.

Edit symbology button

In the symbology menu, scroll down to Create colour table, select a colour map such as VIRIDIS or MAGMA, click the Create colour table button and confirm with the Apply button at the bottom of the symbology menu.

Edit symbology menu

In order to link the occurrence data with the land use classification, it is also necessary to load the Oekosystematlas by searching for it in the personal data catalogue. The personal data catalogue contains all datasets uploaded by the user as well as a section with all datasets, which also contains datasets not listed.

Oekosystematlas search

The next step takes place in the Operators section, located in the top right-hand corner.

First we use a Point in Polygon Filter to restrict our occurrence data to Germany. For better readability it is recommended to name the datasets.

Point in Polygon filter

Next, we join the raster data to the vector data using the Raster Vector Join Operator, which takes the occurrence data as a vector and the Ecosystem Atlas and Mean Temperature as raster data.

Raster Vector Join

The result is that the vector data is spatially linked to the raster data by position. Therefore, new columns are added to the vector data table containing the information.

Data table Calopteryx splendens oekosystematlas

The Histogram operator can be used to visualise the distribution of occurrence data as a function of average temperature.

Histogram

To visualise the classified data, it is recommended to use the Class Histogram operator, which translates the Ecosystem Atlas numbers into class names using the metadata.

Class histogram

The plots then show the distribution of occurrences of Calopteryx splendens as a function, firstly, of the average temperature on 1 January 2000 and, secondly, of the land-use classification of the Ecosystematlas.

Overview Calopteryx splendens final

Warning: The VAT system is designed primarily for data exploration. Changing the extent of the visual map will recalculate the workflow and could change the results! This must be taken into account when working scientifically with the VAT system. There is also a new window in the bottom left corner. This window must be present when working scientifically with the VAT system, as it allows reproducibility!

Tip: The layers have several options. They can be downloaded to work with the data in other systems. The layers also always have a workflow tree and the workflow_id can be copied to import the workflow directly into Python.

VAT 4 ML - Creating Training data for a species distribution model

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

This workflow is a contribution to the NFDI4Earth conference.

Video

The video for this use case is coming soon!

Summary

Welcome to the VAT 4 ML Use Case.

In this example we will label training data in VAT for Germany, transfer it to a Jupyter notebook using the unique workflow identifier, download the training data as a geodataframe and finally use a machine learning model to build a species distribution model.

For this use case, we will therefore use the frequency of Arnica montana occurrences from GBIF as the target variable together with weather data from CHELSA, land use classification from Ökosystematlas and topographic information as predictor variables.

Introduction image VAT overview

To begin, select the Data Catalogue in the top right-hand corner. Here we have several data catalogues to choose from.

In our case, we start by searching for the individual species in the GBIF data provider. The search function makes it easy to find the species, so we search for Arnica montana and load the dataset by selecting it.

Sidebar with data catalog. Currently the GBIF data provider is chosen and the search is opened with Arnica montana in the search field

For the weather data, we take information from CHELSA. Here we choose the Mean daily air temperature, the Monthly moisture index and the Monthly precipitation amount.

Sidebar with data catalog. Currently the data catalogue is opened with the CHELSA tab containing multiple weather layer

Caution: The weather data is a spatio-temporal data set. Always check the spatial and temporal extent in the metadata.

The weather datasets cover the whole earth and a time range from 01/01/1981 to 01/01/2011. We need to change the time in the time menu at the top right.

Sidebar with time configuration menu, where the time can be set to address temporal boundaries of spatio-temporal data

For the spatial selection we also need the German borders, which we found by searching for Germany in the data catalogue.

Sidebar with data catalog. Currently the data catalog is selected and the search function is used to search for the German boundaries

To add topographic information to the predictor variables, we include the SRTM elevation model.

Sidebar with data catalog. Currently the data catalog is selected with the SRTM tab

Finally, we add land use classification data, which in this case is the Oekosystematlas. It can be loaded by searching for it in the personal data catalogue. The personal data catalogue contains all the datasets that the user has uploaded, as well as a section with all datasets, which also contains datasets that are not listed.

Sidebar with data catalog. Currently Personal data catalogue is selected search function is used to find the Oekosystematlas

This gives us all the layers we need to create the training and prediction data.

An overview map is visible which contains all the added layers

We start to create the training data and prepare the prediction data by aggregating the spatio-temporal weather data. To do this, we use the Temporal Raster Aggregation operator. This allows us to aggregate temporal data by a moving window (e.g. 1 year). We use this operator for all weather data. While we choose the mean aggregation type for the temperature and the moisture index, we choose the sum aggregation type for the precipitation. For better readability it is recommended to name the datasets.
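For readers who prefer the JSON workflow notation used in the Python examples later in this documentation, a sketch of such an aggregation could look as follows; note that the exact parameter names of the TemporalRasterAggregation operator are assumptions here and should be checked against the operator documentation at https://docs.geoengine.io/operators/intro.html.

# Sketch only -- the parameter names ("aggregation", "window", "granularity", "step")
# and the dataset name are assumptions; consult the operator documentation before use.
temporal_mean_temperature = {
    "type": "TemporalRasterAggregation",
    "params": {
        "aggregation": {"type": "mean"},                 # "sum" for the precipitation layer
        "window": {"granularity": "years", "step": 1},   # moving window of one year
    },
    "sources": {
        "raster": {
            "type": "GdalSource",
            "params": {"data": "chelsa_mean_daily_air_temperature"},  # hypothetical name
        }
    },
}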

Temporal raster aggregation operator

In a second step, we spatially filter the GBIF occurrence data of Arnica montana using the Point in Polygon Filter to restrict our occurrence data to Germany.

point in polygon operator.

Finally, to create the training data, we join the prepared raster data to the vector data using the Raster Vector Join Operator, which takes the occurrence data as a vector and the other prepared raster data. This allows us to spatially join the occurrences with the value of the underlying raster cells.

Raster Vector Join operator

To create the prediction data, we then use the Raster Stacker operator to create a multi-layer raster containing all the raster data. This makes it easier to import it into Jupyter Notebook and work with it.

Raster Stacker operator

This brings us to the Arnica montana training data and the stacked prediction grid data.

Overview of the map with the training and prediction data as well as all other layer visually hidden

We now copy the Workflow ID for each layer to use in Jupyter Notebook.

Layer menu showing the options for i.e. copy the workflow id to clipboard

In Jupyter Notebook, we use the geoengine package to initialise the connection to the VAT API and import the training data workflow. We then round and group the data to obtain the frequency of Arnica montana occurrences for each combination of predictor values. The frequency is used as the target variable and the remaining columns as predictor variables. Finally, we split the dataset into training and test data and train a RandomForestRegressor model using a GridSearchCV strategy for better results. The best resulting model has an R² value of 0.07.

Jupyter Notebook code used to train the species distribution model
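The notebook itself is not reproduced in this documentation, so the following is a minimal sketch of the modelling step, assuming the training data has already been loaded into a pandas/geopandas DataFrame named training_data; the column names are assumptions and need to be adapted to the actual data.

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumed column names -- adapt them to the actual training DataFrame.
predictors = ["Mean_Temperature", "Moisture_Index", "Precipitation", "Elevation", "Land_Use"]
target = "frequency"

X_train, X_test, y_train, y_test = train_test_split(
    training_data[predictors], training_data[target], test_size=0.2, random_state=42
)

# Small grid search over a RandomForestRegressor, as described above.
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10, 20]},
    cv=5,
    scoring="r2",
)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("test R^2:", search.best_estimator_.score(X_test, y_test))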

After model training we can import the prediction data workflow. The best RandomForestRegressor model is used for the final prediction.

Jupyter Notebook code used to predict using the trained species distribution model

Finally, the result is plotted using the matplotlib package.

Jupyter Notebook code used to plot the result of the prediction The plot, showing three maps, two with the distribution and one with the distribution of the training data

Although the model did not show the best performance, it was possible to show how easy it is to create spatio-temporal training data for machine learning applications using the VAT and exporting the data directly into Python, where it can be used in typical formats such as geopandas GeoDataFrame or xarray DataArray.

Examples

This chapter contains examples of how to use the VAT system. The examples are written in Jupyter notebooks and are available in the examples directory. The notebooks are converted to markdown and included in the user documentation.

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

Introduction to VAT

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

Welcome to geoengine-python! This notebook is intended to show you around and explain the basics of how geoengine-python and VAT are related.

The purpose of this notebook is to demonstrate the capabilities of Geo Engine. Therefore some useful techniques will be shown:

  • Introduction to the geoengine-python package
  • Loading a dataset
  • Using operators
  • Plotting the results
  • First simple nested workflows
  • The connection between Python and VAT

When building your own nested workflow, it is recommended to build it in several steps as seen in this notebook.

Documentation about the operators and how to use them in Python can be found here: https://docs.geoengine.io/operators/intro.html

Preparation

The first thing to do is to import the geoengine-python package:

import geoengine as ge

For plotting it is currently also necessary to import Altair:

import altair as alt
#Other imports
from datetime import datetime
import matplotlib.pyplot as plt

To establish a connection with the VAT, ge.initialize can be used together with the API URL:

ge.initialize("https://vat.gfbio.org/api")

In the case of a locally hosted instance, the link would be http://localhost:4200/api.

For more comfortable work with the GBIF DataProvider, its provider id can be looked up by name in the root_collection:

root_collection = ge.layer_collection()
gbif_prov_id = ''
for elem in root_collection.items:
    if elem.name == 'GBIF':
        gbif_prov_id = str(elem.provider_id)
        
gbif_prov_id
'1c01dbb9-e3ab-f9a2-06f5-228ba4b6bf7a'

To load data, use operators, or plot vector data, 'workflows' need to be created, as shown below for loading the dragonfly species Aeshna affinis.

Load Aeshna affinis from the GBIF DataProvider

A workflow needs to be registered in the VAT or Geo Engine instance. To do so, ge.register_workflow can be called with the workflow definition in JSON:

workflow_aeshna_affinis = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "OgrSource",
        "params": {
            "data": f"_:{gbif_prov_id}:`species/Aeshna affinis`",
        }
    }
})

workflow_aeshna_affinis
c7b6b25a-714d-58d1-9f53-db7bf4995a5b

Alternatively the workflow_builder can be used as shown here: TODO

The result of each registration is the workflow_id, which can be used directly in VAT to trigger the workflow. To finally load the vector data from VAT, the .get_dataframe method can be used. The method takes as parameters the search extent, a time interval, the spatial resolution and a coordinate reference system.

#Set time
start_time = datetime.strptime(
    '2010-01-01T12:00:00.000Z', "%Y-%m-%dT%H:%M:%S.%f%z")
end_time = datetime.strptime(
    '2011-01-01T12:00:00.000Z', "%Y-%m-%dT%H:%M:%S.%f%z")

#Request the data from Geo Engine into a geopandas dataframe
data = workflow_aeshna_affinis.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(-180, -90, 180, 90),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
ax = data.plot(markersize=3)
ax.set_xlim([-180,180])
ax.set_ylim([-90,90])
(-90.0, 90.0)

output_21_1.png

The extent was chosen to make it clear that Aeshna affinis only occurs on the Eurasian continent. Without the x- and y-limiters the plot would look different:

data.plot()
<Axes: >

output_23_1.png

In addition to vector data, raster data can also be loaded from the VAT.

Loading Minimum and Maximum temperature from the temperature collection

To load raster data, again a workflow must be registered, but this time the 'GdalSource' is used instead of the 'OgrSource':

workflow_t_min = ge.register_workflow({ 
    "type": "Raster",
    "operator": {
        "type": "RasterScaling",
        "params": {
            "slope": {
                "type": "constant",
                "value": 0.1
            },
            "offset": {
                "type": "constant",
                "value": -273.15
            },
            "outputMeasurement": {
                "type": "continuous",
                "measurement": "temperature",
                "unit": "K/10"
            },
            "scalingMode": "mulSlopeAddOffset"
        },
        "sources": {
            "raster": {
                "type": "RasterTypeConversion",
                "params": {
                    "outputDataType": "F32"
                },
                "sources": {
                    "raster": {
                        "type": "GdalSource",
                        "params": {
                            "data": "mean_daily_minimum_2m_air_temperature"
                        }
                    }
                }
            }
        }
    }
})

workflow_t_min
a57efb5a-7256-58b9-b9f2-9f22d9724bab

The raster data can then be requested as an xarray.DataArray and plotted that way:

#Request the data from Geo Engine into a xarray dataarray
data = workflow_t_min.get_xarray(
    ge.QueryRectangle(
        ge.BoundingBox2D(-180, -90, 180, 90),
        ge.TimeInterval(start_time, start_time),
        resolution=ge.SpatialResolution(1., 1.),
        srs="EPSG:4326"
    )
)

#Plot the data TODO more description
data.plot(vmin=-50, vmax=50)
/home/duempelmann/geoengine_env/lib/python3.10/site-packages/owslib/coverage/wcs110.py:85: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.
  elem = self._capabilities.find(self.ns.OWS('ServiceProvider')) or self._capabilities.find(self.ns.OWS('ServiceProvider'))  # noqa





<matplotlib.collections.QuadMesh at 0x7fb654cef9a0>

output_29_2.png

The same can be done for the maximum temperature:

workflow_t_max = ge.register_workflow({ 
    "type": "Raster",
    "operator": {
        "type": "RasterScaling",
        "params": {
            "slope": {
                "type": "constant",
                "value": 0.1
            },
            "offset": {
                "type": "constant",
                "value": -273.15
            },
            "outputMeasurement": {
                "type": "continuous",
                "measurement": "temperature",
                "unit": "K/10"
            },
            "scalingMode": "mulSlopeAddOffset"
        },
        "sources": {
            "raster": {
                "type": "RasterTypeConversion",
                "params": {
                    "outputDataType": "F32"
                },
                "sources": {
                    "raster": {
                        "type": "GdalSource",
                        "params": {
                            "data": "mean_daily_maximum_2m_air_temperature"
                        }
                    }
                }
            }
        }        
    }
})

workflow_t_max
cdfe579d-b451-5b7e-b98d-bf0570489784
#Request the data from Geo Engine into a xarray dataarray
data = workflow_t_max.get_xarray(
    ge.QueryRectangle(
        ge.BoundingBox2D(-180, -90, 180, 90),
        ge.TimeInterval(start_time, start_time),
        resolution=ge.SpatialResolution(1.0, 1.0),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot(vmin=-50, vmax=50)
/home/duempelmann/geoengine_env/lib/python3.10/site-packages/owslib/coverage/wcs110.py:85: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.
  elem = self._capabilities.find(self.ns.OWS('ServiceProvider')) or self._capabilities.find(self.ns.OWS('ServiceProvider'))  # noqa





<matplotlib.collections.QuadMesh at 0x7fb652906ad0>

output_32_2.png

As well as loading data, the VAT has several operators for manipulating or transforming geodata. One example is the raster vector join.

Raster Vector Join between Aeshna affinis and the Minimum and Maximum Temperature

The raster vector join operator joins the vector data to one or more raster layers based on the position of the vector features. As shown in this example, the inputs are more or less the individual workflows seen before:

workflow_aeshna_affinis_join = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "RasterVectorJoin",
        "params": {
            "names": {
                "type": "names",
                "values": ["Min_Temperature", "Max_Temperature"]
            },
            "temporalAggregation": "none",
            "featureAggregation": "mean",
        },
        "sources": {
            "vector": { #Aeshna affinis ##########################################
                "type": "OgrSource",
                "params": {
                    "data": f"_:{gbif_prov_id}:`species/Aeshna affinis`",
                }
            }, ###################################################################
            "rasters": [{ #Minimum temperature ###################################
                    "type": "RasterScaling",
                    "params": {
                        "slope": {
                            "type": "constant",
                            "value": 0.1
                        },
                        "offset": {
                            "type": "constant",
                            "value": -273.15
                        },
                        "outputMeasurement": {
                            "type": "continuous",
                            "measurement": "temperature",
                            "unit": "K/10"
                        },
                        "scalingMode": "mulSlopeAddOffset"
                    },
                    "sources": {
                        "raster": {
                            "type": "RasterTypeConversion",
                            "params": {
                                "outputDataType": "F32"
                            },
                            "sources": {
                                "raster": {
                                    "type": "GdalSource",
                                    "params": {
                                        "data": "mean_daily_minimum_2m_air_temperature"
                                    }
                                }
                            }
                        }
                    }
                }, ################################################################ 
                { #Maximum temperature ############################################
                    "type": "RasterScaling",
                    "params": {
                        "slope": {
                            "type": "constant",
                            "value": 0.1
                        },
                        "offset": {
                            "type": "constant",
                            "value": -273.15
                        },
                        "outputMeasurement": {
                            "type": "continuous",
                            "measurement": "temperature",
                            "unit": "K/10"
                        },
                        "scalingMode": "mulSlopeAddOffset"
                    },
                    "sources": {
                        "raster": {
                            "type": "RasterTypeConversion",
                            "params": {
                                "outputDataType": "F32"
                            },
                            "sources": {
                                "raster": {
                                    "type": "GdalSource",
                                    "params": {
                                        "data": "mean_daily_maximum_2m_air_temperature"
                                    }
                                }
                            }
                        }
                    }
                } #################################################################
            ]
        }
    }
})
    

workflow_aeshna_affinis_join
8b26f457-4d52-5f35-b10a-aca7352f47d1

The input parameters required for each operator can be found in the documentation: https://docs.geoengine.io/operators/intro.html. In this example, the RasterVectorJoin operator takes two inputs: vector, which represents the vector layer to use, and rasters, which represents the one or more raster layers to join.

The resulting vector data can again be retrieved by requesting it as a GeoDataFrame:

#Request the data from Geo Engine into a geopandas dataframe
data_aeshna_affinis = workflow_aeshna_affinis_join.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(-180, -90, 180, 90),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Show the geopandas dataframe
data_aeshna_affinis
geometry Max_Temperature Min_Temperature basisofrecord gbifid scientificname start end
0 POINT (6.17690 52.27207) 13.250000 4.350006 HUMAN_OBSERVATION 699741184 Aeshna affinis Vander Linden, 1820 2010-04-28 00:00:00+00:00 2010-04-28 00:00:00+00:00
1 POINT (6.17690 52.27207) 13.250000 4.350006 HUMAN_OBSERVATION 699741183 Aeshna affinis Vander Linden, 1820 2010-04-28 00:00:00+00:00 2010-04-28 00:00:00+00:00
2 POINT (3.55448 43.39541) 18.550018 13.850006 HUMAN_OBSERVATION 3945130371 Aeshna affinis Vander Linden, 1820 2010-05-26 00:00:00+00:00 2010-05-26 00:00:00+00:00
3 POINT (3.76048 49.60182) 17.550018 8.750000 HUMAN_OBSERVATION 2485531094 Aeschna affinis Stephens, 1836 2010-05-25 00:00:00+00:00 2010-05-25 00:00:00+00:00
4 POINT (3.76048 49.60182) 17.550018 8.750000 HUMAN_OBSERVATION 2485629036 Aeschna affinis Stephens, 1836 2010-05-28 00:00:00+00:00 2010-05-28 00:00:00+00:00
... ... ... ... ... ... ... ... ...
973 POINT (5.94470 46.68733) NaN NaN HUMAN_OBSERVATION 3480458996 Aeshna affinis Vander Linden, 1820 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
974 POINT (2.73627 49.70645) NaN NaN HUMAN_OBSERVATION 3845267165 Aeshna affinis Vander Linden, 1820 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
975 POINT (2.62640 49.08975) NaN NaN HUMAN_OBSERVATION 3072870148 Aeshna affinis Vander Linden, 1820 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
976 POINT (3.79175 46.02396) NaN NaN HUMAN_OBSERVATION 3072950291 Aeshna affinis Vander Linden, 1820 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
977 POINT (6.04652 46.69048) NaN NaN HUMAN_OBSERVATION 3073536260 Aeshna affinis Vander Linden, 1820 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00

978 rows × 8 columns

The data can then be plotted directly in Python:

fig, ax = plt.subplots(1, 2, figsize=(20,10))

data_aeshna_affinis.plot(ax=ax[0], column='Min_Temperature', legend=True, legend_kwds={'label': 'Minimum Temperature'})
data_aeshna_affinis.plot(ax=ax[1], column='Max_Temperature', legend=True, legend_kwds={'label': 'Maximum Temperature'})

plt.show()

output_41_0.png

The VAT also offers some of its own plot types, such as histograms.

Plotting Aeshna affinis Minimum and Maximum Temperature as Histograms using VAT

Of course, a workflow must be registered in order to plot the data:

workflow_aeshna_affinis_join_plot_min = ge.register_workflow({
    "type": "Plot",
    "operator": {
       "type": "Histogram",
       "params": {
          "attributeName": "Min_Temperature",
           "bounds": "data",
           "buckets": {
               "type": "number",
               "value": 20
           }
       },
        "sources": {
            "source": { #Aeshna affinis Join #############################################
                "type": "RasterVectorJoin",
                "params": {
                    "names": {
                        "type": "names",
                        "values": ["Min_Temperature", "Max_Temperature"]
                    },
                    "temporalAggregation": "none",
                    "featureAggregation": "mean",
                },
                "sources": {
                    "vector": { 
                        "type": "OgrSource",
                        "params": {
                            "data": f"_:{gbif_prov_id}:`species/Aeshna affinis`",
                        }
                    }, 
                    "rasters": [{
                            "type": "RasterScaling",
                            "params": {
                                "slope": {
                                    "type": "constant",
                                    "value": 0.1
                                },
                                "offset": {
                                    "type": "constant",
                                    "value": -273.15
                                },
                                "outputMeasurement": {
                                    "type": "continuous",
                                    "measurement": "temperature",
                                    "unit": "K/10"
                                },
                                "scalingMode": "mulSlopeAddOffset"
                            },
                            "sources": {
                                "raster": {
                                    "type": "RasterTypeConversion",
                                    "params": {
                                        "outputDataType": "F32"
                                    },
                                    "sources": {
                                        "raster": {
                                            "type": "GdalSource",
                                            "params": {
                                                "data": "mean_daily_minimum_2m_air_temperature"
                                            }
                                        }
                                    }
                                }
                            }
                        }, 
                        {
                            "type": "RasterScaling",
                            "params": {
                                "slope": {
                                    "type": "constant",
                                    "value": 0.1
                                },
                                "offset": {
                                    "type": "constant",
                                    "value": -273.15
                                },
                                "outputMeasurement": {
                                    "type": "continuous",
                                    "measurement": "temperature",
                                    "unit": "K/10"
                                },
                                "scalingMode": "mulSlopeAddOffset"
                            },
                            "sources": {
                                "raster": {
                                    "type": "RasterTypeConversion",
                                    "params": {
                                        "outputDataType": "F32"
                                    },
                                    "sources": {
                                        "raster": {
                                            "type": "GdalSource",
                                            "params": {
                                                "data": "mean_daily_maximum_2m_air_temperature"
                                            }
                                        }
                                    }
                                }
                            }
                        } 
                    ]
                } ##########################################################################
            } 
       }
    }
})
    
workflow_aeshna_affinis_join_plot_min
8426078a-2940-5a76-8f16-afda4ed45b80

The .plot_chart method can be used to retrieve the plot, which can then be rendered with the altair package:

#Request the plot from Geo Engine
plot_aeshna_affinis_min = workflow_aeshna_affinis_join_plot_min.plot_chart(
    ge.QueryRectangle(
        ge.BoundingBox2D(-180, -90, 180, 90),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Show the plot
alt.Chart.from_dict(plot_aeshna_affinis_min.spec)
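
Besides rendering the chart inline, the Altair chart object can also be written to a standalone HTML file, for example (a minimal sketch; the file name is arbitrary):

#Save the histogram as a standalone HTML file (file name chosen freely)
chart_min = alt.Chart.from_dict(plot_aeshna_affinis_min.spec)
chart_min.save("aeshna_affinis_min_temperature_histogram.html")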

The same can be done for the maximum temperature:

workflow_aeshna_affinis_join_plot_max = ge.register_workflow({
    "type": "Plot",
    "operator": {
       "type": "Histogram",
       "params": {
          "attributeName": "Max_Temperature",
           "bounds": "data",
           "buckets": {
               "type": "number",
               "value": 20
           }
       },
        "sources": {
            "source": { #Aeshna affinis Join #############################################
                "type": "RasterVectorJoin",
                "params": {
                    "names": {
                        "type": "names",
                        "values": ["Min_Temperature", "Max_Temperature"]
                    },
                    "temporalAggregation": "none",
                    "featureAggregation": "mean",
                },
                "sources": {
                    "vector": { 
                        "type": "OgrSource",
                        "params": {
                            "data": f"_:{gbif_prov_id}:`species/Aeshna affinis`",
                        }
                    }, 
                    "rasters": [{
                            "type": "RasterScaling",
                            "params": {
                                "slope": {
                                    "type": "constant",
                                    "value": 0.1
                                },
                                "offset": {
                                    "type": "constant",
                                    "value": -273.15
                                },
                                "outputMeasurement": {
                                    "type": "continuous",
                                    "measurement": "temperature",
                                    "unit": "K/10"
                                },
                                "scalingMode": "mulSlopeAddOffset"
                            },
                            "sources": {
                                "raster": {
                                    "type": "RasterTypeConversion",
                                    "params": {
                                        "outputDataType": "F32"
                                    },
                                    "sources": {
                                        "raster": {
                                            "type": "GdalSource",
                                            "params": {
                                                "data": "mean_daily_minimum_2m_air_temperature"
                                            }
                                        }
                                    }
                                }
                            }
                        }, 
                        {
                            "type": "RasterScaling",
                            "params": {
                                "slope": {
                                    "type": "constant",
                                    "value": 0.1
                                },
                                "offset": {
                                    "type": "constant",
                                    "value": -273.15
                                },
                                "outputMeasurement": {
                                    "type": "continuous",
                                    "measurement": "temperature",
                                    "unit": "K/10"
                                },
                                "scalingMode": "mulSlopeAddOffset"
                            },
                            "sources": {
                                "raster": {
                                    "type": "RasterTypeConversion",
                                    "params": {
                                        "outputDataType": "F32"
                                    },
                                    "sources": {
                                        "raster": {
                                            "type": "GdalSource",
                                            "params": {
                                                "data": "mean_daily_maximum_2m_air_temperature"
                                            }
                                        }
                                    }
                                }
                            }
                        } 
                    ]
                } ##########################################################################
            } 
       }
    }
})

#Request the plot from Geo Engine
plot_aeshna_affinis_max = workflow_aeshna_affinis_join_plot_max.plot_chart(
    ge.QueryRectangle(
        ge.BoundingBox2D(-180, -90, 180, 90),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Show the plot
alt.Chart.from_dict(plot_aeshna_affinis_max.spec)

As you can see, the VAT offers a lot of functionality, which will be explored in more depth and extended in the following examples.

Further experiments

In this chapter, some other useful ways of combining Geo Engine and Python are shown.

#Overlay plot with context
import geopandas as gpd
import matplotlib.pyplot as plt

#Request the data from Geo Engine into a xarray dataarray
data_min = workflow_t_min.get_xarray(
    ge.QueryRectangle(
        ge.BoundingBox2D(-15.1189, 29.6655, 92.9116, 65.3164),
        ge.TimeInterval(start_time, start_time),
        resolution=ge.SpatialResolution(1.0, 1.0),
        srs="EPSG:4326"
    )
)

#Request the data from Geo Engine into a xarray dataarray
data_max = workflow_t_max.get_xarray(
    ge.QueryRectangle(
        ge.BoundingBox2D(-15.1189, 29.6655, 92.9116, 65.3164),
        ge.TimeInterval(start_time, start_time),
        resolution=ge.SpatialResolution(1.0, 1.0),
        srs="EPSG:4326"
    )
)


#Plot the data
fig, ax = plt.subplots(1, 2, figsize=(20,10))

data_min.plot(ax=ax[0], vmin=-30, vmax=20)
data_aeshna_affinis.plot(ax=ax[0], color='red', markersize=3)

data_max.plot(ax=ax[1], vmin=-30, vmax=20)
data_aeshna_affinis.plot(ax=ax[1], color='red', markersize=3)

plt.show()

output_53_1.png


Canis lupus meets Felis silvestris

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

This workflow uses the VAT to compare the occurrence of Canis lupus and Felis silvestris as a function of land use classification from the Ökosystematlas.

The purpose of this notebook is also to demonstrate the capabilities of Geo Engine. Therefore some useful techniques will be shown:

  • Using the GBIF data catalogue
  • Point in polygon selection
  • Raster vector join of occurrence data with land use classification
  • Plotting a class histogram
  • Nested workflows

When building your own nested workflow, it is recommended to build it in several steps as shown in this notebook.

Documentation about the operators and how to use them in Python can be found here: https://docs.geoengine.io/operators/intro.html

Preparation

#Import packages
import geoengine as ge
import geoengine_openapi_client
from datetime import datetime
from geoengine.types import RasterBandDescriptor
import altair as alt

alt.renderers.enable('default')
RendererRegistry.enable('default')
#Initialize Geo Engine in VAT
ge.initialize("https://vat.gfbio.org/api")
#Get the GBIF DataProvider id (Useful for translating the DataProvider name to its id)
root_collection = ge.layer_collection()
gbif_prov_id = ''
for elem in root_collection.items:
    if elem.name == 'GBIF':
        gbif_prov_id = str(elem.provider_id)
        
gbif_prov_id
'1c01dbb9-e3ab-f9a2-06f5-228ba4b6bf7a'
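
Note that the loop leaves gbif_prov_id empty if no collection named 'GBIF' is found. A small guard (a sketch, not part of the original notebook) makes this failure explicit:

#Optional safety check: fail early if the GBIF data provider was not found
if not gbif_prov_id:
    raise ValueError("GBIF data provider not found in the VAT layer collection")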

Load boundaries of Germany for later GBIF occurrence extraction (optional)

This chapter is not required and only shows that country borders are available.

#Create workflow to request German border
workflow_germany = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "OgrSource",
        "params": {
            "data": "germany",
        }
    }
})

workflow_germany
2429a993-385f-546f-b4f7-97b3ba4a5adb
#Set time
start_time = datetime.strptime(
    '2000-04-01T12:00:00.000Z', "%Y-%m-%dT%H:%M:%S.%f%z")
end_time = datetime.strptime(
    '2030-04-01T12:00:00.000Z', "%Y-%m-%dT%H:%M:%S.%f%z")

#Request the data from Geo Engine into a geopandas dataframe
data = workflow_germany.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot()
<Axes: >

output_10_1.png

Load Ökosystematlas for later raster vector join with occurrence data (optional)

This chapter is not needed and only shows that raster data is also available.

#Create a workflow to request the oekosystematlas raster data
workflow_oekosystematlas = ge.register_workflow({ 
    "type": "Raster",
    "operator": {
        "type": "GdalSource",
        "params": {
            "data": "oekosystematlas"
        }
    }
})

workflow_oekosystematlas
8a859eeb-0778-5190-a9d1-b1f787e4176d
#Request the data from Geo Engine into a xarray dataarray
data = workflow_oekosystematlas.get_xarray(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot(vmax=75)
<matplotlib.collections.QuadMesh at 0x7f67d1c4ada0>

output_14_2.png

Processing Canis lupus

None of the following steps is strictly necessary, as the entire workflow is reproduced as a single nested request at the end. However, the steps are intended to show the capabilities of Geo Engine.

Load Canis lupus (Optional)

#Create workflow to request Canis lupus incidents
workflow_canis_lupus = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "OgrSource",
        "params": {
            "data": f"_:{gbif_prov_id}:`species/Canis lupus`",
        }
    }
})

workflow_canis_lupus.get_result_descriptor()
Data type:         MultiPoint
Spatial Reference: EPSG:4326
Columns:
  gbifid:
    Column Type: int
    Measurement: unitless
  scientificname:
    Column Type: text
    Measurement: unitless
  basisofrecord:
    Column Type: text
    Measurement: unitless
#Request the data from Geo Engine into a geopandas dataframe
data = workflow_canis_lupus.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot()
<Axes: >

output_19_1.png

Point in Polygon Canis lupus

#Create workflow to request Canis lupus incidents filtered by German border
workflow_canis_lupus_cut = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "PointInPolygonFilter",
        "params": {},
        "sources": {
            "points": { #Canis lupus ###############################
                "type": "OgrSource",
                "params": {
                    "data": f"_:{gbif_prov_id}:`species/Canis lupus`",
                    "attributeProjection": []
                } 
            }, #####################################################
            "polygons": { #Germany #################################
                "type": "OgrSource",
                "params": {
                    "data": "germany"
                }
            } ######################################################
        } 
    }
})

workflow_canis_lupus_cut
f30ac841-81b0-5301-bac6-840dd914c1ba
#Request the data from Geo Engine into a geopandas dataframe
data_canis_lupus = workflow_canis_lupus_cut.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data_canis_lupus.plot()
<Axes: >

output_22_1.png

Nested Point in Polygon and Raster Vector Join Canis lupus

#Create a workflow to request Canis lupus occurrences filtered by the German border and linked to the Ökosystematlas data.
workflow_canis_lupus_cut_join = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "RasterVectorJoin",
        "params": {
            "names": {
                "type": "names",
                "values": ["Ökosystematlas"]
            },  
            "temporalAggregation": "none",
            "featureAggregation": "mean",
        },
        "sources": {
            "vector": { #Canis lupus cut ######################################
                "type": "PointInPolygonFilter", 
                "params": {},
                "sources": {
                    "points": {
                        "type": "OgrSource",
                        "params": {
                            "data": f"_:{gbif_prov_id}:`species/Canis lupus`",
                            "attributeProjection": []
                        }
                    },
                    "polygons": {
                        "type": "OgrSource",
                        "params": {
                            "data": "germany"
                        }
                    }
                }
            }, ##############################################################
            "rasters": [{ #Ökosystematlas ###################################
                "type": "GdalSource",
                "params": {
                    "data": "oekosystematlas"
                }
            }] ##############################################################
        },
    }
})

workflow_canis_lupus_cut_join
2c8ebbbc-b848-58e6-8f5c-f51976db3c8f
#Request the data from Geo Engine into a geopandas dataframe
data = workflow_canis_lupus_cut_join.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    ),
    resolve_classifications=True
)

#Show the geopandas dataframe
data
geometry basisofrecord gbifid scientificname Ökosystematlas start end
0 POINT (9.49776 52.08503) HUMAN_OBSERVATION 3447336010 Canis familiaris Linnaeus, 1758 Laubwälder 2022-01-06 00:00:00+00:00 2022-01-06 00:00:00+00:00
1 POINT (8.63148 50.01629) HUMAN_OBSERVATION 1579887520 Canis familiaris Linnaeus, 1758 Verkehrsinfrastruktur 2017-03-11 00:00:00+00:00 2017-03-11 00:00:00+00:00
2 POINT (9.55500 48.97333) HUMAN_OBSERVATION 1579896270 Canis familiaris Linnaeus, 1758 Mischwälder 2017-01-01 00:00:00+00:00 2017-01-01 00:00:00+00:00
3 POINT (6.14376 50.81583) HUMAN_OBSERVATION 1883797122 Canis familiaris Linnaeus, 1758 Grünland 2018-05-14 00:00:00+00:00 2018-05-14 00:00:00+00:00
4 POINT (10.29174 48.88160) HUMAN_OBSERVATION 1891284730 Canis familiaris Linnaeus, 1758 Laubwälder 2018-08-16 00:00:00+00:00 2018-08-16 00:00:00+00:00
... ... ... ... ... ... ... ...
1336 POINT (14.90000 51.35000) HUMAN_OBSERVATION 3725545490 Canis lupus Linnaeus, 1758 Nadelwälder 2019-01-13 00:00:00+00:00 2019-01-13 00:00:00+00:00
1337 POINT (12.42115 51.19143) HUMAN_OBSERVATION 3712440633 Canis lupus Linnaeus, 1758 Siedlungsfläche mit niedriger Baudichte 2022-03-05 17:27:07+00:00 2022-03-05 17:27:07+00:00
1338 POINT (14.20000 51.45000) HUMAN_OBSERVATION 2837851869 Canis lupus Linnaeus, 1758 Siedlungsfläche mit niedriger Baudichte 2019-04-26 00:00:00+00:00 2019-04-26 00:00:00+00:00
1339 POINT (14.85000 51.35000) HUMAN_OBSERVATION 2836478160 Canis lupus Linnaeus, 1758 Ackerland 2019-01-13 00:00:00+00:00 2019-01-13 00:00:00+00:00
1340 POINT (6.51747 49.46328) HUMAN_OBSERVATION 2511463696 Canis lupus Linnaeus, 1758 Laubwälder 2014-01-01 00:00:00+00:00 2014-01-01 00:00:00+00:00

1341 rows × 7 columns

Note that the underlying Ökosystematlas variable is numeric; the human-readable class names are encoded in the metadata of the files. A class histogram can present the data with these class names.
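
If you want to inspect the resolved class names directly in Python instead, the same information can be taken from the GeoDataFrame with plain pandas (a small sketch):

#Count Canis lupus occurrences per resolved Ökosystematlas class
data["Ökosystematlas"].value_counts()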

Nested Full Workflow Canis lupus

#Create a workflow to plot Canis lupus occurrences filtered by the German border and merged with Ökosystematlas data as a class histogram.
workflow_canis_lupus_full = ge.register_workflow({
    "type": "Plot",
    "operator": {
       "type": "ClassHistogram",
       "params": {
          "columnName": "Ökosystematlas"
       },
        "sources": {
            "source": { #Canis lupus cut join #####################################
                "type": "RasterVectorJoin",
                "params": {
                        "names": {
                            "type": "names",
                            "values": ["Ökosystematlas"]
                        }, 
                        "temporalAggregation": "none",
                        "featureAggregation": "mean",
                },
                "sources": {
                    "vector": {
                        "type": "PointInPolygonFilter",
                        "params": {},
                        "sources": {
                            "points": {
                                "type": "OgrSource",
                                "params": {
                                    "data": f"_:{gbif_prov_id}:`species/Canis lupus`",
                                    "attributeProjection": []
                                }
                            },
                            "polygons": {
                                "type": "OgrSource",
                                "params": {
                                    "data": "germany"
                                }
                            }
                        }
                    },
                    "rasters": [{
                        "type": "GdalSource",
                        "params": {
                            "data": "oekosystematlas"
                        }
                    }]
                }
            } ######################################################################
       }
    }
})
    
workflow_canis_lupus_full
b182c10b-59ce-5d5b-946f-fccc3ae04c88
#Request the plot from Geo Engine
plot_canis_lupus = workflow_canis_lupus_full.plot_chart(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Show the plot
alt.Chart.from_dict(plot_canis_lupus.spec)

Processing Felis silvestris

None of the following steps is strictly necessary, as the entire workflow is reproduced as a single nested request at the end. However, the steps are intended to show the capabilities of Geo Engine.

Load Felis silvestris (Optional)

#Create workflow to request Felis silvestris occurrences
workflow_felis_silvestris = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "OgrSource",
        "params": {
            "data": f"_:{gbif_prov_id}:`species/Felis silvestris`",
        }
    }
})

workflow_felis_silvestris
f8d5abd5-7d5f-567e-97a2-7830052d6cbf
#Request the data from Geo Engine into a geopandas dataframe
data = workflow_felis_silvestris.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot()
<Axes: >

output_34_1.png

Point in Polygon Felis silvestris

#Create workflow to request Felis silvestris occurrences filtered by German border
workflow_felis_silvestris_cut = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "PointInPolygonFilter",
        "params": {},
        "sources": {
            "points": { #Felis silvestris ################################
                "type": "OgrSource",
                "params": {
                    "data": f"_:{gbif_prov_id}:`species/Felis silvestris`",
                    "attributeProjection": []
                }
            }, ###########################################################
            "polygons": { #Germany #######################################
                "type": "OgrSource",
                "params": {
                    "data": "germany"
                }
            } ############################################################
        } 
    }
})

workflow_felis_silvestris_cut
518c27b3-0ce7-56ac-b826-5a72be463a73
#Request the data from Geo Engine into a geopandas dataframe
data_felis_silvestris = workflow_felis_silvestris_cut.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data_felis_silvestris.plot()
<Axes: >

output_37_1.png

Nested Point in Polygon and Raster Vector Join Felis silvestris

#Create a workflow to request Felis silvestris occurrences filtered by the German border and linked to the Ökosystematlas data.
workflow_felis_silvestris_cut_join = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "RasterVectorJoin",
        "params": {
                "names": {
                    "type": "names",
                    "values": ["Ökosystematlas"]
                }, 
                "temporalAggregation": "none",
                "featureAggregation": "mean",
        },
        "sources": {
            "vector": { #Felis silvestris cut #####################################
                "type": "PointInPolygonFilter", 
                "params": {},
                "sources": {
                    "points": {
                        "type": "OgrSource",
                        "params": {
                            "data": f"_:{gbif_prov_id}:`species/Felis silvestris`",
                            "attributeProjection": []
                        }
                    },
                    "polygons": {
                        "type": "OgrSource",
                        "params": {
                            "data": "germany"
                        }
                    }
                }
            }, ###################################################################
            "rasters": [{ #Ökosystematlas ########################################
                "type": "GdalSource",
                "params": {
                    "data": "oekosystematlas"
                }
            }] ###################################################################
        },
    }
})

workflow_felis_silvestris_cut_join
355b4e59-65cc-5cfe-a0b4-636f4d41beab
#Request the data from Geo Engine into a geopandas dataframe
data = workflow_felis_silvestris_cut_join.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    ),
    resolve_classifications=True
)

#Show the geopandas dataframe
data
geometry basisofrecord gbifid scientificname Ökosystematlas start end
0 POINT (8.08720 50.78140) MATERIAL_SAMPLE 3774757042 Felis silvestris Schreber, 1777 Laubwälder 2015-09-13 00:00:00+00:00 2015-09-13 00:00:00+00:00
1 POINT (6.74050 50.43160) PRESERVED_SPECIMEN 3774755207 Felis silvestris Schreber, 1777 Grünland 2017-10-11 00:00:00+00:00 2017-10-11 00:00:00+00:00
2 POINT (6.36984 50.50914) HUMAN_OBSERVATION 1828993691 Felis silvestris Schreber, 1777 Natürliche und extensiv genutzte Grünflächen 2018-02-24 00:00:00+00:00 2018-02-24 00:00:00+00:00
3 POINT (6.92310 50.62580) PRESERVED_SPECIMEN 3774754593 Felis silvestris Schreber, 1777 Ackerland 2017-11-08 00:00:00+00:00 2017-11-08 00:00:00+00:00
4 POINT (6.87770 50.42950) PRESERVED_SPECIMEN 3774753913 Felis silvestris Schreber, 1777 Nadelwälder 2003-10-14 00:00:00+00:00 2003-10-14 00:00:00+00:00
... ... ... ... ... ... ... ...
1116 POINT (6.13130 50.10320) HUMAN_OBSERVATION 3695923471 Felis silvestris Schreber, 1777 Laubwälder 2016-07-19 00:00:00+00:00 2016-07-19 00:00:00+00:00
1117 POINT (6.13130 50.10320) HUMAN_OBSERVATION 3695924066 Felis silvestris Schreber, 1777 Laubwälder 2019-01-09 00:00:00+00:00 2019-01-09 00:00:00+00:00
1118 POINT (6.13130 50.10320) HUMAN_OBSERVATION 3695924069 Felis silvestris Schreber, 1777 Laubwälder 2019-01-04 00:00:00+00:00 2019-01-04 00:00:00+00:00
1119 POINT (8.29065 50.12195) HUMAN_OBSERVATION 841588052 Felis silvestris Schreber, 1777 Nadelwälder 2013-08-06 17:51:20+00:00 2013-08-06 17:51:20+00:00
1120 POINT (6.13130 50.10320) HUMAN_OBSERVATION 3695923382 Felis silvestris Schreber, 1777 Laubwälder 2019-11-01 00:00:00+00:00 2019-11-01 00:00:00+00:00

1121 rows × 7 columns

Nested Full Workflow Felis silvestris

#Create a workflow to plot Felis silvestris occurrences filtered by the German border and merged with the Ökosystematlas data as a class histogram.
workflow_felis_silvestris_full = ge.register_workflow({
    "type": "Plot",
    "operator": {
       "type": "ClassHistogram",
       "params": {
          "columnName": "Ökosystematlas"
       },
        "sources": {
            "source": {
                "type": "RasterVectorJoin",
                "params": {
                        "names": {
                            "type": "names",
                            "values": ["Ökosystematlas"]
                        }, 
                        "temporalAggregation": "none",
                        "featureAggregation": "mean",
                },
                "sources": {
                    "vector": {
                        "type": "PointInPolygonFilter",
                        "params": {},
                        "sources": {
                            "points": {
                                "type": "OgrSource",
                                "params": {
                                    "data": f"_:{gbif_prov_id}:`species/Felis silvestris`",
                                    "attributeProjection": []
                                }
                            },
                            "polygons": {
                                "type": "OgrSource",
                                "params": {
                                    "data": "germany"
                                }
                            }
                        }
                    },
                    "rasters": [{
                        "type": "GdalSource",
                        "params": {
                            "data": "oekosystematlas"
                        }
                    }]
                }
            }
       }
    }
})
    
workflow_felis_silvestris_full
db03640c-cf0e-5fe0-978c-f45a55eb5da3
#Request the plot from Geo Engine
plot_felis_silvestris = workflow_felis_silvestris_full.plot_chart(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Show the plot
alt.Chart.from_dict(plot_felis_silvestris.spec)

Comparison Canis lupus and Felis silvestris

#Show the plot from Canis lupus
alt.Chart.from_dict(plot_canis_lupus.spec)
#Show the plot from Felis silvestris
alt.Chart.from_dict(plot_felis_silvestris.spec)

Further experiments

In this chapter, some other useful ways of combining Geo Engine and Python are shown.

#Comparison plots
import pandas as pd

# Convert the JSON data to pandas DataFrames
df1 = pd.DataFrame(plot_canis_lupus.spec['data']['values'])
df2 = pd.DataFrame(plot_felis_silvestris.spec['data']['values'])

df1['dataset'] = 'Canis lupus'
df2['dataset'] = 'Felis silvestris'

combined_df = pd.concat([df1, df2])

chart = alt.Chart(combined_df).mark_bar().encode(
    x=alt.X('Land Cover:N', title='Land Cover'),
    y=alt.Y('Frequency:Q', title='Frequency'),
    color=alt.Color('dataset:N', title='Dataset'),
    xOffset=alt.XOffset('dataset:N')
).properties(width=600)

# Display the grouped barplot
chart
#Plotting of multiple species
import geopandas as gpd

gdf1 = data_canis_lupus
gdf2 = data_felis_silvestris

gdf1['dataset'] = 'Canis lupus'
gdf2['dataset'] = 'Felis silvestris'

combined_gdf = pd.concat([gdf1, gdf2])

combined_gdf.plot(column='dataset', cmap='rainbow', markersize=5, legend=True)
<Axes: >

output_50_1.png


On dry land

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

This workflow uses the VAT to evaluate the distribution of Calopteryx splendens as a function of the land use classification from the Ökosystematlas and a temporal aggregation of the average air temperature.

The purpose of this notebook is to demonstrate the capabilities of Geo Engine. Therefore some useful techniques will be shown:

  • Using the GBIF data catalogue
  • Point in polygon selection
  • Raster vector join of occurrence data with land use classification
  • Plotting a class histogram
  • Nested workflows

When building your own nested workflow, it is recommended to build it in several steps as shown in this notebook.

Documentation about the operators and how to use them in Python can be found here: https://docs.geoengine.io/operators/intro.html

Preparation

#Import packages
import geoengine as ge
import geoengine_openapi_client
from datetime import datetime
from geoengine.types import RasterBandDescriptor
import altair as alt
import asyncio
import nest_asyncio

alt.renderers.enable('default')
RendererRegistry.enable('default')
#Initialize Geo Engine in VAT
ge.initialize("https://vat.gfbio.org/api")
#Get the GBIF DataProvider id (Useful for translating the DataProvider name to its id)
root_collection = ge.layer_collection()
gbif_prov_id = ''
for elem in root_collection.items:
    if elem.name == 'GBIF':
        gbif_prov_id = str(elem.provider_id)
        
gbif_prov_id
'1c01dbb9-e3ab-f9a2-06f5-228ba4b6bf7a'

Load boundaries of Germany for later GBIF occurrence extraction (optional)

This chapter is not needed and only shows that country boundaries are available.

#Create workflow to request the German border
workflow_germany = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "OgrSource",
        "params": {
            "data": "germany",
        }
    }
})

workflow_germany
2429a993-385f-546f-b4f7-97b3ba4a5adb
#Set time
start_time = datetime.strptime(
    '2010-01-01T12:00:00.000Z', "%Y-%m-%dT%H:%M:%S.%f%z")
end_time = datetime.strptime(
    '2011-01-01T12:00:00.000Z', "%Y-%m-%dT%H:%M:%S.%f%z")

#Request the data from Geo Engine into a geopandas dataframe
data = workflow_germany.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot()
<Axes: >

output_10_1.png

Load Ökosystematlas (detailed) for later raster vector join with occurrence data (optional)

This chapter is not needed and only shows that raster data is also available.

#Create workflow to request the oekosystematlas raster data
workflow_oekosystematlas = ge.register_workflow({ 
    "type": "Raster",
    "operator": {
        "type": "GdalSource",
        "params": {
            "data": "oekosystematlas_detail"
        }
    }
})

workflow_oekosystematlas
f447601c-0ba1-57c3-9127-b0622f982231
#Request the data from Geo Engine into a xarray dataarray
data = workflow_oekosystematlas.get_xarray(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot(vmax=75)
<matplotlib.collections.QuadMesh at 0x7f4d09e97a60>

output_14_2.png

Load Average temperature for later raster vector join with occurrence data (optional)

This chapter is not needed and only shows that raster data is also available.

#Create workflow to request the average temperature raster data
workflow_t_avg = ge.register_workflow({ 
    "type": "Raster",
    "operator": {
        "type": "RasterScaling",
            "params": {
                "slope": {
                    "type": "constant",
                    "value": 0.1
                },
                "offset": {
                    "type": "constant",
                    "value": -273.15
                },
                "outputMeasurement": {
                    "type": "continuous",
                    "measurement": "temperature",
                    "unit": "K/10"
                },
                "scalingMode": "mulSlopeAddOffset"
            },
            "sources": {
                "raster": {
                    "type": "RasterTypeConversion",
                    "params": {
                        "outputDataType": "F32"
                    },
                    "sources": {
                        "raster": {
                            "type": "GdalSource",
                            "params": {
                                "data": "mean_daily_air_temperature"
                            }
                        }
                    }
                }
            }
    }
})

workflow_t_avg
6393648d-6545-5435-a49e-015ba9dfa92e
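
The RasterScaling step applies output = value * slope + offset (scalingMode mulSlopeAddOffset), i.e. value * 0.1 - 273.15. Assuming the source raster stores temperature as tenths of Kelvin, this yields degrees Celsius; a minimal standalone sketch of the same conversion:

#Illustrative only: the same conversion the RasterScaling operator applies,
#assuming raw values are tenths of Kelvin
def scale_temperature(raw_value: float, slope: float = 0.1, offset: float = -273.15) -> float:
    return raw_value * slope + offset

scale_temperature(2881.5)  #288.15 K -> 15.0 °C
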
#Prepare the query rectangle for the workflow raster stream
bbox = ge.QueryRectangle(
    ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
    ge.TimeInterval(start_time, end_time),
    resolution=ge.SpatialResolution(0.1, 0.1),
    srs="EPSG:4326"
)
#Request the data from Geo Engine into a xarray dataarray
data = workflow_t_avg.get_xarray(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, start_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot(vmin=-3, vmax=3)
<matplotlib.collections.QuadMesh at 0x7f4d09db8820>

output_19_2.png

Processing Calopteryx splendens

None of the following steps is strictly necessary, as the entire workflow is reproduced as a single nested request at the end. However, the steps are intended to show the capabilities of Geo Engine and how to build nested workflows step by step.

Load Calopteryx splendens (Optional)

#Create workflow to request Calopteryx splendens occurrences
workflow_calopteryx_splendens = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "OgrSource",
        "params": {
            "data": f"_:{gbif_prov_id}:`species/Calopteryx splendens`",
        }
    }
})

workflow_calopteryx_splendens.get_result_descriptor()
Data type:         MultiPoint
Spatial Reference: EPSG:4326
Columns:
  scientificname:
    Column Type: text
    Measurement: unitless
  basisofrecord:
    Column Type: text
    Measurement: unitless
  gbifid:
    Column Type: int
    Measurement: unitless
#Request the data from Geo Engine into a geopandas dataframe
data = workflow_calopteryx_splendens.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot()
<Axes: >

output_24_1.png

Point in Polygon Calopteryx splendens

#Create workflow to request Calopteryx splendens occurrences filtered by German border
workflow_calopteryx_splendens_cut = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "PointInPolygonFilter",
        "params": {},
        "sources": {
            "points": { #Calopteryx splendens ###############################
                "type": "OgrSource",
                "params": {
                    "data": f"_:{gbif_prov_id}:`species/Calopteryx splendens`",
                    "attributeProjection": []
                } 
            }, #####################################################
            "polygons": { #Germany #################################
                "type": "OgrSource",
                "params": {
                    "data": "germany"
                }
            } ######################################################
        } 
    }
})

workflow_calopteryx_splendens_cut
6cf9ef88-8bd3-5904-bc74-f866165b18c3
#Request the data from Geo Engine into a geopandas dataframe
data_calopteryx_splendens = workflow_calopteryx_splendens_cut.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data_calopteryx_splendens.plot()
<Axes: >

output_27_1.png

Nested Point in Polygon and Raster Vector Join Calopteryx splendens

#Create a workflow to request Calopteryx splendens occurrences filtered by the German border and joined with the Ökosystematlas and average temperature data.
workflow_calopteryx_splendens_cut_join = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "RasterVectorJoin",
        "params": {
            "names": {
                "type": "names",
                "values": ["Ökosystematlas", "Avg_Temperature"]
            }, 
            "temporalAggregation": "none",
            "featureAggregation": "first",
        },
        "sources": {
            "vector": { #Calopteryx splendens cut ######################################
                "type": "PointInPolygonFilter", 
                "params": {},
                "sources": {
                    "points": {
                        "type": "OgrSource",
                        "params": {
                            "data": f"_:{gbif_prov_id}:`species/Calopteryx splendens`",
                            "attributeProjection": []
                        }
                    },
                    "polygons": {
                        "type": "OgrSource",
                        "params": {
                            "data": "germany"
                        }
                    }
                }
            }, ##############################################################
            "rasters": [{ #Ökosystematlas ###################################
                "type": "GdalSource",
                "params": {
                    "data": "oekosystematlas"
                }
            }, ##############################################################
            { #Average temperature
                "type": "RasterScaling",
                    "params": {
                        "slope": {
                            "type": "constant",
                            "value": 0.1
                        },
                        "offset": {
                            "type": "constant",
                            "value": -273.15
                        },
                        "outputMeasurement": {
                            "type": "continuous",
                            "measurement": "temperature",
                            "unit": "K/10"
                        },
                        "scalingMode": "mulSlopeAddOffset"
                    },
                    "sources": {
                        "raster": {
                            "type": "RasterTypeConversion",
                            "params": {
                                "outputDataType": "F32"
                            },
                            "sources": {
                                "raster": {
                                    "type": "GdalSource",
                                    "params": {
                                        "data": "mean_daily_air_temperature"
                                    }
                                }
                            }
                        }
                    }
            }] ##############################################################
        },
    }
})

workflow_calopteryx_splendens_cut_join
63c46ba9-3efd-5ddd-b446-c36fad6537e8
#Request the data from Geo Engine into a geopandas dataframe
data = workflow_calopteryx_splendens_cut_join.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    ),
    resolve_classifications=True
)

#Show the geopandas dataframe
data
geometry Avg_Temperature basisofrecord gbifid scientificname Ökosystematlas start end
0 POINT (7.57250 51.63501) 9.350006 HUMAN_OBSERVATION 920849630 Calopteryx splendens Harris, 1780 Grünland 2010-04-29 00:00:00+00:00 2010-04-29 00:00:00+00:00
1 POINT (6.75459 52.09421) 9.250000 HUMAN_OBSERVATION 700578315 Calopteryx splendens Harris, 1780 Ackerland 2010-04-12 00:00:00+00:00 2010-04-12 00:00:00+00:00
2 POINT (6.79395 51.93967) 9.350006 HUMAN_OBSERVATION 700582646 Calopteryx splendens Harris, 1780 No data 2010-04-07 00:00:00+00:00 2010-04-07 00:00:00+00:00
3 POINT (6.75459 52.09421) 9.250000 HUMAN_OBSERVATION 700578316 Calopteryx splendens Harris, 1780 Ackerland 2010-04-12 00:00:00+00:00 2010-04-12 00:00:00+00:00
4 POINT (6.79395 51.93967) 9.350006 HUMAN_OBSERVATION 700582645 Calopteryx splendens Harris, 1780 No data 2010-04-07 00:00:00+00:00 2010-04-07 00:00:00+00:00
... ... ... ... ... ... ... ... ...
535 POINT (7.62722 47.98439) NaN HUMAN_OBSERVATION 3845932111 Calopteryx splendens Harris, 1780 Ackerland 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
536 POINT (7.79354 48.33831) NaN HUMAN_OBSERVATION 3844974542 Calopteryx splendens Harris, 1780 Ackerland 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
537 POINT (7.80175 48.42811) NaN HUMAN_OBSERVATION 3845548749 Calopteryx splendens Harris, 1780 Laubwälder 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
538 POINT (8.10648 48.77593) NaN HUMAN_OBSERVATION 3845803099 Calopteryx splendens Harris, 1780 Grünland 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
539 POINT (7.81823 48.60769) NaN HUMAN_OBSERVATION 3845383562 Calopteryx splendens Harris, 1780 No data 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00

540 rows × 8 columns

#Create the same workflow again, this time using "mean" instead of "first" as featureAggregation for the raster vector join.
workflow_calopteryx_splendens_cut_join = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "RasterVectorJoin",
        "params": {
            "names": {
                "type": "names",
                "values": ["Ökosystematlas", "Avg_Temperature"]
            }, 
            "temporalAggregation": "none",
            "featureAggregation": "mean",
        },
        "sources": {
            "vector": { #Calopteryx splendens cut ######################################
                "type": "PointInPolygonFilter", 
                "params": {},
                "sources": {
                    "points": {
                        "type": "OgrSource",
                        "params": {
                            "data": f"_:{gbif_prov_id}:`species/Calopteryx splendens`",
                            "attributeProjection": []
                        }
                    },
                    "polygons": {
                        "type": "OgrSource",
                        "params": {
                            "data": "germany"
                        }
                    }
                }
            }, ##############################################################
            "rasters": [{ #Ökosystematlas ###################################
                "type": "GdalSource",
                "params": {
                    "data": "oekosystematlas"
                }
            }, ##############################################################
            { #Average temperature
                "type": "RasterScaling",
                    "params": {
                        "slope": {
                            "type": "constant",
                            "value": 0.1
                        },
                        "offset": {
                            "type": "constant",
                            "value": -273.15
                        },
                        "outputMeasurement": {
                            "type": "continuous",
                            "measurement": "temperature",
                            "unit": "K/10"
                        },
                        "scalingMode": "mulSlopeAddOffset"
                    },
                    "sources": {
                        "raster": {
                            "type": "RasterTypeConversion",
                            "params": {
                                "outputDataType": "F32"
                            },
                            "sources": {
                                "raster": {
                                    "type": "GdalSource",
                                    "params": {
                                        "data": "mean_daily_air_temperature"
                                    }
                                }
                            }
                        }
                    }
            }] ##############################################################
        },
    }
})

workflow_calopteryx_splendens_cut_join
4f2e830a-9570-5c8f-b2e1-bc433814df82
#Request the data from Geo Engine into a geopandas dataframe
data = workflow_calopteryx_splendens_cut_join.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    ),
    resolve_classifications=True
)

#Show the geopandas dataframe
data
geometry Avg_Temperature basisofrecord gbifid scientificname Ökosystematlas start end
0 POINT (7.57250 51.63501) 9.350006 HUMAN_OBSERVATION 920849630 Calopteryx splendens Harris, 1780 Grünland 2010-04-29 00:00:00+00:00 2010-04-29 00:00:00+00:00
1 POINT (6.75459 52.09421) 9.250000 HUMAN_OBSERVATION 700578315 Calopteryx splendens Harris, 1780 Ackerland 2010-04-12 00:00:00+00:00 2010-04-12 00:00:00+00:00
2 POINT (6.79395 51.93967) 9.350006 HUMAN_OBSERVATION 700582646 Calopteryx splendens Harris, 1780 No data 2010-04-07 00:00:00+00:00 2010-04-07 00:00:00+00:00
3 POINT (6.75459 52.09421) 9.250000 HUMAN_OBSERVATION 700578316 Calopteryx splendens Harris, 1780 Ackerland 2010-04-12 00:00:00+00:00 2010-04-12 00:00:00+00:00
4 POINT (6.79395 51.93967) 9.350006 HUMAN_OBSERVATION 700582645 Calopteryx splendens Harris, 1780 No data 2010-04-07 00:00:00+00:00 2010-04-07 00:00:00+00:00
... ... ... ... ... ... ... ... ...
535 POINT (7.62722 47.98439) NaN HUMAN_OBSERVATION 3845932111 Calopteryx splendens Harris, 1780 Ackerland 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
536 POINT (7.79354 48.33831) NaN HUMAN_OBSERVATION 3844974542 Calopteryx splendens Harris, 1780 Ackerland 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
537 POINT (7.80175 48.42811) NaN HUMAN_OBSERVATION 3845548749 Calopteryx splendens Harris, 1780 Laubwälder 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
538 POINT (8.10648 48.77593) NaN HUMAN_OBSERVATION 3845803099 Calopteryx splendens Harris, 1780 Grünland 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00
539 POINT (7.81823 48.60769) NaN HUMAN_OBSERVATION 3845383562 Calopteryx splendens Harris, 1780 No data 2011-01-01 00:00:00+00:00 2011-01-01 00:00:00+00:00

540 rows × 8 columns

Note that the underlying Ökosystematlas variable is numeric; the human-readable class names are encoded in the metadata of the files. A class histogram can present the data with these class names.

Nested Full Workflow Calopteryx splendens Plot Ökosystematlas

#Create a workflow to plot Calopteryx splendens occurrences filtered by the German border and merged with the Ökosystematlas data as a class histogram.
workflow_calopteryx_splendens_full_öko = ge.register_workflow({
    "type": "Plot",
    "operator": {
       "type": "ClassHistogram",
       "params": {
          "columnName": "Ökosystematlas"
       },
        "sources": {
            "source": { #Calopteryx splendens cut join #####################################
                "type": "RasterVectorJoin",
                "params": {
                        "names": {
                            "type": "names",
                            "values": ["Ökosystematlas", "Avg_Temperature"]
                        }, 
                        "temporalAggregation": "none",
                        "featureAggregation": "mean",
                },
                "sources": {
                    "vector": {
                        "type": "PointInPolygonFilter",
                        "params": {},
                        "sources": {
                            "points": {
                                "type": "OgrSource",
                                "params": {
                                    "data": f"_:{gbif_prov_id}:`species/Calopteryx splendens`",
                                    "attributeProjection": []
                                }
                            },
                            "polygons": {
                                "type": "OgrSource",
                                "params": {
                                    "data": "germany"
                                }
                            }
                        }
                    },
                    "rasters": [{
                        "type": "GdalSource",
                        "params": {
                            "data": "oekosystematlas"
                        }
                    },
                    {
                        "type": "RasterScaling",
                        "params": {
                            "slope": {
                                "type": "constant",
                                "value": 0.1
                            },
                            "offset": {
                                "type": "constant",
                                "value": -273.15
                            },
                            "outputMeasurement": {
                                "type": "continuous",
                                "measurement": "temperature",
                                "unit": "K/10"
                            },
                            "scalingMode": "mulSlopeAddOffset"
                        },
                        "sources": {
                            "raster": {
                                "type": "RasterTypeConversion",
                                "params": {
                                    "outputDataType": "F32"
                                },
                                "sources": {
                                    "raster": {
                                        "type": "GdalSource",
                                        "params": {
                                            "data": "mean_daily_air_temperature"
                                        }
                                    }
                                }
                            }
                        }
                    }]
                }
            } ######################################################################
       }
    }
})
    
workflow_calopteryx_splendens_full_öko
befec7cb-1b9a-5464-88b0-aa14b6be3077
#Request the plot from Geo Engine
plot_calopteryx_splendens = workflow_calopteryx_splendens_full_öko.plot_chart(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Show the plot
alt.Chart.from_dict(plot_calopteryx_splendens.spec)
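If the class histogram should be kept outside the notebook, the Altair chart can also be exported, for example as a standalone HTML file; a minimal sketch (the file name is just an example):

#Export the class histogram as a standalone HTML file (file name is an example)
chart = alt.Chart.from_dict(plot_calopteryx_splendens.spec)
chart.save("calopteryx_splendens_class_histogram.html")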

Nested Full Workflow Calopteryx splendens Plot Average Temperature

#Create a workflow to request Calopteryx splendens occurrences filtered by the German border and joined with the Ökosystematlas and average temperature data.
workflow_calopteryx_splendens_full_avg_temp = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "RasterVectorJoin",
        "params": {
                "names": {
                    "type": "names",
                    "values": ["Ökosystematlas", "Avg_Temperature"]
                }, 
                "temporalAggregation": "none",
                "featureAggregation": "mean",
        },
        "sources": {
            "vector": {
                "type": "PointInPolygonFilter", 
                "params": {},
                "sources": {
                    "points": {
                        "type": "OgrSource",
                        "params": {
                            "data": f"_:{gbif_prov_id}:`species/Calopteryx splendens`",
                            "attributeProjection": []
                        }
                    },
                    "polygons": {
                        "type": "OgrSource",
                        "params": {
                            "data": "germany"
                        }
                    }
                }
            },
            "rasters": [{
                "type": "GdalSource",
                "params": {
                    "data": "oekosystematlas"
                }
            },
            {
                "type": "RasterScaling",
                    "params": {
                        "slope": {
                            "type": "constant",
                            "value": 0.1
                        },
                        "offset": {
                            "type": "constant",
                            "value": -273.15
                        },
                        "outputMeasurement": {
                            "type": "continuous",
                            "measurement": "temperature",
                            "unit": "K/10"
                        },
                        "scalingMode": "mulSlopeAddOffset"
                    },
                    "sources": {
                        "raster": {
                            "type": "RasterTypeConversion",
                            "params": {
                                "outputDataType": "F32"
                            },
                            "sources": {
                                "raster": {
                                    "type": "GdalSource",
                                    "params": {
                                        "data": "mean_daily_air_temperature"
                                    }
                                }
                            }
                        }
                    }
            }]
        },
    }
})

workflow_calopteryx_splendens_full_avg_temp
4f2e830a-9570-5c8f-b2e1-bc433814df82
#Request the data from Geo Engine into a geopandas dataframe
data = workflow_calopteryx_splendens_full_avg_temp.get_dataframe(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, end_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the geopandas dataframe colored by average temperature
data.plot(column='Avg_Temperature', legend=True, legend_kwds={'label': 'Average Temperature'})
<Axes: >

output_39_1.png

Further experiments

In this chapter, some further useful ways of combining Geo Engine and Python are shown.

#Overlay plot: raster data with Calopteryx splendens occurrence points for context
import geopandas as gpd
import matplotlib.pyplot as plt

#Request the data from Geo Engine into an xarray DataArray
data = workflow_t_avg.get_xarray(
    ge.QueryRectangle(
        ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334),
        ge.TimeInterval(start_time, start_time),
        resolution=ge.SpatialResolution(0.1, 0.1),
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot(vmin=-3, vmax=3)
data_calopteryx_splendens.plot(ax=plt.gca(), color='red', markersize=3)
plt.show()

output_42_1.png


VAT 4 Machine Learning - Creating training data for a species distribution model

++ Currently, the examples are being reworked after the latest update because GBIF behaves differently now. Find out more. ++

This workflow is a contribution to the NFDI4Earth conference. It uses the frequency of Arnica montana occurrences from GBIF as the target variable, together with weather data from CHELSA, land use classification from the Ökosystematlas and topographic information as predictor variables, to create a species distribution model for Arnica montana across Germany.

Import

#Import Packages
import geoengine as ge
from datetime import datetime 
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
import matplotlib.pyplot as plt
import xarray as xr
import numpy as np
import asyncio
import nest_asyncio
#Initialize Geo Engine in VAT
ge.initialize("https://vat.gfbio.org/api")
#Get the GBIF DataProvider id (useful for translating the DataProvider name to its id)
root_collection = ge.layer_collection()
gbif_prov_id = ''
for elem in root_collection.items:
    if elem.name == 'GBIF':
        gbif_prov_id = str(elem.provider_id)
        
gbif_prov_id
'1c01dbb9-e3ab-f9a2-06f5-228ba4b6bf7a'
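The same lookup can be wrapped in a small helper when further data providers are needed; a minimal sketch (the helper name and error handling are our own):

def provider_id_by_name(name: str) -> str:
    #Search the root layer collection for a data provider with the given name (illustrative helper)
    for item in ge.layer_collection().items:
        if item.name == name:
            return str(item.provider_id)
    raise ValueError(f"Data provider '{name}' not found")

#Example: resolve the GBIF provider id as above
gbif_prov_id = provider_id_by_name('GBIF')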

Create Labelled Data

This chapter shows how to register the workflow that retrieves the labelled occurrence data and how to aggregate it into training data.

#Tuning parameters
start_time = datetime.strptime('2001-01-01T12:00:00.000Z', "%Y-%m-%dT%H:%M:%S.%f%z")
end_time = datetime.strptime('2011-01-01T12:00:00.000Z', "%Y-%m-%dT%H:%M:%S.%f%z")
resolution = ge.SpatialResolution(0.01, 0.01)
extent = ge.BoundingBox2D(5.852490, 47.271121, 15.022059, 55.065334)

#Species selection
species = "species/Arnica montana" #Arnica
#Create a workflow to retrieve Arnica montana occurrences filtered by the German border and linked to weather, land use and topographic data.
workflow = ge.register_workflow({
    "type": "Vector",
    "operator": {
        "type": "RasterVectorJoin",
        "params": {
            "names": {
                "type": "names",
                "values": ["Ökosystematlas", "SRTM", "Mean Air Temperature", "Mean Climate Moisture Index", "Precipitation"]
            },  
            "temporalAggregation": "none",
            "featureAggregation": "first",
        },
        "sources": {
            "vector": { #Arnica montana #########################################
                "type": "PointInPolygonFilter", 
                "params": {},
                "sources": {
                    "points": {
                        "type": "OgrSource",
                        "params": {
                            "data": f"_:{gbif_prov_id}:`{species}`",
                            "attributeProjection": []
                        }
                    },
                    "polygons": {
                        "type": "OgrSource",
                        "params": {
                            "data": "germany"
                        }
                    }
                }
            }, 
            "rasters": [{ #Ökosystematlas ########################################
                "type": "RasterTypeConversion",
                "params": {
                    "outputDataType": "F32"
                },
                "sources": {
                    "raster": {
                        "type": "GdalSource",
                        "params": {
                            "data": "oekosystematlas"
                        },
                    }
                }
            },
            { #SRTM #########################################################
                "type": "RasterTypeConversion",
                "params": {
                    "outputDataType": "F32"
                },
                "sources": {
                    "raster": {
                        "type": "GdalSource",
                        "params": {
                            "data": "srtm"
                        },
                    }
                }
                
            },
            { #Mean Annual Air Temperature ##################################
                "type": "TemporalRasterAggregation",
                "params": {
                    "aggregation": {
                        "type": "mean",
                        "ignoreNoData": False
                    },
                    "window": {
                        "granularity": "years",
                        "step": 1
                    },
                    "windowReference": None,
                    "outputType": None,
                },
                "sources": {
                    "raster": {
                        "type": "RasterScaling",
                        "params": {
                            "slope": {
                                "type": "constant",
                                "value": 0.1
                            },
                            "offset": {
                                "type": "constant",
                                "value": -273.15
                            },
                            "outputMeasurement": {
                                "type": "continuous",
                                "measurement": "temperature",
                                "unit": "K/10"
                            },
                            "scalingMode": "mulSlopeAddOffset"
                        },
                        "sources": {
                            "raster": {
                                "type": "RasterTypeConversion",
                                "params": {
                                    "outputDataType": "F32"
                                },
                                "sources": {
                                    "raster": {
                                        "type": "GdalSource",
                                        "params": {
                                            "data": "mean_daily_air_temperature"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            },
            { #Mean Annual Climate moisture indices #########################
                "type": "TemporalRasterAggregation",
                "params": {
                    "aggregation": {
                        "type": "mean",
                        "ignoreNoData": False
                    },
                    "window": {
                        "granularity": "years",
                        "step": 1
                    },
                    "windowReference": None,
                    "outputType": None,
                },
                "sources": {
                    "raster": {
                        "type": "RasterScaling",
                        "params": {
                            "slope": {
                                "type": "constant",
                                "value": 0.1
                            },
                            "offset": {
                                "type": "constant",
                                "value": 0
                            },
                            "outputMeasurement": {
                                "type": "continuous",
                                "measurement": "climate moisture",
                                "unit": "kg m^-2 month^-1"
                            },
                            "scalingMode": "mulSlopeAddOffset"
                        },
                        "sources": {
                            "raster": {
                                "type": "RasterTypeConversion",
                                "params": {
                                    "outputDataType": "F32"
                                },
                                "sources": {
                                    "raster": {
                                        "type": "GdalSource",
                                        "params": {
                                            "data": "monthly_climate_moisture_indicies"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            },
            { #Sum Annual Precipitation ####################################
                "type": "TemporalRasterAggregation",
                "params": {
                    "aggregation": {
                        "type": "sum",
                        "ignoreNoData": False
                    },
                    "window": {
                        "granularity": "years",
                        "step": 1
                    },
                    "windowReference": None,
                    "outputType": None,
                },
                "sources": {
                    "raster": {
                        "type": "RasterScaling",
                        "params": {
                            "slope": {
                                "type": "constant",
                                "value": 0.1
                            },
                            "offset": {
                                "type": "constant",
                                "value": 0
                            },
                            "outputMeasurement": {
                                "type": "continuous",
                                "measurement": "precipitation",
                                "unit": "kg m-2 month^-1"
                            },
                            "scalingMode": "mulSlopeAddOffset"
                        },
                        "sources": {
                            "raster": {
                                "type": "RasterTypeConversion",
                                "params": {
                                    "outputDataType": "F32"
                                },
                                "sources": {
                                    "raster": {
                                        "type": "GdalSource",
                                        "params": {
                                            "data": "monthly_precipitation_amount"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }]
        },
    }
})
    
workflow
7582cfcb-3d36-5b86-bb72-e81cef584fae
#Request the data from Geo Engine into a geopandas dataframe
data = workflow.get_dataframe(
    ge.QueryRectangle(
        extent,
        ge.TimeInterval(start_time, end_time),
        resolution=resolution,
        srs="EPSG:4326"
    )
)

#Plot the data
data.plot()
<Axes: >

output_11_1.png

data
|   | geometry | Mean Air Temperature | Mean Climate Moisture Index | Precipitation | SRTM | basisofrecord | gbifid | scientificname | Ökosystematlas | start | end |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | POINT (11.29000 50.47000) | 6.600011 | 42.033333 | 1186.700073 | 684.0 | HUMAN_OBSERVATION | 1922039098 | Arnica montana L. | 12.0 | 2001-09-24 00:00:00+00:00 | 2001-09-24 00:00:00+00:00 |
| 1 | POINT (10.04000 47.52000) | 6.658340 | 108.008339 | 2084.500244 | 845.0 | HUMAN_OBSERVATION | 1922860404 | Arnica montana L. | 12.0 | 2001-08-21 00:00:00+00:00 | 2001-08-21 00:00:00+00:00 |
| 2 | POINT (11.29000 50.42000) | 7.016680 | 41.541668 | 1193.100098 | 638.0 | HUMAN_OBSERVATION | 1922902358 | Arnica montana L. | 6.0 | 2001-10-01 00:00:00+00:00 | 2001-10-01 00:00:00+00:00 |
| 3 | POINT (10.04000 47.52000) | 6.658340 | 108.008339 | 2084.500244 | 845.0 | HUMAN_OBSERVATION | 1922858802 | Arnica montana L. | 12.0 | 2001-07-11 00:00:00+00:00 | 2001-07-11 00:00:00+00:00 |
| 4 | POINT (10.21000 47.37000) | 3.758347 | 102.083336 | 1912.100098 | 1649.0 | HUMAN_OBSERVATION | 1926238160 | Arnica montana L. | 255.0 | 2001-07-04 00:00:00+00:00 | 2001-07-04 00:00:00+00:00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1551 | POINT (11.18340 47.59551) | 8.066678 | 41.883335 | 1305.400024 | 638.0 | HUMAN_OBSERVATION | 920659766 | Arnica montana L. | 2.0 | 2010-06-10 00:00:00+00:00 | 2010-06-10 00:00:00+00:00 |
| 1552 | POINT (12.05000 50.11000) | 7.183350 | 5.225000 | 784.700012 | 594.0 | HUMAN_OBSERVATION | 1806720955 | Arnica montana L. | 11.0 | 2010-06-22 00:00:00+00:00 | 2010-06-22 00:00:00+00:00 |
| 1553 | POINT (13.04000 48.87000) | 7.075012 | 46.866669 | 1302.099976 | 706.0 | HUMAN_OBSERVATION | 1927043392 | Arnica montana L. | 8.0 | 2010-06-16 00:00:00+00:00 | 2010-06-16 00:00:00+00:00 |
| 1554 | POINT (12.04000 50.12000) | 7.425008 | 0.308333 | 768.400024 | 557.0 | HUMAN_OBSERVATION | 1806720970 | Arnica montana L. | 6.0 | 2010-06-22 00:00:00+00:00 | 2010-06-22 00:00:00+00:00 |
| 1555 | POINT (11.64000 50.09000) | NaN | NaN | NaN | 536.0 | HUMAN_OBSERVATION | 1946786537 | Arnica montana L. | 6.0 | 2011-01-01 00:00:00+00:00 | 2011-01-01 00:00:00+00:00 |

1556 rows × 11 columns

#Round and group the occurrences to obtain a frequency (count) for each combination of predictor variables
training_data = data.round(3)
training_data = training_data.groupby(['Mean Air Temperature', 'Mean Climate Moisture Index', 'Precipitation', 'SRTM', 'Ökosystematlas']).size().reset_index(name='counts')
training_data
|   | Mean Air Temperature | Mean Climate Moisture Index | Precipitation | SRTM | Ökosystematlas | counts |
|---|---|---|---|---|---|---|
| 0 | -0.842 | 126.342 | 2321.8 | 2036.0 | 14.0 | 13 |
| 1 | -0.717 | 178.900 | 2899.2 | 1938.0 | 16.0 | 3 |
| 2 | 0.275 | 153.200 | 2687.8 | 1811.0 | 255.0 | 20 |
| 3 | 0.858 | 123.850 | 2270.8 | 1798.0 | 14.0 | 1 |
| 4 | 0.900 | 109.475 | 2042.6 | 1822.0 | 11.0 | 1 |
| ... | ... | ... | ... | ... | ... | ... |
| 347 | 9.500 | -14.567 | 631.3 | 216.0 | 6.0 | 1 |
| 348 | 9.692 | 14.392 | 971.4 | 292.0 | 12.0 | 1 |
| 349 | 9.692 | 15.342 | 943.5 | 327.0 | 10.0 | 1 |
| 350 | 10.317 | 6.358 | 775.4 | 120.0 | 10.0 | 1 |
| 351 | 10.667 | 0.325 | 756.8 | 99.0 | 2.0 | 1 |

352 rows × 6 columns

training_data.sort_values('counts', ascending=False)
|   | Mean Air Temperature | Mean Climate Moisture Index | Precipitation | SRTM | Ökosystematlas | counts |
|---|---|---|---|---|---|---|
| 52 | 5.850 | 128.183 | 2370.3 | 1072.0 | 8.0 | 54 |
| 24 | 3.858 | 121.725 | 2325.1 | 1071.0 | 14.0 | 43 |
| 147 | 7.042 | 33.925 | 1123.1 | 565.0 | 13.0 | 43 |
| 28 | 4.258 | 102.975 | 2073.7 | 1435.0 | 11.0 | 38 |
| 25 | 3.875 | 112.683 | 2174.2 | 1414.0 | 12.0 | 36 |
| ... | ... | ... | ... | ... | ... | ... |
| 116 | 6.808 | 72.058 | 1677.0 | 875.0 | 8.0 | 1 |
| 114 | 6.808 | 18.958 | 948.8 | 668.0 | 8.0 | 1 |
| 113 | 6.800 | 68.783 | 1684.7 | 1018.0 | 8.0 | 1 |
| 232 | 7.617 | 39.742 | 1317.6 | 729.0 | 15.0 | 1 |
| 351 | 10.667 | 0.325 | 756.8 | 99.0 | 2.0 | 1 |

352 rows × 6 columns
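With the frequencies and predictor variables in one table, the training data can be handed to a regression model. The following is only a minimal sketch, assuming a random forest regressor as suggested by the imports above; feature selection, encoding of the categorical Ökosystematlas classes and hyperparameter tuning (e.g., with GridSearchCV) are left out here:

#Minimal sketch: fit a random forest on the aggregated occurrence counts (assumption, not the final model)
predictors = ['Mean Air Temperature', 'Mean Climate Moisture Index', 'Precipitation', 'SRTM', 'Ökosystematlas']
X = training_data[predictors]
y = training_data['counts']

#Hold out a test split to get a rough quality estimate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

#Coefficient of determination on the held-out data
print("R²:", r2_score(y_test, model.predict(X_test)))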

Create Prediction Data

This chapter shows how to register the workflow that provides the prediction data, i.e., a raster stack of the predictor variables.

#Create a workflow to request weather, land use and topographic data as a raster stack.
prediction_workflow = ge.register_workflow({
    "type": "Raster",
    "operator": {
          "type": "RasterStacker",
          "params": {
            "renameBands": {
              "type": "rename",
              "values": ["Ökosystematlas", "SRTM", "Mean Air Temperature", "Mean Climate Moisture Index", "Precipitation"]
            }
          },
          "sources": {
            "rasters": [{ #Ökosystematlas ########################################
                "type": "RasterTypeConversion",
                "params": {
                    "outputDataType": "F32"
                },
                "sources": {
                    "raster": {
                        "type": "GdalSource",
                        "params": {
                            "data": "oekosystematlas"
                        },
                    }
                }
            },
            { #SRTM #########################################################
                "type": "RasterTypeConversion",
                "params": {
                    "outputDataType": "F32"
                },
                "sources": {
                    "raster": {
                        "type": "GdalSource",
                        "params": {
                            "data": "srtm"
                        },
                    }
                }
                
            },
            { #Mean Annual Air Temperature ##################################
                "type": "TemporalRasterAggregation",
                "params": {
                    "aggregation": {
                        "type": "mean",
                        "ignoreNoData": False
                    },
                    "window": {
                        "granularity": "years",
                        "step": 1
                    },
                    "windowReference": None,
                    "outputType": None,
                },
                "sources": {
                    "raster": {
                        "type": "RasterScaling",
                        "params": {
                            "slope": {
                                "type": "constant",
                                "value": 0.1
                            },
                            "offset": {
                                "type": "constant",
                                "value": -273.15
                            },
                            "outputMeasurement": {
                                "type": "continuous",
                                "measurement": "temperature",
                                "unit": "K/10"
                            },
                            "scalingMode": "mulSlopeAddOffset"
                        },
                        "sources": {
                            "raster": {
                                "type": "RasterTypeConversion",
                                "params": {
                                    "outputDataType": "F32"
                                },
                                "sources": {
                                    "raster": {
                                        "type": "GdalSource",
                                        "params": {
                                            "data": "mean_daily_air_temperature"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            },
            { #Mean Annual Climate moisture indices #########################
                "type": "TemporalRasterAggregation",
                "params": {
                    "aggregation": {
                        "type": "mean",
                        "ignoreNoData": False
                    },
                    "window": {
                        "granularity": "years",
                        "step": 1
                    },
                    "windowReference": None,
                    "outputType": None,
                },
                "sources": {
                    "raster": {
                        "type": "RasterScaling",
                        "params": {
                            "slope": {
                                "type": "constant",
                                "value": 0.1
                            },
                            "offset": {
                                "type": "constant",
                                "value": 0
                            },
                            "outputMeasurement": {
                                "type": "continuous",
                                "measurement": "climate moisture",
                                "unit": "kg m^-2 month^-1"
                            },
                            "scalingMode": "mulSlopeAddOffset"
                        },
                        "sources": {
                            "raster": {
                                "type": "RasterTypeConversion",
                                "params": {
                                    "outputDataType": "F32"
                                },
                                "sources": {
                                    "raster": {
                                        "type": "GdalSource",
                                        "params": {
                                            "data": "monthly_climate_moisture_indicies"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            },
            { #Sum Annual Precipitation ####################################
                "type": "TemporalRasterAggregation",
                "params": {
                    "aggregation": {
                        "type": "sum",
                        "ignoreNoData": False
                    },
                    "window": {
                        "granularity": "years",
                        "step": 1
                    },
                    "windowReference": None,
                    "outputType": None,
                },
                "sources": {
                    "raster": {
                        "type": "RasterScaling",
                        "params": {
                            "slope": {
                                "type": "constant",
                                "value": 0.1
                            },
                            "offset": {
                                "type": "constant",
                                "value": 0
                            },
                            "outputMeasurement": {
                                "type": "continuous",
                                "measurement": "precipitation",
                                "unit": "kg m-2 month^-1"
                            },
                            "scalingMode": "mulSlopeAddOffset"
                        },
                        "sources": {
                            "raster": {
                                "type": "RasterTypeConversion",
                                "params": {
                                    "outputDataType": "F32"
                                },
                                "sources": {
                                    "raster": {
                                        "type": "GdalSource",
                                        "params": {
                                            "data": "monthly_precipitation_amount"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }]
          }
        }
    
})

prediction_workflow
370296a3-db66-599b-8e55-2a4bf362a09a
#Prepare the query rectangle for the workflow raster stream
bbox = ge.QueryRectangle(
    extent,
    ge.TimeInterval(start_time, start_time),
    resolution=resolution,
    srs="EPSG:4326"
)
nest_asyncio.apply()

async def get_prediction_data(workflow, bbox, bands=[0, 1, 2, 3, 4], clip=True):
    #Stream the raster stack from Geo Engine into an xarray DataArray
    data = await workflow.raster_stream_into_xarray(bbox, bands=bands, clip_to_query_rectangle=clip)
    return data

async def main(extent, time, resolution, workflow):
    #Build the query rectangle and fetch the prediction data
    bbox = ge.QueryRectangle(extent, ge.TimeInterval(time, time), resolution=resolution, srs="EPSG:4326")
    return await get_prediction_data(workflow, bbox)

try:
    loop = asyncio.get_event_loop()
except RuntimeError:
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)

prediction_data = loop.run_until_complete(main(extent, start_time, resolution, prediction_workflow))
prediction_data.to_dataset(name="prediction")
<xarray.Dataset> Size: 14MB
Dimensions:      (x: 918, y: 780, time: 1, band: 5)
Coordinates:
  * x            (x) float64 7kB 5.855 5.865 5.875 5.885 ... 15.0 15.01 15.02
  * y            (y) float64 6kB 55.07 55.06 55.05 55.04 ... 47.3 47.29 47.28
  * time         (time) datetime64[ns] 8B 2001-01-01
  * band         (band) int64 40B 0 1 2 3 4
    spatial_ref  int64 8B 0
Data variables:
    prediction   (time, band, y, x) float32 14MB 21.0 21.0 ... 1.082e+03
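To apply a fitted model to this raster stack, the xarray DataArray can be reshaped into a (pixel, band) feature matrix and the predictions can be reshaped back onto the grid. This is a minimal sketch under our own assumptions: it reuses the model variable from the sketch above, reorders the bands so that they match the training column order, and skips pixels with missing values:

#Reorder bands to the training feature order:
#[Mean Air Temperature, Mean Climate Moisture Index, Precipitation, SRTM, Ökosystematlas] -> stack bands [2, 3, 4, 1, 0]
stack = prediction_data.isel(time=0, band=[2, 3, 4, 1, 0])

#Flatten the grid into a (pixel, band) feature matrix
features = stack.stack(pixel=("y", "x")).transpose("pixel", "band").values

#Predict only where all predictor bands are valid
valid = ~np.isnan(features).any(axis=1)
predicted = np.full(features.shape[0], np.nan, dtype=np.float32)
predicted[valid] = model.predict(features[valid])

#Reshape back onto the raster grid and plot
prediction_map = predicted.reshape(stack.sizes["y"], stack.sizes["x"])
plt.imshow(prediction_map, extent=(5.852490, 15.022059, 47.271121, 55.065334))
plt.colorbar(label="Predicted Arnica montana occurrence frequency")
plt.show()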