Data Fair

Version 3.88.7

Functional presentation

19 February 2024

1 - Introduction

Data Fair and its ecosystem make it possible to implement a platform for sharing data (internal or open data) and visualizations. This platform can serve the general public, who access data through adapted interactive visualizations, as well as a more expert audience who access data through APIs.

The word FAIR refers to data that meets the principles of “Findability, Accessibility, Interoperability and Reusability”. This is made possible by indexing the data on the platform: indexing makes it possible to run complex searches over volumes of several million records and to reach the relevant data more easily and quickly. Access to data through standardized and documented APIs makes it possible to interface the platform with other information systems and facilitates data reuse.

illustration of the FAIR acronym

Data users access the platform through one or more data portals, which give access to the dataset catalog and let users explore it in different ways. Datasets can be consulted directly, either with generic views (tables, simple maps, etc.) or with more specific preconfigured visualizations. Data is disseminated through pages that present it in the form of data storytelling, making it easier for anyone to understand. Users can subscribe to notifications about updates, and developers can access interactive documentation for the various platform APIs. The portals can be enriched with content pages presenting, for example, the approach taken, the contributors or featured reuses.

Administrators and data contributors have access to a back-office which allows them to manage the various elements of the platform: user accounts, datasets and visualizations. Administrators can set up the environment and manage access permissions to data and visualizations. Depending on their profile, back-office users can create, edit, enrich and delete datasets, maps and graphs. The back-office also makes it possible to create data portals (internal or open data) and to access various portal usage metrics.

General operation

Datasets are generally created by users by loading tabular or geographic files: the service stores the file, analyzes it and derives a data schema. The data is then indexed according to this schema and can be queried through its own Web API.
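The "analyze and derive a schema" step can be pictured with a small sketch. This is not Data Fair's implementation: the function names (`infer_type`, `derive_schema`), the type labels and the sample file are assumptions made for illustration only. It simply reads a CSV document and picks the narrowest type that fits every value of each column.

```python
import csv
import io

def infer_type(values):
    """Return the narrowest type label that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    # Try the strictest conversion first, then fall back to looser ones.
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for v in non_empty:
                cast(v)
            return type_name
        except ValueError:
            continue
    return "string"

def derive_schema(csv_text):
    """Read a CSV document and return one {key, type} entry per column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []
    return [
        # Missing cells come back as None from DictReader; treat them as empty.
        {"key": key, "type": infer_type([(row[key] or "") for row in rows])}
        for key in rows[0].keys()
    ]

# Hypothetical sample file, used only to exercise the sketch.
sample = "city,population,area_km2\nVannes,53719,32.3\nLorient,57246,17.48\n"
schema = derive_schema(sample)
for field in schema:
    print(field["key"], field["type"])
```

A real service would also have to handle encodings, separators, dates and geographic formats; the sketch only shows the principle of deriving a schema from the file content.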

In addition to file-based datasets, Data Fair also allows the creation of datasets that can be edited through a form, and of virtual datasets, which are configurable views of one or more datasets.

The user can add semantics to the fields of a dataset by assigning concepts to them, for example by declaring that a column containing 5-digit values is a field of the Postal Code type. This semantic annotation allows two things: the data can be enriched from reference data, or itself become reference data used to enrich other datasets, and the data can be used in visualizations adapted to its concepts.
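In the platform the user chooses the concept. As a rough sketch of how a tool could check that a column is a plausible candidate for the Postal Code concept, the following applies the "5 digits" rule mentioned above. The concept identifier `postalCode`, the function names and the 95% threshold are hypothetical and are not taken from Data Fair.

```python
import re

# A value qualifies when it is exactly five digits (leading zeros allowed).
POSTAL_CODE_PATTERN = re.compile(r"\d{5}")

def looks_like_postal_code(values, threshold=0.95):
    """True when at least `threshold` of the non-empty values are 5 digits."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return False
    matches = sum(1 for v in non_empty if POSTAL_CODE_PATTERN.fullmatch(v))
    return matches / len(non_empty) >= threshold

def suggest_concepts(columns):
    """Map each column name to a suggested concept identifier, or None."""
    return {
        name: ("postalCode" if looks_like_postal_code(values) else None)
        for name, values in columns.items()
    }

# Hypothetical columns, kept as strings so that leading zeros survive.
columns = {
    "cp": ["56000", "35000", "01000"],
    "name": ["Vannes", "Rennes", "Bourg-en-Bresse"],
}
suggestions = suggest_concepts(columns)
print(suggestions)
```

Keeping the values as strings matters here: a postal code such as 01000 would lose its leading zero if the column were stored as an integer.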

Visualizations help unlock the full potential of user data. A few examples: a dataset containing commune codes can be projected onto a map of the French administrative division, a dataset containing parcel codes can be projected onto the cadastre, etc.

Main advantages of the platform

Data Fair makes it possible to set up an organization centered around data:

Ability to load data in different file formats or to enter it through a form, even allowing crowdsourcing
Consultation of data through a wide choice of interactive visualizations (graphs, maps, search engines, ...)
Possibility to create several portals according to the use cases (open data, internal exchanges, ...)
Easy creation of data APIs and enrichment of data to give it even more value
Implementation of periodic processing to automatically feed the platform with data
Secure framework, open source code and use of standards

2 - Data portal

2.1 - Home page

2.2 - Data catalog

2.3 - Dataset details

2.4 - Visualizations catalog

2.5 - Data Visualization

2.6 - Content Pages

2.7 - User Account

2.8 - Portal notification

2.9 - Reuses

2.10 - API Access

3 - Back Office

3.3 - Configure a data portal

Portal settings

Data Fair allows you to configure several data portals which are places for publishing datasets, visualizations and content pages that users will be able to consult. There may be portals for different use cases: open data, internal data sharing, sharing with partners, data being consolidated (pre-production portal), ...

Just as for data visualizations, a data portal is configured graphically in two stages: we work on a draft, which we then publish as the current version. The current version is the one presented to visitors of your portal, so you can update the portal without impacting users until the draft has been validated. No knowledge of HTML or CSS is required: a portal is administered like a CMS such as WordPress.

screenshot of the portal configuration editor

Many elements are configurable: the logo, the home image, the favicon, the color of the navigation bar, the color of the footer, content elements (title, description, visualization to put forward, public or private visibility) and various communication elements (website, contact email, and accounts on social networks).

It is possible to enter a Google Analytics or Matomo (formerly Piwik) account for activity monitoring and thus obtain statistics on the most visited pages and the most downloaded data.

Editing content pages

The content pages are of different types: articles, thematic pages around several datasets, news pages, data storytelling, licenses, terms of use, ... They make it possible, for example, to highlight data and give it more context, or to build dashboards that combine different data.

screenshot of the editing of a portal page

A page is created in 3 steps. We first choose the page template. Then we fill in the different elements using a form adapted to the chosen template, with a preview of the result. Finally, we publish the page; because publication is a separate step, pages can be prepared in advance and published later. In addition to free text, different types of elements can be integrated: table of a dataset, visualization, list of datasets, external content, ...

To give access to the content pages created, links can be added to the navigation bar. Links can appear directly in the bar, or in a menu added to it. Pages can be public or private.

3.4 - User Management

Actions	User	Contributor	Admin
Add a dataset		x	x
Read a dataset	x	x	x
Edit a dataset		x	x
Administration of a dataset			x
Add a visualization		x	x
Read a visualization	x	x	x
Edit a visualization		x	x
Administration of a visualization			x
Access and change settings			x
Create and modify the portal			x
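The table above can be read as a role-to-action matrix. The sketch below encodes it as a lookup in order to make the rule explicit; the role and action identifiers are illustrative names derived from the table, not Data Fair's internal permission model.

```python
from enum import Enum

class Role(Enum):
    USER = "user"
    CONTRIBUTOR = "contributor"
    ADMIN = "admin"

EVERYONE = {Role.USER, Role.CONTRIBUTOR, Role.ADMIN}
WRITERS = {Role.CONTRIBUTOR, Role.ADMIN}
ADMIN_ONLY = {Role.ADMIN}

# One entry per row of the table; action names are hypothetical identifiers.
PERMISSIONS = {
    "add_dataset": WRITERS,
    "read_dataset": EVERYONE,
    "edit_dataset": WRITERS,
    "administer_dataset": ADMIN_ONLY,
    "add_visualization": WRITERS,
    "read_visualization": EVERYONE,
    "edit_visualization": WRITERS,
    "administer_visualization": ADMIN_ONLY,
    "change_settings": ADMIN_ONLY,
    "manage_portal": ADMIN_ONLY,
}

def is_allowed(role, action):
    """Return True when the role may perform the action; unknown actions are denied."""
    return role in PERMISSIONS.get(action, set())

print(is_allowed(Role.CONTRIBUTOR, "edit_dataset"))   # contributor row has an x
print(is_allowed(Role.USER, "add_visualization"))     # user row is empty
```

Denying unknown actions by default keeps the sketch on the safe side if a new action is added without a matching table row.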

3.5 - Periodic processings

3.6 - Catalog connectors

3.7 - Usage metrics

There are two modules to track the use of the platform. The first is analytics and corresponds to the monitoring of user journeys on the data portal. This allows you to see which pages are consulted, where the users come from, the time they spend on the pages, ... The second corresponds to measurements of API consumption and allows you to see how the platform is used by other information systems or external sites.

Analytics

It is possible to use Matomo Analytics (formerly Piwik) or Google Analytics as a tracking system. This is done simply by configuring the data portal by filling in a few fields in a form.

Matomo Analytics

The configuration is done with the URL of the tracker and the ID of your site. The statistics in Matomo Analytics are available in different forms: tables, graphs and maps. By selecting among these representations, you can build custom dashboards. It is also possible to anonymize data and record user paths while complying with the recommendations of the CNIL.

screenshot Matomo

Google Analytics

The configuration is done using the ID number. The statistics in Google Analytics are also available in different forms: tables, graphs and maps. Here too, you can build custom dashboards.

screenshot of Google analytics

APIs

Data Fair and the various associated services make extensive use of cache mechanisms to improve access times to resources. As a result, precise usage statistics for the various access points of the platform can only be collected by a service associated with the platform's reverse proxy.

To comply with the GDPR, the data collected is anonymized and aggregated on a daily basis. You can access statistics for each dataset: number of API calls and number of downloads. The metrics are aggregated by user group (owner organization, external authenticated users, anonymous, ...) or by calling domain. Key figures are presented for the requested period, with a comparison to the previous period, which makes it possible to see whether the use of certain data is increasing or decreasing.

screenshot of the API metrics dashboard
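The daily aggregation and the comparison with the previous period can be sketched as follows. This is a minimal illustration under stated assumptions: the event fields (`day`, `group`), the function names and the sample events are hypothetical, and the actual metrics service may work differently.

```python
from collections import defaultdict
from datetime import date, timedelta

def aggregate_daily(events):
    """Count API calls per (day, user group); no user-level detail is kept."""
    totals = defaultdict(int)
    for event in events:
        totals[(event["day"], event["group"])] += 1
    return dict(totals)

def period_total(daily, start, end):
    """Sum the counts of every day in the inclusive range [start, end]."""
    return sum(count for (day, _group), count in daily.items() if start <= day <= end)

def compare_with_previous(daily, start, end):
    """Return (current, previous) totals, the previous period having the same length."""
    length = end - start + timedelta(days=1)
    current = period_total(daily, start, end)
    previous = period_total(daily, start - length, end - length)
    return current, previous

# Hypothetical call log for one dataset.
events = [
    {"day": date(2024, 1, 2), "group": "anonymous"},
    {"day": date(2024, 1, 9), "group": "owner"},
    {"day": date(2024, 1, 9), "group": "anonymous"},
    {"day": date(2024, 1, 10), "group": "anonymous"},
]
daily = aggregate_daily(events)
current, previous = compare_with_previous(daily, date(2024, 1, 8), date(2024, 1, 14))
print(current, previous)  # 3 calls this week, 1 call the week before
```

Because only per-day, per-group counters are stored, the comparison between periods can be computed without retaining any individual request.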