Swiss state financial statements in RDF Data Cube


#1

Introduction

This is a work-in-progress data set about spending (Staatsrechnung in German) by the federal state of Switzerland. It starts with its creation in 1848. The data was scanned, digitized and transformed to a relational database. A web interface to this database can be found here. From the Swiss Open Data portal:

In the state financial statements, the Federal Council details the receipts of the federal administration for the previous year and accounts for its expenditure. The Federal Finance Administration (FFA) consolidates the annual financial statements of the departments and, based on them, compiles the report on the state financial statements.

I’ve transformed the data to RDF using R2RML & Stardog as engine. The pipeline itself is currently not public, this will change soon. The relational database used is relatively strange so the RDF transformation progress has to do some cleanup of the data. Also not everything is captured well semantically so we are currently trying to figure out what else we have to adjust. We decided to to that in public, as students of the University of Bern will use this data to do some visualizations.

RDF Data Cube

At the moment the data contains of pretty minimal RDF Data Cube like Observations. That means there is little to no metadata available that describes the RDF Data Cube in more details. However, this should be enough to start basic querying. In case you are completely new to SPARQL have a look at our presentation, including the “SPARQL in 11 minutes” video. Also check out my own video about querying RDF Data Cubes.

Technical Setup

Prefixes used in all SPARQL queries:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX ex: <http://example.org/>
PREFIX bb: <https://finance.ld.admin.ch/bundesbudget/>
PREFIX bb-dimension: <https://finance.ld.admin.ch/bundesbudget/dimension/> 

Make sure to copy/paste them everytime you execute a query.

Cube Setup

There are currently the following dimensions referenced:

  • bb-dimension:year: the year in xsd:gYear format, which means 4 digits
  • bb-dimension:comparison: either credit supplement, bill, or estimate
  • bb-dimension:domain: A curated list of spending categories of the government. At the moment available in French or German language.
  • bb-dimension:other: All categories I found that did not fit into anything above. There might be errors as I match the other dimensions on strings so when the string slightly differs I won’t catch it. It is possible that this dimension will disappear once we’ve cleaned up everything.

We have one measure:

  • bb:measure: the money spent. I’m a bit unsure about the unit of it, it looks like it is not always the same so it could be kCHF or CHF, depending on the year. We have to investigate that and align it properly.

Note that they are not formally defined as Dimensions/Attributes/Measures yet (aka no DSD in the RDF Data Cube).

As I use skos:Concept as values for dimensions most labels I use are of type skos:prefLabel, in some cases I might also still use rdfs:label but I will have to clean it up and align it properly.

SPARQL queries

Let us start with some queries to discover the data and validate it with the original PDFs. I’m not sure if those are public, will add the link once I find them:

SELECT * WHERE {
    ?observation a qb:Observation ; # restrict results to qb:Observations only
    bb:measure ?measure ; # get the measure
    bb-dimension:domain <https://finance.ld.admin.ch/bundesbudget/3a91a8d3> ; # set the specific domain, in this example "Zündkapselverwaltung"
    bb-dimension:year ?year ; # Any year 
    bb-dimension:comparison ?comparison . # what kind of entry was it, forecast or bill?

    ?comparison skos:prefLabel ?comparisonLabel . # this is the join to get the label of the comparison

    # There can be other dimensions, let's see if we have some 
    OPTIONAL {
  	?observation bb-dimension:other ?otherUri . 
   
    	?otherUri skos:prefLabel ?otherLabel .
    
        FILTER(lang(?otherLabel) = "de") # you can use "fr" as well, no english so far
  	}

    FILTER(lang(?comparisonLabel) = "de") # you can use "fr" as well, no english so far
  
} ORDER BY ?year

This query is a good base for exploring the data and validating it with the relational version and the original PDF scan.

We might also want to know what categories are properly defined, this is done here:

SELECT * WHERE {

    # this works as well for bb:Domain, bb:Comparision and bb:RevenueExpenses
    ?concept a bb:RevenueExpenses ; 
        skos:prefLabel ?label .
  
    FILTER(lang(?label) = "de") # you can use "fr" as well, no english so far
}

as I mentioned we have a dimension bb-dimension:other, which is kind of a catch-all category right now. Let us see how many are in there:

SELECT DISTINCT ?other ?otherLabel WHERE {
  ?obs bb-dimension:other ?other .
  
  ?other skos:prefLabel ?otherLabel .
} 

So we currently seem to have around 500 “others”.

That is it for the moment, post will be updated once we figure out more.


#2

La requête suivante fournit systématiquement 2 observations avec la même valeur
PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
PREFIX rdfs: http://www.w3.org/2000/01/rdf-schema#
prefix xsd: http://www.w3.org/2001/XMLSchema#
PREFIX skos: http://www.w3.org/2004/02/skos/core#
PREFIX qb: http://purl.org/linked-data/cube#
PREFIX ex: http://example.org/
PREFIX bb: https://finance.ld.admin.ch/bundesbudget/
PREFIX bb-dimension: https://finance.ld.admin.ch/bundesbudget/dimension/

SELECT DISTINCT * WHERE {
?observation a qb:Observation ; # restrict results to qb:Observations only
bb:measure ?measure ; # get the measure
bb-dimension:domain/skos:prefLabel “Zündkapselverwaltung”@de ; # set the specific domain, in this example “Zündkapselverwaltung”
bb-dimension:year ?year ; # Any year
bb-dimension:comparison/skos:prefLabel ?comparisonLabel . # this is the join to get the label of the comparison

# There can be other dimensions, let's see if we have some 
OPTIONAL {
           ?observation bb-dimension:other/skos:prefLabel ?otherLabel .

    FILTER(lang(?otherLabel) = "de"). # you can use "fr" as well, no english so far
           }

FILTER(lang(?comparisonLabel) = "de"). # you can use "fr" as well, no english so far
           FILTER(?year = "1861"^^xsd:gYear).

}

Par exemple https://finance.ld.admin.ch/bundesbudget/observation/3399462 et https://finance.ld.admin.ch/bundesbudget/observation/3297032.

La seule différence est qu’ils proviennent de 2 datasets différents : https://finance.ld.admin.ch/bundesbudget/cube/290 et https://finance.ld.admin.ch/bundesbudget/cube/282.

Est-ce une erreur?


#3

Les valeurs des mesures des cubes 220 et 221 sont des Mio CHF. Ce serait bien de convertir ces valeurs en CHF. Le contenu de ces 2 cubes pourrait être fusionné car ils représentent les variations et l’état de la dette depuis 1925 (voir document PDF)


#4

Dans le cube 277, les valeurs codées dans la dimension ‘other’ devrait être associées à une dimension “Unité d’organisation” / “Organisationseinheit” afin d’être plus générique que le terme office / Amt qui ne s’applique pas partout vraisemblablement.

Dans le cube 278, les valeurs codées dans la dimension ‘other’ devrait être associées à une dimension “Fonction” / “Funktion”.


#5

It is not an error as we have this data in the relational database, right now I simply dump everything to RDF what we have in there so the question is more why do we have that assigned to two different cubes in the RDB and what is the difference between these two cubes.


#6
  • Convert to CHF: We should then check if we can say that all amounts of, for example, 3 digits are Mio CHF, then I can convert that in the pipeline.
  • I’m not sure if I get what you mean with they could be merged, can you give an example of what could merged why?

#7

J’ai constaté que toutes les données sont à double, une fois dans un dataset “français”, une fois dans un dataset “allemand”.
Ainsi 220 va avec 287, 221 avec 288, 277 avec 285, 278 avec 286, 280 avec 283 et 282 avec 290.
On pourrait utiliser les prefLabel @de et @fr pour les dimensions et peut-être le même mécanisme pour le référencement des documents pdf.