Alessandro Martinello

I am currently Head of Data and Analytics at Realkredit Danmark. I have been in both academia and policy after obtaining my PhD, and in this website you can also find some of the papers I have worked on. While time is limited today for side projects, every once in a while I manage to create something useful enough to share via GitHub.

Home

Using DST's API with python

Published Feb 24, 2022

I am a huge fan of Denmark Statistics. Their Statistikbanken contains a wealth of data on the Danish society, economy, and population.

Not only all these data are publicy available, but DST has for years also provided access to all their published data tables through an API, documented here. The API access makes it extremely easy to access and use data. Yet unless one has already some experience in accessing APIs, using it might be complex for an occasional student or analyst.

This notebook provides a quick guide on how to access data from DST’s Statsbanken through their API, and presents a utility class I wrote to more easily access data tables for analytical purposes.

The only explicit dependency of that utility is pandas, which is anyway an extremely widespread package.

Both notebook and class can be found at this GitHub repository.

# Start by importing necessary packages
import requests
import pandas as pd
from IPython.display import display
from io import StringIO

from dstapi import DstApi # The helper class

DST’s API has four separate function calls to programmatically navigate around the published tables. This guide assumes that the analysist has scouted Statistikbanken already, and has identified the one or two tables from which data should be extracted. For these purposes, we only need two function calls: tableinfo and data.

  • data, as the name implies, is the function that allows us to get data from a specific table
  • tableinfo provides all the necessary metadata of a specific table. This latter function is crucial and extremely useful, as it allows to know how a table is structured, and which filters to pass to a data call.

The standard process is to begin by obtaining the necessary information from tableinfo, and then construct the call to pass to data.

This guide will proceed by for each step of the process first showing how to do it by directly using requests (and pandas), and second showing how the utility class DstApi can facilitate the process.

Step 1: Understand what a table has to offer and how it is structured

Our primary example will be DST’s table METROX1, which reports an index measuring the weekly amount of passengers travelling by metro in Copenhagen. This index was developed to measure the population’s response to the COVID pandemic. The table is small and simple, allowing for quick experimentation.

As we know the table’s name/id we can start by accessing the API directly through the python package requests, and ask about the table’s metadata (tableinfo). .

An API call is composed by a main web address, a function call, and a set of parameters. The main web address is https://api.statbank.dk/v1. The function call in this case is tableinfo. The set of necessary parameters, per documentation, is the id of the table and the format in which we’d like to receive the information. We’ll pick "metrox1" for the first (note that the table-id parameter is case-sensitive), and "JSON" for the second.

The API at DST can be called through both requests.get() and requests.post(). DST’s documentation recommends using post, because as the number and complexity of parameters grows (with some of them containing non-standard Danish characters) it’s harder to embed them in an URL. However, as the call to tableinfo is simple, below I provide examples of using both methods.

Note that the .json() method of the request.Response object serves to return the response content (which we requested in JSON format) rather than the object itself. That’s just to print out the output in the notebook.

This function returns a wealth of information. Not just the table id and description, but also the contact of the statistics responsible, and, crucially, names and values of the variables defining the table. In this case SÆSON and Tid.

# Directly embed parameters in the URL with response.get()
requests.get('https://api.statbank.dk/v1' + '/tableinfo' + "?id=metrox1&format=JSON").json()

# Pass a dictionary of parameters to requests.get()
params = {'id': 'metrox1', 'format': 'JSON'}
requests.get('https://api.statbank.dk/v1' + '/tableinfo', params=params).json()

# Use response.post() - note the change in the name of the parameter about the table's name
# I'm also adding here a language parameter - most tables are available in both Danish and English
params = {'table': 'metrox1', 'format': 'JSON', 'lang':'en'}
requests.post('https://api.statbank.dk/v1' + '/tableinfo', json=params).json()
{'id': 'METROX1',
 'text': 'Workday passenger index in the Copenhagen Metro (experimental statistics)',
 'description': 'Workday passenger index in the Copenhagen Metro (experimental statistics) by seasonal adjustment and time',
 'unit': 'Index',
 'suppressedDataValue': '0',
 'updated': '2021-12-23T08:00:00',
 'active': True,
 'contacts': [{'name': 'Peter Ottosen',
   'phone': '+4539173025',
   'mail': 'pot@dst.dk'}],
 'documentation': None,
 'footnote': {'text': 'Data are indexed against an average of week 8 og 9 , 2020. During week 4 and 5, 2020 the City Line, M3 was closed.',
  'mandatory': False},
 'variables': [{'id': 'SÆSON',
   'text': 'seasonal adjustment',
   'elimination': False,
   'time': False,
   'values': [{'id': '10', 'text': 'Seasonally adjusted'},
    {'id': '11', 'text': 'Non-seasonally adjusted'}]},
  {'id': 'Tid',
   'text': 'time',
   'elimination': False,
   'time': True,
   'values': [{'id': '2020U01', 'text': '2020U01'},
    {'id': '2020U08', 'text': '2020U08'},
    {'id': '2020U09', 'text': '2020U09'},
    {'id': '2020U10', 'text': '2020U10'},
    {'id': '2020U11', 'text': '2020U11'},
    {'id': '2020U12', 'text': '2020U12'},
    {'id': '2020U13', 'text': '2020U13'},
    {'id': '2020U14', 'text': '2020U14'},
    {'id': '2020U15', 'text': '2020U15'},
    {'id': '2020U16', 'text': '2020U16'},
    {'id': '2020U17', 'text': '2020U17'},
    {'id': '2020U18', 'text': '2020U18'},
    {'id': '2020U19', 'text': '2020U19'},
    {'id': '2020U20', 'text': '2020U20'},
    {'id': '2020U21', 'text': '2020U21'},
    {'id': '2020U22', 'text': '2020U22'},
    {'id': '2020U23', 'text': '2020U23'},
    {'id': '2020U24', 'text': '2020U24'},
    {'id': '2020U25', 'text': '2020U25'},
    {'id': '2020U26', 'text': '2020U26'},
    {'id': '2020U27', 'text': '2020U27'},
    {'id': '2020U28', 'text': '2020U28'},
    {'id': '2020U29', 'text': '2020U29'},
    {'id': '2020U30', 'text': '2020U30'},
    {'id': '2020U31', 'text': '2020U31'},
    {'id': '2020U32', 'text': '2020U32'},
    {'id': '2020U33', 'text': '2020U33'},
    {'id': '2020U34', 'text': '2020U34'},
    {'id': '2020U35', 'text': '2020U35'},
    {'id': '2020U36', 'text': '2020U36'},
    {'id': '2020U37', 'text': '2020U37'},
    {'id': '2020U38', 'text': '2020U38'},
    {'id': '2020U39', 'text': '2020U39'},
    {'id': '2020U40', 'text': '2020U40'},
    {'id': '2020U41', 'text': '2020U41'},
    {'id': '2020U42', 'text': '2020U42'},
    {'id': '2020U43', 'text': '2020U43'},
    {'id': '2020U44', 'text': '2020U44'},
    {'id': '2020U45', 'text': '2020U45'},
    {'id': '2020U46', 'text': '2020U46'},
    {'id': '2020U47', 'text': '2020U47'},
    {'id': '2020U48', 'text': '2020U48'},
    {'id': '2020U49', 'text': '2020U49'},
    {'id': '2020U50', 'text': '2020U50'},
    {'id': '2020U51', 'text': '2020U51'},
    {'id': '2020U52', 'text': '2020U52'},
    {'id': '2020U53', 'text': '2020U53'},
    {'id': '2021U01', 'text': '2021U01'},
    {'id': '2021U02', 'text': '2021U02'},
    {'id': '2021U03', 'text': '2021U03'},
    {'id': '2021U04', 'text': '2021U04'},
    {'id': '2021U05', 'text': '2021U05'},
    {'id': '2021U06', 'text': '2021U06'},
    {'id': '2021U07', 'text': '2021U07'},
    {'id': '2021U08', 'text': '2021U08'},
    {'id': '2021U09', 'text': '2021U09'},
    {'id': '2021U10', 'text': '2021U10'},
    {'id': '2021U11', 'text': '2021U11'},
    {'id': '2021U12', 'text': '2021U12'},
    {'id': '2021U13', 'text': '2021U13'},
    {'id': '2021U14', 'text': '2021U14'},
    {'id': '2021U15', 'text': '2021U15'},
    {'id': '2021U16', 'text': '2021U16'},
    {'id': '2021U17', 'text': '2021U17'},
    {'id': '2021U18', 'text': '2021U18'},
    {'id': '2021U19', 'text': '2021U19'},
    {'id': '2021U20', 'text': '2021U20'},
    {'id': '2021U21', 'text': '2021U21'},
    {'id': '2021U22', 'text': '2021U22'},
    {'id': '2021U23', 'text': '2021U23'},
    {'id': '2021U24', 'text': '2021U24'},
    {'id': '2021U25', 'text': '2021U25'},
    {'id': '2021U26', 'text': '2021U26'},
    {'id': '2021U27', 'text': '2021U27'},
    {'id': '2021U28', 'text': '2021U28'},
    {'id': '2021U29', 'text': '2021U29'},
    {'id': '2021U30', 'text': '2021U30'},
    {'id': '2021U31', 'text': '2021U31'},
    {'id': '2021U32', 'text': '2021U32'},
    {'id': '2021U33', 'text': '2021U33'},
    {'id': '2021U34', 'text': '2021U34'},
    {'id': '2021U35', 'text': '2021U35'},
    {'id': '2021U36', 'text': '2021U36'},
    {'id': '2021U37', 'text': '2021U37'},
    {'id': '2021U38', 'text': '2021U38'},
    {'id': '2021U39', 'text': '2021U39'},
    {'id': '2021U40', 'text': '2021U40'},
    {'id': '2021U41', 'text': '2021U41'},
    {'id': '2021U42', 'text': '2021U42'},
    {'id': '2021U43', 'text': '2021U43'},
    {'id': '2021U44', 'text': '2021U44'},
    {'id': '2021U45', 'text': '2021U45'},
    {'id': '2021U46', 'text': '2021U46'},
    {'id': '2021U47', 'text': '2021U47'},
    {'id': '2021U48', 'text': '2021U48'},
    {'id': '2021U49', 'text': '2021U49'}]}]}

This wealth of information is already fantastic. In that metadata there’s pretty much anything you need to figure out if you can actually use the table, and eventually how you want to select the data (seasonally ajusted? For 2020 only?). Yet that JSON file might be tough to digest, especially for more complex tables. Those cases might require preprocessing and a different type of visualization. That’s where the DstApi helper class proves useful to me. It has two methods for examining metadata.

The first one, tablesummary, summarizes the main metadata information:

  • The id and description of the table
  • The last update time
  • A table with the main available cuts of the data. Each row corresponds to a variable against which we can select, with examples of variable values and labels
# Initialize the class with the target table
metro = DstApi('METROX1')

# Get the table summary
metro.tablesummary(language='en')
Table METROX1: Workday passenger index in the Copenhagen Metro (experimental statistics) by seasonal adjustment and time
Last update: 2021-12-23T08:00:00
variable name # values First value First value label Last value Last value label Time variable
0 SÆSON 2 10 Seasonally adjusted 11 Non-seasonally adjusted False
1 Tid 96 2020U01 2020U01 2021U49 2021U49 True

The second method variable_levels zooms into a specific variable and returns a dataframe for each potential variable value. For example, we could check each value of SÆSON

metro.variable_levels('SÆSON', language='en')
id text
0 10 Seasonally adjusted
1 11 Non-seasonally adjusted

Now, we already knew these values for this simple table from tablesummary(), as they are only two. But for more complex tables, this method is very handy. Take for example DNKAPK (hey, by the way, do you know this data comes from Nationalbanken?).

If I wanted for example to extract only data about international direct investments it would be hard to know I should be referring to the value 200 in advance.

dnkapk = DstApi('dnkapk')
display(dnkapk.tablesummary(language='en'))
display(dnkapk.variable_levels('INSTRUMENT', language='en'))
Table DNKAPK: Denmarks international investment position compressed by balance, item,  domestic sector, currency, country and time
Last update: 2021-12-23T08:00:00
variable name # values First value First value label Last value Last value label Time variable
0 BALANC 3 A Assets N Net assets False
1 INSTRUMENT 9 100 1.0: Total financial items 600 6.0: Reserve assets False
2 INDSEK 7 1000 1000: All domestic sectors 1400 1400: Households, etc. False
3 VALUTA 5 Z01 All currencies USD US dollar False
4 LAND 3 Z9 All countries excluding Denmark I7 Euro area-19 False
5 Tid 67 2005K1 2005Q1 2021K3 2021Q3 True
id text
0 100 1.0: Total financial items
1 200 2.0: Direct investments
2 300 3.0: Portfolio investments
3 310 3.1: Shares
4 320 3.2: Investment fund shares
5 330 3.3: Bonds, etc.
6 400 4.0: Financial derivatives (net balance)
7 500 5.0: Other investments (loans, trade credits, ...
8 600 6.0: Reserve assets

Step 2: Get the data you need

The first step is essential for designing this second step. First and foremost because we need that information to design the call to data. Second, to make sure we only get out the data we need. Asking for too much data only to then having to throw half of it out locally is wasteful, and ultimately disrespectful for the resources invested into allowing anyone to fire up an API call (I mean how amazing is that?).

As for the first step, we’ll start by doing it manually. Here I’ll rely exclusively on request.post() as recommended by DST.

To select the query parameters to pass to the data function appropriatedly one ought to have a careful look at the DATA section in the documentation. Nonetheless, hopefully the examples below will serve to clarify how to construct such calls.

The first two key parameters are, as before, the table name and the format in which we’d like to obtain the data. In the examples below I choose BULK, which has the advantage of being faster and allowing an unlimited number of data rows at export. There are some limitations with this format, such as the inability to perform simple computations (e.g. sums) on the fly. If you need these utilities, you probably don’t need this guide, so I’ll stick with BULK here.

The third crucial parameter is the selection based on the variables shown in e.g. DstApi.tablesummary(). These are mandatory: we need to specify the selection we want to do. We might however choose to include a range of possible values, or all of them, in a selection. In this case, the character * acts as a joker. So to select all values of a variable, we can use *. To select all 2020 weeks in Tid, we could use 2020*.

Below I write the parameters necessary to download the seasonally adjusted (code 10) index for all weeks in the data, and pass them to requests.post(). Finally I print the first 200 characters of the data we received back (in ;-separated format).

params = {
    'table': 'metrox1',
    'format': 'BULK',
    'variables': [
        {'code': 'SÆSON', 'values': ['10']},
        {'code': 'Tid', 'values': ['*']}
    ]
}
r = requests.post('https://api.statbank.dk/v1' + '/data', json=params)
print(r.text[:200])
SÆSON;TID;INDHOLD
Sæsonkorrigeret;2020U01;37,7
Sæsonkorrigeret;2020U08;105,0
Sæsonkorrigeret;2020U09;95,2
Sæsonkorrigeret;2020U10;93,0
Sæsonkorrigeret;2020U11;63,0
Sæsonkorrigeret;2020U12;17,9

Neat! We can then save this data to a csv file or whatever, or directly import it into pandas:

pd.read_table(StringIO(r.text), sep=';').head()
SÆSON TID INDHOLD
0 Sæsonkorrigeret 2020U01 37,7
1 Sæsonkorrigeret 2020U08 105,0
2 Sæsonkorrigeret 2020U09 95,2
3 Sæsonkorrigeret 2020U10 93,0
4 Sæsonkorrigeret 2020U11 63,0

From here on, it should be easy. Keep in mind that you can also specify intervals for time variables, as in the example below, where I also require the data to be exported in English.

params = {
    'table': 'metrox1',
    'format': 'BULK',
    'lang': 'en',
    'variables': [
        {'code': 'SÆSON', 'values': ['11']},
        {'code': 'Tid', 'values': ['>2020U45<=2020U52']}
    ]
}
df = pd.read_csv(
    StringIO(
        requests.post('https://api.statbank.dk/v1' + '/data', json=params).text
    ), sep=';'
)

The code above is already quite compact, but to avoid remembering how to import the data into pandas all the time, DstApi has a method to import the data directly into pandas given a parameter dictionary. So, for example, given the params dictionary defined above, we might call directly

metro.get_data(params=params)
SÆSON TID INDHOLD
0 Non-seasonally adjusted 2020U46 56.2
1 Non-seasonally adjusted 2020U47 55.5
2 Non-seasonally adjusted 2020U48 58.3
3 Non-seasonally adjusted 2020U49 57.6
4 Non-seasonally adjusted 2020U50 46.9
5 Non-seasonally adjusted 2020U51 40.6
6 Non-seasonally adjusted 2020U52 33.2

the .get_data() method has also the built-in option of downloading an entire data table by not passing any parameter dictionary. As mentioned above, this might be (very) wasteful. Some DST tables contain billions of data points. That’s why when used in this way the method asks for explicit confirmation before proceeding:

metro.get_data(language='en')
/tmp/ipykernel_706/4141372998.py:1: UserWarning: API call parameters are not specified. Parameters resulting in the download of the entire table will be automatically generated. This can result in massive data downloads.
  metro.get_data(language='en')


Continue (Y/Yes)? y
SÆSON TID INDHOLD
0 Seasonally adjusted 2020U01 37.7
1 Non-seasonally adjusted 2020U01 0.0
2 Seasonally adjusted 2020U08 105.0
3 Non-seasonally adjusted 2020U08 102.9
4 Seasonally adjusted 2020U09 95.2
... ... ... ...
187 Non-seasonally adjusted 2021U47 92.8
188 Seasonally adjusted 2021U48 96.4
189 Non-seasonally adjusted 2021U48 100.0
190 Seasonally adjusted 2021U49 90.2
191 Non-seasonally adjusted 2021U49 95.3

192 rows × 3 columns

The magic here happens because DstApi has a helper method that automatically constructs a dictionary of parameters for exporting an entire table, DstApi._define_base_params().

In practice, I often end up using this function to help me define a basic dictionary of parameters, and then adjust it according to my needs, as in the example below.

# Start by constructing a basic dictionary
params = metro._define_base_params(language = 'en')
params
{'table': 'metrox1',
 'format': 'BULK',
 'lang': 'en',
 'variables': [{'code': 'SÆSON', 'values': ['*']},
  {'code': 'Tid', 'values': ['*']}]}
# Adjust the parameters of variables according to my needs
params['variables'][0]['values'] = ['10']
params['variables'][1]['values'] = ['>2020U45<=2020U52']
params
{'table': 'metrox1',
 'format': 'BULK',
 'lang': 'en',
 'variables': [{'code': 'SÆSON', 'values': ['10']},
  {'code': 'Tid', 'values': ['>2020U45<=2020U52']}]}
# Get the data
metro.get_data(params=params)
SÆSON TID INDHOLD
0 Seasonally adjusted 2020U46 59.0
1 Seasonally adjusted 2020U47 54.0
2 Seasonally adjusted 2020U48 56.2
3 Seasonally adjusted 2020U49 54.5
4 Seasonally adjusted 2020U50 44.3
5 Seasonally adjusted 2020U51 40.2
6 Seasonally adjusted 2020U52 44.8

And that’s it!

I hope this guide was useful, and that the DstApi class can prove as helpful to you as it is for me.

Once again, let me conclude with a shout out to Denmark Statistics, a real national treasure. Thanks for all your work in gathering, organizing, and publishing data for everyone to use. It’s a fantastic service, and one for which you’ll never be thanked enough.