Using DST's API with python

I am a huge fan of Denmark Statistics. Their Statistikbanken contains a wealth of data on the Danish society, economy, and population.

Not only all these data are publicy available, but DST has for years also provided access to all their published data tables through an API, documented here. The API access makes it extremely easy to access and use data. Yet unless one has already some experience in accessing APIs, using it might be complex for an occasional student or analyst.

This notebook provides a quick guide on how to access data from DST’s Statsbanken through their API, and presents a utility class I wrote to more easily access data tables for analytical purposes.

The only explicit dependency of that utility is pandas, which is anyway an extremely widespread package.

Both notebook and class can be found at this GitHub repository.

# Start by importing necessary packages
import requests
import pandas as pd
from IPython.display import display
from io import StringIO

from dstapi import DstApi # The helper class

DST’s API has four separate function calls to programmatically navigate around the published tables. This guide assumes that the analysist has scouted Statistikbanken already, and has identified the one or two tables from which data should be extracted. For these purposes, we only need two function calls: tableinfo and data.

data, as the name implies, is the function that allows us to get data from a specific table
tableinfo provides all the necessary metadata of a specific table. This latter function is crucial and extremely useful, as it allows to know how a table is structured, and which filters to pass to a data call.

The standard process is to begin by obtaining the necessary information from tableinfo, and then construct the call to pass to data.

This guide will proceed by for each step of the process first showing how to do it by directly using requests (and pandas), and second showing how the utility class DstApi can facilitate the process.

Step 1: Understand what a table has to offer and how it is structured

Our primary example will be DST’s table METROX1, which reports an index measuring the weekly amount of passengers travelling by metro in Copenhagen. This index was developed to measure the population’s response to the COVID pandemic. The table is small and simple, allowing for quick experimentation.

As we know the table’s name/id we can start by accessing the API directly through the python package requests, and ask about the table’s metadata (tableinfo). .

An API call is composed by a main web address, a function call, and a set of parameters. The main web address is https://api.statbank.dk/v1. The function call in this case is tableinfo. The set of necessary parameters, per documentation, is the id of the table and the format in which we’d like to receive the information. We’ll pick "metrox1" for the first (note that the table-id parameter is case-sensitive), and "JSON" for the second.

The API at DST can be called through both requests.get() and requests.post(). DST’s documentation recommends using post, because as the number and complexity of parameters grows (with some of them containing non-standard Danish characters) it’s harder to embed them in an URL. However, as the call to tableinfo is simple, below I provide examples of using both methods.

Note that the .json() method of the request.Response object serves to return the response content (which we requested in JSON format) rather than the object itself. That’s just to print out the output in the notebook.

This function returns a wealth of information. Not just the table id and description, but also the contact of the statistics responsible, and, crucially, names and values of the variables defining the table. In this case SÆSON and Tid.

# Directly embed parameters in the URL with response.get()
requests.get('https://api.statbank.dk/v1' + '/tableinfo' + "?id=metrox1&format=JSON").json()

# Pass a dictionary of parameters to requests.get()
params = {'id': 'metrox1', 'format': 'JSON'}
requests.get('https://api.statbank.dk/v1' + '/tableinfo', params=params).json()

# Use response.post() - note the change in the name of the parameter about the table's name
# I'm also adding here a language parameter - most tables are available in both Danish and English
params = {'table': 'metrox1', 'format': 'JSON', 'lang':'en'}
requests.post('https://api.statbank.dk/v1' + '/tableinfo', json=params).json()

{'id': 'METROX1',
 'text': 'Workday passenger index in the Copenhagen Metro (experimental statistics)',
 'description': 'Workday passenger index in the Copenhagen Metro (experimental statistics) by seasonal adjustment and time',
 'unit': 'Index',
 'suppressedDataValue': '0',
 'updated': '2021-12-23T08:00:00',
 'active': True,
 'contacts': [{'name': 'Peter Ottosen',
   'phone': '+4539173025',
   'mail': 'pot@dst.dk'}],
 'documentation': None,
 'footnote': {'text': 'Data are indexed against an average of week 8 og 9 , 2020. During week 4 and 5, 2020 the City Line, M3 was closed.',
  'mandatory': False},
 'variables': [{'id': 'SÆSON',
   'text': 'seasonal adjustment',
   'elimination': False,
   'time': False,
   'values': [{'id': '10', 'text': 'Seasonally adjusted'},
    {'id': '11', 'text': 'Non-seasonally adjusted'}]},
  {'id': 'Tid',
   'text': 'time',
   'elimination': False,
   'time': True,
   'values': [{'id': '2020U01', 'text': '2020U01'},
    {'id': '2020U08', 'text': '2020U08'},
    {'id': '2020U09', 'text': '2020U09'},
    {'id': '2020U10', 'text': '2020U10'},
    {'id': '2020U11', 'text': '2020U11'},
    {'id': '2020U12', 'text': '2020U12'},
    {'id': '2020U13', 'text': '2020U13'},
    {'id': '2020U14', 'text': '2020U14'},
    {'id': '2020U15', 'text': '2020U15'},
    {'id': '2020U16', 'text': '2020U16'},
    {'id': '2020U17', 'text': '2020U17'},
    {'id': '2020U18', 'text': '2020U18'},
    {'id': '2020U19', 'text': '2020U19'},
    {'id': '2020U20', 'text': '2020U20'},
    {'id': '2020U21', 'text': '2020U21'},
    {'id': '2020U22', 'text': '2020U22'},
    {'id': '2020U23', 'text': '2020U23'},
    {'id': '2020U24', 'text': '2020U24'},
    {'id': '2020U25', 'text': '2020U25'},
    {'id': '2020U26', 'text': '2020U26'},
    {'id': '2020U27', 'text': '2020U27'},
    {'id': '2020U28', 'text': '2020U28'},
    {'id': '2020U29', 'text': '2020U29'},
    {'id': '2020U30', 'text': '2020U30'},
    {'id': '2020U31', 'text': '2020U31'},
    {'id': '2020U32', 'text': '2020U32'},
    {'id': '2020U33', 'text': '2020U33'},
    {'id': '2020U34', 'text': '2020U34'},
    {'id': '2020U35', 'text': '2020U35'},
    {'id': '2020U36', 'text': '2020U36'},
    {'id': '2020U37', 'text': '2020U37'},
    {'id': '2020U38', 'text': '2020U38'},
    {'id': '2020U39', 'text': '2020U39'},
    {'id': '2020U40', 'text': '2020U40'},
    {'id': '2020U41', 'text': '2020U41'},
    {'id': '2020U42', 'text': '2020U42'},
    {'id': '2020U43', 'text': '2020U43'},
    {'id': '2020U44', 'text': '2020U44'},
    {'id': '2020U45', 'text': '2020U45'},
    {'id': '2020U46', 'text': '2020U46'},
    {'id': '2020U47', 'text': '2020U47'},
    {'id': '2020U48', 'text': '2020U48'},
    {'id': '2020U49', 'text': '2020U49'},
    {'id': '2020U50', 'text': '2020U50'},
    {'id': '2020U51', 'text': '2020U51'},
    {'id': '2020U52', 'text': '2020U52'},
    {'id': '2020U53', 'text': '2020U53'},
    {'id': '2021U01', 'text': '2021U01'},
    {'id': '2021U02', 'text': '2021U02'},
    {'id': '2021U03', 'text': '2021U03'},
    {'id': '2021U04', 'text': '2021U04'},
    {'id': '2021U05', 'text': '2021U05'},
    {'id': '2021U06', 'text': '2021U06'},
    {'id': '2021U07', 'text': '2021U07'},
    {'id': '2021U08', 'text': '2021U08'},
    {'id': '2021U09', 'text': '2021U09'},
    {'id': '2021U10', 'text': '2021U10'},
    {'id': '2021U11', 'text': '2021U11'},
    {'id': '2021U12', 'text': '2021U12'},
    {'id': '2021U13', 'text': '2021U13'},
    {'id': '2021U14', 'text': '2021U14'},
    {'id': '2021U15', 'text': '2021U15'},
    {'id': '2021U16', 'text': '2021U16'},
    {'id': '2021U17', 'text': '2021U17'},
    {'id': '2021U18', 'text': '2021U18'},
    {'id': '2021U19', 'text': '2021U19'},
    {'id': '2021U20', 'text': '2021U20'},
    {'id': '2021U21', 'text': '2021U21'},
    {'id': '2021U22', 'text': '2021U22'},
    {'id': '2021U23', 'text': '2021U23'},
    {'id': '2021U24', 'text': '2021U24'},
    {'id': '2021U25', 'text': '2021U25'},
    {'id': '2021U26', 'text': '2021U26'},
    {'id': '2021U27', 'text': '2021U27'},
    {'id': '2021U28', 'text': '2021U28'},
    {'id': '2021U29', 'text': '2021U29'},
    {'id': '2021U30', 'text': '2021U30'},
    {'id': '2021U31', 'text': '2021U31'},
    {'id': '2021U32', 'text': '2021U32'},
    {'id': '2021U33', 'text': '2021U33'},
    {'id': '2021U34', 'text': '2021U34'},
    {'id': '2021U35', 'text': '2021U35'},
    {'id': '2021U36', 'text': '2021U36'},
    {'id': '2021U37', 'text': '2021U37'},
    {'id': '2021U38', 'text': '2021U38'},
    {'id': '2021U39', 'text': '2021U39'},
    {'id': '2021U40', 'text': '2021U40'},
    {'id': '2021U41', 'text': '2021U41'},
    {'id': '2021U42', 'text': '2021U42'},
    {'id': '2021U43', 'text': '2021U43'},
    {'id': '2021U44', 'text': '2021U44'},
    {'id': '2021U45', 'text': '2021U45'},
    {'id': '2021U46', 'text': '2021U46'},
    {'id': '2021U47', 'text': '2021U47'},
    {'id': '2021U48', 'text': '2021U48'},
    {'id': '2021U49', 'text': '2021U49'}]}]}

This wealth of information is already fantastic. In that metadata there’s pretty much anything you need to figure out if you can actually use the table, and eventually how you want to select the data (seasonally ajusted? For 2020 only?). Yet that JSON file might be tough to digest, especially for more complex tables. Those cases might require preprocessing and a different type of visualization. That’s where the DstApi helper class proves useful to me. It has two methods for examining metadata.

The first one, tablesummary, summarizes the main metadata information:

The id and description of the table
The last update time
A table with the main available cuts of the data. Each row corresponds to a variable against which we can select, with examples of variable values and labels

# Initialize the class with the target table
metro = DstApi('METROX1')

# Get the table summary
metro.tablesummary(language='en')

Table METROX1: Workday passenger index in the Copenhagen Metro (experimental statistics) by seasonal adjustment and time
Last update: 2021-12-23T08:00:00

	variable name	# values	First value	First value label	Last value	Last value label	Time variable
0	SÆSON	2	10	Seasonally adjusted	11	Non-seasonally adjusted	False
1	Tid	96	2020U01	2020U01	2021U49	2021U49	True

The second method variable_levels zooms into a specific variable and returns a dataframe for each potential variable value. For example, we could check each value of SÆSON

metro.variable_levels('SÆSON', language='en')

	id	text
0	10	Seasonally adjusted
1	11	Non-seasonally adjusted

Now, we already knew these values for this simple table from tablesummary(), as they are only two. But for more complex tables, this method is very handy. Take for example DNKAPK (hey, by the way, do you know this data comes from Nationalbanken?).

If I wanted for example to extract only data about international direct investments it would be hard to know I should be referring to the value 200 in advance.

dnkapk = DstApi('dnkapk')
display(dnkapk.tablesummary(language='en'))
display(dnkapk.variable_levels('INSTRUMENT', language='en'))

Table DNKAPK: Denmarks international investment position compressed by balance, item,  domestic sector, currency, country and time
Last update: 2021-12-23T08:00:00

	variable name	# values	First value	First value label	Last value	Last value label	Time variable
0	BALANC	3	A	Assets	N	Net assets	False
1	INSTRUMENT	9	100	1.0: Total financial items	600	6.0: Reserve assets	False
2	INDSEK	7	1000	1000: All domestic sectors	1400	1400: Households, etc.	False
3	VALUTA	5	Z01	All currencies	USD	US dollar	False
4	LAND	3	Z9	All countries excluding Denmark	I7	Euro area-19	False
5	Tid	67	2005K1	2005Q1	2021K3	2021Q3	True

	id	text
0	100	1.0: Total financial items
1	200	2.0: Direct investments
2	300	3.0: Portfolio investments
3	310	3.1: Shares
4	320	3.2: Investment fund shares
5	330	3.3: Bonds, etc.
6	400	4.0: Financial derivatives (net balance)
7	500	5.0: Other investments (loans, trade credits, ...
8	600	6.0: Reserve assets

Step 2: Get the data you need

The first step is essential for designing this second step. First and foremost because we need that information to design the call to data. Second, to make sure we only get out the data we need. Asking for too much data only to then having to throw half of it out locally is wasteful, and ultimately disrespectful for the resources invested into allowing anyone to fire up an API call (I mean how amazing is that?).

As for the first step, we’ll start by doing it manually. Here I’ll rely exclusively on request.post() as recommended by DST.

To select the query parameters to pass to the data function appropriatedly one ought to have a careful look at the DATA section in the documentation. Nonetheless, hopefully the examples below will serve to clarify how to construct such calls.

The first two key parameters are, as before, the table name and the format in which we’d like to obtain the data. In the examples below I choose BULK, which has the advantage of being faster and allowing an unlimited number of data rows at export. There are some limitations with this format, such as the inability to perform simple computations (e.g. sums) on the fly. If you need these utilities, you probably don’t need this guide, so I’ll stick with BULK here.

The third crucial parameter is the selection based on the variables shown in e.g. DstApi.tablesummary(). These are mandatory: we need to specify the selection we want to do. We might however choose to include a range of possible values, or all of them, in a selection. In this case, the character * acts as a joker. So to select all values of a variable, we can use *. To select all 2020 weeks in Tid, we could use 2020*.

Below I write the parameters necessary to download the seasonally adjusted (code 10) index for all weeks in the data, and pass them to requests.post(). Finally I print the first 200 characters of the data we received back (in ;-separated format).

params = {
    'table': 'metrox1',
    'format': 'BULK',
    'variables': [
        {'code': 'SÆSON', 'values': ['10']},
        {'code': 'Tid', 'values': ['*']}
    ]
}
r = requests.post('https://api.statbank.dk/v1' + '/data', json=params)
print(r.text[:200])

SÆSON;TID;INDHOLD
Sæsonkorrigeret;2020U01;37,7
Sæsonkorrigeret;2020U08;105,0
Sæsonkorrigeret;2020U09;95,2
Sæsonkorrigeret;2020U10;93,0
Sæsonkorrigeret;2020U11;63,0
Sæsonkorrigeret;2020U12;17,9

Neat! We can then save this data to a csv file or whatever, or directly import it into pandas:

pd.read_table(StringIO(r.text), sep=';').head()

	SÆSON	TID	INDHOLD
0	Sæsonkorrigeret	2020U01	37,7
1	Sæsonkorrigeret	2020U08	105,0
2	Sæsonkorrigeret	2020U09	95,2
3	Sæsonkorrigeret	2020U10	93,0
4	Sæsonkorrigeret	2020U11	63,0

From here on, it should be easy. Keep in mind that you can also specify intervals for time variables, as in the example below, where I also require the data to be exported in English.

params = {
    'table': 'metrox1',
    'format': 'BULK',
    'lang': 'en',
    'variables': [
        {'code': 'SÆSON', 'values': ['11']},
        {'code': 'Tid', 'values': ['>2020U45<=2020U52']}
    ]
}
df = pd.read_csv(
    StringIO(
        requests.post('https://api.statbank.dk/v1' + '/data', json=params).text
    ), sep=';'
)

The code above is already quite compact, but to avoid remembering how to import the data into pandas all the time, DstApi has a method to import the data directly into pandas given a parameter dictionary. So, for example, given the params dictionary defined above, we might call directly

metro.get_data(params=params)

	SÆSON	TID	INDHOLD
0	Non-seasonally adjusted	2020U46	56.2
1	Non-seasonally adjusted	2020U47	55.5
2	Non-seasonally adjusted	2020U48	58.3
3	Non-seasonally adjusted	2020U49	57.6
4	Non-seasonally adjusted	2020U50	46.9
5	Non-seasonally adjusted	2020U51	40.6
6	Non-seasonally adjusted	2020U52	33.2

the .get_data() method has also the built-in option of downloading an entire data table by not passing any parameter dictionary. As mentioned above, this might be (very) wasteful. Some DST tables contain billions of data points. That’s why when used in this way the method asks for explicit confirmation before proceeding:

metro.get_data(language='en')

/tmp/ipykernel_706/4141372998.py:1: UserWarning: API call parameters are not specified. Parameters resulting in the download of the entire table will be automatically generated. This can result in massive data downloads.
  metro.get_data(language='en')


Continue (Y/Yes)? y

	SÆSON	TID	INDHOLD
0	Seasonally adjusted	2020U01	37.7
1	Non-seasonally adjusted	2020U01	0.0
2	Seasonally adjusted	2020U08	105.0
3	Non-seasonally adjusted	2020U08	102.9
4	Seasonally adjusted	2020U09	95.2
...	...	...	...
187	Non-seasonally adjusted	2021U47	92.8
188	Seasonally adjusted	2021U48	96.4
189	Non-seasonally adjusted	2021U48	100.0
190	Seasonally adjusted	2021U49	90.2
191	Non-seasonally adjusted	2021U49	95.3

192 rows × 3 columns

The magic here happens because DstApi has a helper method that automatically constructs a dictionary of parameters for exporting an entire table, DstApi._define_base_params().

In practice, I often end up using this function to help me define a basic dictionary of parameters, and then adjust it according to my needs, as in the example below.

# Start by constructing a basic dictionary
params = metro._define_base_params(language = 'en')
params

{'table': 'metrox1',
 'format': 'BULK',
 'lang': 'en',
 'variables': [{'code': 'SÆSON', 'values': ['*']},
  {'code': 'Tid', 'values': ['*']}]}

# Adjust the parameters of variables according to my needs
params['variables'][0]['values'] = ['10']
params['variables'][1]['values'] = ['>2020U45<=2020U52']
params

{'table': 'metrox1',
 'format': 'BULK',
 'lang': 'en',
 'variables': [{'code': 'SÆSON', 'values': ['10']},
  {'code': 'Tid', 'values': ['>2020U45<=2020U52']}]}

# Get the data
metro.get_data(params=params)

	SÆSON	TID	INDHOLD
0	Seasonally adjusted	2020U46	59.0
1	Seasonally adjusted	2020U47	54.0
2	Seasonally adjusted	2020U48	56.2
3	Seasonally adjusted	2020U49	54.5
4	Seasonally adjusted	2020U50	44.3
5	Seasonally adjusted	2020U51	40.2
6	Seasonally adjusted	2020U52	44.8

And that’s it!

I hope this guide was useful, and that the DstApi class can prove as helpful to you as it is for me.

Once again, let me conclude with a shout out to Denmark Statistics, a real national treasure. Thanks for all your work in gathering, organizing, and publishing data for everyone to use. It’s a fantastic service, and one for which you’ll never be thanked enough.

Alessandro Martinello

Using DST's API with python

Share

Step 1: Understand what a table has to offer and how it is structured

Step 2: Get the data you need

And that’s it!