The 2030 Agenda for Sustainable Development charts a path to a better world over the next decade.
The achievement of these Sustainable Development Goals relies heavily on the availability and use of relevant and timely data to understand the gaps, target solutions, and measure progress. Many traditional data sources and methods tend to have strong limitations with respect to data production and data use, with coverage gaps across space, time, and people.
In addition, running “classical” statistical operations can be costly, particularly when censuses and national surveys have to be carried out.
To effectively and efficiently address the breadth and depth of the Sustainable Development Goals (SDGs), governments are increasingly adopting alternative or non-traditional data sources and methods to fill data gaps and complement traditional data.
The key alternative data sources and approaches to be highlighted are geospatial data, citizen-generated data, administrative data, big data and open data.
Administrative data have long been on statisticians' radar and are used extensively in some statistical sectors (health, education and population statistics in particular).
Administrative data refer to the routine data collected by governments and service providers in the course of their day-to-day business. They have emerged as a key alternative data source for implementing and monitoring the SDGs. Administrative data include registers such as population registers, business registers and real-estate registers, as well as service delivery information, tax records, crime reports, and much more.
The fact that they are free of charge does not mean that their production is necessarily cheap or easy. They are in general the result of a formalised institutional setting (inter-administrative memoranda, agreements, etc.) that does not necessarily exist or work properly in all countries. In many countries, questions are often raised about the quality and reliability of the transfers of information from one level to the other within the sector administrations, as well as about the capacities of the staff in charge of their collection and compilation. These questions may limit the use that can be made of them. In addition, the exchange of data with the national statistics office is the result. Yet they constitute a large part of the sources for statistics and they are central to the production of several SDG indicators.
Citizen-generated data: Citizen-generated data (CGD) are defined as data that people or their organisations produce to directly monitor, demand or drive change on issues that affect them. They are actively provided by citizens, offering direct representations of their perspectives and an alternative to datasets collected by governments or international institutions.
CGD includes a wide range of methods, from actively collected survey data to passively produced data through air pollution sensors, including overlaps with other data categories, such as geospatial data via mapping and big data via sensors and tracking devices. CGD also includes social accountability approaches such as citizen surveys and community scorecards. Given the nature of citizen engagement, citizen empowerment is an important byproduct of using CGD, which is particularly relevant for stakeholders working towards the Leave No One Behind (LNOB) principle through data.
CGD may also help gather data at a very low level, including levels not reachable by official statistics. They can also help go beyond averages and identify hidden issues. Last but not least, they may facilitate a participative dialogue and set benchmarks for further exploration. They carry a dynamic dimension that allows more nuanced diagnostics and assessments.
Geospatial data broadly refers to any data that has a geographic or location component to it and “combines it with attribute information (characteristics of the object, event or phenomena concerned), and often also temporal information (the time or life span at which the location and attributes exist)”. There are many ways to collect, synthesize, and analyze such data and, therefore, many ways to describe categories of geospatial data, including by data source, type, and products.
Geospatial data (in particular Earth Observations) can be used to identify features of interest, such as agricultural land, forests, urban areas, roads and water, based on how they appear in (satellite) images. They can also help monitor air and water quality; support the assessment and monitoring of the potential for solar, wind, hydropower, and biofuel development; provide early warnings of vector-borne diseases and natural disasters; map and monitor urban settlements and housing; provide information on crop health and fields, market access, pests and diseases; and map potentially dangerous infrastructure (such as waste management facilities and nuclear facilities). All this information may complement and cross-check data gathered through other sources.
Big data is the rapid expansion of structured, unstructured, and semi-structured data generated mostly from internet-connected devices, in particular mobile phones. The volume (large volume of data available now and in the future), velocity (speed of expansion of the big data universe), and variety (diverse format, size and storage/transfer structures) of big data are what make it so “big”.
Big data covers different sources such as scanner data, web-scraping (job vacancies, enterprise characteristics), mobile network data (broader than mobile phone), road sensors, satellite positioning data, credit and debit card payments, online payments and data from social media (Google searches, tweets, Facebook posts, blogs, all involving text analytics).
There is not yet regular production of official statistics based on big data, and experiences are all still at an early stage. The UN Global Working Group on Big Data for Official Statistics is providing several guidance documents. The UN Big Data Project Inventory gives an overview of big data initiatives from national and international organisations.
Big data offer huge potential for closing information gaps, but they also reveal a strong need for framework conditions that allow their proper use.
Open data are data that can be freely used, reused and redistributed by anyone for any purpose. A distinction must be made between data that are technically open, data that are legally open, and data that are available in a format that is readable and allows them to be used and re-used. The key criteria include:
They are conducive to strengthening transparency, accountability and responsiveness, in particular from government, and to spurring social and business innovation. They may help fill information gaps, particularly at the local level (local government, small businesses, local associations and NGOs), but this may require investments to increase comparability and harmonisation “in sectors dominated by a large number of relatively small units using an assortment of concepts, definitions, reporting periods, and accounting rules”.
For all the sources mentioned above, particular attention will have to be given to data security, confidentiality and quality.
Result: All the existing data sources in the country have been identified and explored in a comprehensive data mapping, and the ones that may contribute to a more regular and sustainable production of the SDG indicators have been mobilised and used.
Rules and guidelines for the use of alternative data are agreed upon.
Awareness/knowledge around the potentials/challenges of alternative data sources is increased.
Mapping of existing and potential new data sources
Training/conferences/workshops around the use of alternative data
Enhancement of the use of Administrative data:
Enhancement of the use of Citizen-generated data
Enhancement of the use of Geospatial data
Enhancement of the use of Big data
Links with the Steering processes: As for the use of other sources, there is a need for a clear and transparent legal and institutional framework for organising collaboration and exchanges. Using these alternative sources will also require new tools and capacities, and therefore financing. Quality will be a central issue.
Links with the other Core processes: The mix of data sources is certainly a plus for the relevance of the data produced. CGD carry a potential for a more relevant debate based on nuanced and in-depth analyses.
Links with the Supporting processes: Capacity building will be necessary as well as sharing experiences regionally and internationally.
All the actors of the enlarged Data Ecosystem, who are producing and/or using information and data for regular SDG monitoring and reporting:
Data Providers from the public sector: space agencies (NASA, ESA), postal service, telecommunications providers, armed forces, police force (criminal statistics), remote sensing, weather services
Data Providers from the Private Sector: social media (Facebook, Twitter), telecommunication providers, Digital Globe, Gallup (polling institution), insurers, rail companies/airlines, logistics companies
Academia and research institutions