SDG DATA NAVIGATOR

Rationale

The 2030 Agenda for Sustainable Development charts a path to a better world over the next decade. 
Achieving the Sustainable Development Goals relies heavily on the availability and use of relevant and timely data to understand the gaps, target solutions, and measure progress. Many traditional data sources and methods have strong limitations with respect to data production and data use, with coverage gaps across space, time, and people.
In addition, running “classical” statistical operations may be costly, particularly when censuses and national surveys have to be carried out.
To effectively and efficiently address the breadth and depth of the Sustainable Development Goals (SDGs), governments are increasingly adopting alternative or non-traditional data sources and methods to fill data gaps and complement traditional data.  
The key alternative data sources and approaches to be highlighted are geospatial data, citizen-generated data, administrative data, big data and open data. 

Administrative data have long been on the radar of statisticians and are used extensively in some statistical sectors (health, education and population statistics in particular).
Administrative data refer to the routine data collected by governments and service providers in the course of their day-to-day business. They have emerged as a key alternative data source for implementing and monitoring the SDGs. Administrative data include registers such as population registers, business registers and real-estate registers, as well as service delivery information, tax records, crime reports, and much more.

That administrative data are free of charge does not mean that their production is necessarily cheap or easy. They are in general the result of a formalised institutional setting (inter-administrative memoranda, agreements, etc.) that does not necessarily exist or work properly in all countries. In many countries, questions are raised about the quality and reliability of the transfers of information from one level to another within sector administrations, as well as about the capacities of the staff in charge of their collection and compilation. These questions may limit the use that can be made of the data. In addition, the exchange of data with the national statistics office is the result. Yet administrative data constitute a large part of the sources for statistics and are central to the production of several SDG indicators.

Citizen-generated data (CGD) are defined as data that people or their organisations produce to directly monitor, demand or drive change on issues that affect them. They are actively given by citizens, providing direct representations of their perspectives and an alternative to datasets collected by governments or international institutions.
CGD include a wide range of methods, from actively collected survey data to data passively produced through air pollution sensors, and overlap with other data categories, such as geospatial data via mapping and big data via sensors and tracking devices. CGD also include social accountability approaches such as citizen surveys and community scorecards. Given the nature of citizen engagement, citizen empowerment is an important byproduct of using CGD, which is particularly relevant for stakeholders working towards the Leave No One Behind (LNOB) principle through data.
CGD may also help gather data at a very local level and at levels not reachable by official statistics. They can help go beyond averages and identify hidden issues. Last but not least, they may facilitate a participative dialogue and set benchmarks for further exploration. They carry a dynamic dimension that allows more nuanced diagnostics and assessments.

Geospatial data broadly refers to any data that has a geographic or location component to it and “combines it with attribute information (characteristics of the object, event or phenomena concerned), and often also temporal information (the time or life span at which the location and attributes exist)”. There are many ways to collect, synthesize, and analyze such data and, therefore, many ways to describe categories of geospatial data, including by data source, type, and products.
Geospatial data (in particular Earth Observations) can be used to identify features of interest, such as agricultural land, forests, urban areas, roads and water, based on how they appear in (satellite) images. They can also help monitor air and water quality; support the assessment and monitoring of the potential for solar, wind, hydropower, and biofuel development; provide early warnings of vector-borne diseases and natural disasters; map and monitor urban settlements and housing; provide information on crop health and fields, market access, pests and diseases; and map potentially dangerous infrastructure (such as waste management facilities and nuclear facilities). All this information may complement and cross-check data gathered through other sources.
www.worldbank.org

Big data is the rapid expansion of structured, unstructured, and semi-structured data generated mostly from internet-connected devices, in particular mobile phones. The volume (large volume of data available now and in the future), velocity (speed of expansion of the big data universe), and variety (diverse format, size and storage/transfer structures) of big data are what make it so “big”.
Big data covers different sources such as scanner data, web-scraping (job vacancies, enterprise characteristics), mobile network data (broader than mobile phone), road sensors, satellite positioning data, credit and debit card payments, online payments and data from social media (Google searches, tweets, Facebook posts, blogs, all involving text analytics).

There is not yet regular production of official statistics based on big data, and experiences are all still in an early phase. The UN Global Working Group on Big Data for Official Statistics is providing several guidance documents. The UN Big Data Project Inventory gives an overview of big data initiatives from national and international organisations.

Big data offer huge potential for closing information gaps, but they also reveal a strong need for framework conditions that allow their proper use.

Open data are data that can be freely used, reused and redistributed by anyone for any purpose. A distinction must be made between data that are technically open, data that are legally open, and data that are available in a format that is readable and allows using and re-using them. The key criteria include:

  • Availability: the data must be available as a whole at no more than a reasonable reproduction cost. The data must also be available in a convenient and modifiable form including machine-readable and open formats.  
  • Reuse and redistribution: the data provider must permit reuse and redistribution including linking with other datasets. 
  • Equal access: everyone must be able to use, reuse and redistribute the data. There should be no restrictions, for instance to prevent commercial use. 
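As an illustration only, the three criteria above can be read as a simple checklist applied to each dataset identified in a data mapping. The sketch below is a minimal Python example under stated assumptions: the `DatasetRecord` fields, the `OPEN_FORMATS` list and the function names are hypothetical and do not come from any official standard or tool.

```python
from dataclasses import dataclass

# Hypothetical, non-exhaustive list of machine-readable open formats.
OPEN_FORMATS = {"csv", "json", "xml", "geojson"}


@dataclass
class DatasetRecord:
    """Hypothetical metadata describing one dataset in a data mapping."""
    name: str
    file_format: str                  # e.g. "csv" or "pdf"
    available_as_whole: bool          # the full dataset can be obtained
    cost_above_reproduction: bool     # price exceeds reasonable reproduction cost
    allows_reuse_and_redistribution: bool
    allows_commercial_use: bool
    restricted_to_some_users: bool    # access limited to particular groups


def check_open_data(record: DatasetRecord) -> dict[str, bool]:
    """Return one pass/fail flag for each of the three criteria."""
    availability = (
        record.available_as_whole
        and not record.cost_above_reproduction
        and record.file_format.lower() in OPEN_FORMATS
    )
    equal_access = (
        record.allows_commercial_use and not record.restricted_to_some_users
    )
    return {
        "availability": availability,
        "reuse_and_redistribution": record.allows_reuse_and_redistribution,
        "equal_access": equal_access,
    }


def is_open(record: DatasetRecord) -> bool:
    """A dataset counts as open only if every criterion is met."""
    return all(check_open_data(record).values())


if __name__ == "__main__":
    # Made-up example record, for illustration only.
    example = DatasetRecord(
        name="example register extract",
        file_format="pdf",
        available_as_whole=True,
        cost_above_reproduction=False,
        allows_reuse_and_redistribution=True,
        allows_commercial_use=False,
        restricted_to_some_users=False,
    )
    print(check_open_data(example))
```

Keeping one flag per criterion, rather than a single yes/no answer, shows which aspect (format, licence or access) would need attention before a source can be treated as open.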

Open data help strengthen transparency, accountability and responsiveness, in particular from government, and spur social and business innovation. They may help fill information gaps, particularly at the local level (local government, small businesses, local associations and NGOs), but this may require investments to increase comparability and harmonisation “in sectors dominated by a large number of relatively small units using an assortment of concepts, definitions, reporting periods, and accounting rules”.

For all the sources mentioned above, particular attention must be given to data security, confidentiality and quality.

Use of alternative data sources for the SDGs

Content: Comprehensive data mapping covering alternative data sources, identification of the issues linked to the acquisition and use of these data and setting up of collaborations with their owners, discussion on issues such as quality, data security and confidentiality. 

Result: All the existing data sources in the country have been identified and explored in a comprehensive data mapping, and those that may contribute to a more regular and sustainable production of the SDG indicators have been mobilised and used.

Objective/ Outcome

All the existing data sources, and in particular the non-official data sources, are mobilised to fill data gaps, reduce the costs of data production and cross-check information.

Contents / Outputs

Alternative data sources are established and integrated into the production of statistics.

Rules and guidelines for the use of alternative data are agreed upon.

Awareness and knowledge of the potential and challenges of alternative data sources are increased.

Possible Activities & Good Practices

Mapping of existing and potential new data sources

Training/ Conferences/ Workshops around the use of alternative data 

Enhancement of the use of Administrative data:

    • Agreements, memoranda between statistical agencies and the producers of administrative data, 
    • Rules for the exchange of data (content, media, calendar …),
    • Technical support from the NSO (quality management, training …)

Enhancement of the use of Citizen-generated data

    • Establishment of collaborations with local communities
    • Cross-checking of statistics and debate on results
    • Local debates on data and results

Enhancement of the use of Geospatial data

    • Agreements regarding the acquisition of the data from specialised agencies in satellite imagery 
    • Development of multidisciplinary approaches and coordination

Enhancement of the use of Big data

    • Identification of the data sources that exist at present in the country
    • Discussion on what information could be drawn from big data for the production of data for the SDG indicators 
    • Identification of the potential blocks and hazards in using big data
    • IT and other technical implications (data storage, archiving, mining …) and capacity requirements (data scientists) 

Links with other elements of the process landscape

Links with the Steering processes: As for the use of other sources, there is a need for a clear and transparent legal and institutional framework for organising collaboration and exchanges. Using these alternative sources will also require new tools and capacities, and therefore financing. Quality will be a central issue.

Links with the other Core processes: The mix of data sources is certainly a plus for the relevance of the data produced. CGD carry a potential for a more relevant debate based on nuanced and in-depth analyses.

Links with the Supporting processes: Capacity building will be necessary as well as sharing experiences regionally and internationally. 

Quality standards and references

National actors involved

All the actors of the enlarged Data Ecosystem, who are producing and/or using information and data for regular SDG monitoring and reporting:

Data Providers from the public sector: space agencies (NASA, ESA), postal service, telecommunications providers, armed forces, police force (criminal statistics), remote sensing, weather services

Data Providers from the Private Sector: social media (Facebook, Twitter), telecommunication providers, Digital Globe, Gallup (polling institution), insurers, rail companies/airlines, logistics companies

Academia and research institutions