In data science, we're taught to seek clean, structured datasets—the digital gold of the information age. We dream of seamless APIs, tidy CSVs, and well-documented schemas. But for the data professional working in Kenya and much of East Africa, this ideal is a myth. The reality is a paradox: a vast, data-rich environment that is frustratingly inaccessible.
This is an exposé, a ground-level look at what it's truly like to work with public data here. It's for the analyst who has spent hours extracting table data from PDF reports and for the professional who feels like they're working in a vacuum. Let’s talk about the terrain.
The Myth of Openness vs. The Reality of Data Traps
On paper, Kenya is a leader in data openness. The Kenya National Bureau of Statistics (KNBS), established by the Statistics Act of 2006, is the principal source of official data, and Article 35 of the Constitution guarantees citizens the right of access to information. But the day-to-day work paints a different picture.
The problem isn't a lack of data; it's a data accessibility issue. Much of the country's most valuable information is trapped in non-machine-readable formats. Think of it as a library where all the books are there, but they’re sealed in glass boxes. This is a common operational hurdle. For example, a data professional trying to analyze the latest Quarterly Labour Force Report from KNBS will find it's often a PDF document.
While the report contains crucial tables on unemployment rates, labor force participation, and youth not in employment, education or training (NEET), the data within these tables cannot be programmatically queried. It requires manual transcription or, more technically, a dedicated PDF table-extraction process, with OCR (Optical Character Recognition) for scanned pages, to convert it into a usable format like a spreadsheet. This seemingly simple step consumes an inordinate amount of time and resources, introduces a high risk of transcription errors, and fundamentally slows down the pace of analysis. The data is technically "public," but in practice, it's a heavy-duty puzzle to solve before you can even begin your analysis.
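To make the cleanup step concrete, here is a minimal sketch of what happens after a table has been pulled out of a PDF. It assumes an extraction tool (pdfplumber and Camelot are common choices) has already returned the table as a list of rows of strings. The sample rows, the `MISSING_MARKERS` set, and the helper names are hypothetical, and the figures are invented for illustration, not taken from any KNBS report.

```python
import csv
import io
import re

# Hypothetical rows, shaped like the list-of-lists a PDF table extractor returns.
# The figures are invented for illustration and are NOT real KNBS statistics.
RAW_ROWS = [
    ["Indicator", "Q1", "Q2"],
    ["Labour force ('000)", "19,120", "19,345 *"],
    ["Unemployment rate (%)", "5.6%", "4.9"],
    ["Youth NEET (%)", "..", "15.1"],
]

# Assumed placeholders for "no data"; adjust to the report you are working with.
MISSING_MARKERS = {"", "-", "..", "n/a", "na"}


def clean_cell(text):
    """Convert one extracted cell to a float, or None if it is missing."""
    if text is None:
        return None
    stripped = text.strip()
    if stripped.lower() in MISSING_MARKERS:
        return None
    # Drop thousands separators, percent signs, footnote asterisks and spaces.
    candidate = re.sub(r"[,%*\s]", "", stripped)
    try:
        return float(candidate)
    except ValueError:
        # Fail loudly so a garbled cell is never silently turned into data.
        raise ValueError(f"Unrecognised numeric cell: {text!r}") from None


def clean_table(rows):
    """Keep the header and the label column as text; convert everything else."""
    header, *body = rows
    cleaned = [[row[0].strip()] + [clean_cell(c) for c in row[1:]] for row in body]
    return header, cleaned


def to_csv(header, rows):
    """Serialise the cleaned table as CSV text; missing values become empty cells."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


header, table = clean_table(RAW_ROWS)
csv_text = to_csv(header, table)
print(csv_text)
```

Raising an error on an unrecognised cell is a deliberate choice here: transcription mistakes are the main risk in this workflow, so it seems safer to stop and inspect the page than to guess.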
Navigating a Decentralized Labyrinth
Beyond the PDF problem, the Kenyan data landscape is defined by its decentralized and fragmented nature. There is no single, comprehensive portal where a data professional can find everything they need. Instead, you're forced to act as a digital archaeologist, digging through:
- Ministry and county-level websites: These sites often host valuable reports and raw data, but they are inconsistently updated and poorly organized, and may even be offline for long periods.
- NGO and CSO reports: Many civil society organizations publish rich, impact-driven data, but it is scattered across different project pages and reports.
- International development partner portals: Organizations like the World Bank and the UN maintain their own open data portals, which are valuable but may not always contain the specific, granular data you need for a local context.
This siloed structure means that a simple request for data can turn into a multi-day scavenger hunt. It forces the data professional to become a master of data discovery and integration, stitching together a coherent dataset from disparate, often poorly documented sources. The lack of standardized metadata and data dictionaries across these sources only compounds the challenge.
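Much of that stitching comes down to getting keys to line up. The sketch below joins two sources that spell county names differently. It assumes each source has already been reduced to a simple name-to-value mapping. The two mappings, their values, and the helper names are hypothetical and invented for illustration.

```python
import re

# Hypothetical extracts from two sources that spell county names differently.
# All values are invented for illustration.
MINISTRY_ROWS = {
    "MURANG'A COUNTY": 120,
    "Tharaka-Nithi": 45,
    "Nairobi City": 310,
}
NGO_ROWS = {
    "Muranga": 0.62,
    "Tharaka Nithi": 0.48,
    "Kisumu": 0.71,
}


def normalise_county(name):
    """Reduce a county name to a comparable key made of lowercase letters only."""
    key = name.lower()
    key = re.sub(r"\bcounty\b", "", key)  # drop the word 'county'
    key = re.sub(r"[^a-z]", "", key)      # drop apostrophes, hyphens and spaces
    return key


def merge_sources(left, right):
    """Outer-join two {county: value} mappings on the normalised name."""
    left_keyed = {normalise_county(k): v for k, v in left.items()}
    right_keyed = {normalise_county(k): v for k, v in right.items()}
    merged = {}
    for key in sorted(left_keyed.keys() | right_keyed.keys()):
        # None marks a county that one of the sources does not cover.
        merged[key] = (left_keyed.get(key), right_keyed.get(key))
    return merged


merged = merge_sources(MINISTRY_ROWS, NGO_ROWS)
unmatched = sorted(k for k, (a, b) in merged.items() if a is None or b is None)
print(merged)
print("Needs manual review:", unmatched)
```

An outer join is used so that nothing disappears quietly. Names that normalisation cannot reconcile, such as "Nairobi City" against a plain "Nairobi", show up in `unmatched` and would need an explicit alias table.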
The Human and Institutional Hurdles
The technical issues are only half the story. The other half is shaped by institutional and cultural factors. There's a persistent lack of data literacy among many potential data consumers, from mid-level program managers to senior-level decision-makers. This means that a data professional's job often involves a significant amount of advocacy and capacity building just to demonstrate the value of their work.
Furthermore, some institutions treat data as a source of power rather than a public good. There can be an underlying fear of transparency and a resistance to sharing information that could reveal inefficiencies or poor performance. This institutional inertia is a major bottleneck that technical skills alone cannot solve. It requires a different kind of expertise—one rooted in communication, trust-building, and professional influence.
A Call for a Collaborative Solution
So, what do we do about this? We can’t wait for a top-down solution. We must build one from the bottom up, from the ground level of our work.
As data professionals, we need to move beyond individual frustration and embrace a collaborative approach. This is an issue we all face, and together, we can work around it and, in doing so, advocate for change. Here's a challenge from our shelf to yours:
- Share your workarounds: Have you written a Python script to scrape a difficult table from a PDF? Share the script on GitHub and link to the KNBS report. Let’s create a community repository of technical solutions that others can build on.
- Document everything: When you’ve successfully cleaned a dataset, create a data dictionary and a brief methodology note. Share it. This is how we begin to build our own collective, informal data governance framework.
- Use your blog or newsletter: We, as a community of data professionals, must use platforms like this blog to showcase the real work and the technical solutions we're building. Every time we publish an article detailing how we solved a data problem, we are not just sharing knowledge—we are advocating for systemic change by demonstrating what is possible.
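The "document everything" habit can start small. As a rough sketch, the snippet below drafts a data dictionary from a cleaned CSV by inferring each column's type and counting missing values, leaving the descriptions for a human to fill in. The sample data, the column names, and the function names are hypothetical.

```python
import csv
import io

# Hypothetical cleaned dataset; column names and values are invented for illustration.
CLEANED_CSV = """county,year,unemployment_rate
Kisumu,2022,6.1
Nakuru,2022,
Mombasa,2023,7.4
"""


def infer_type(values):
    """Guess a column type from its non-missing string values."""
    for caster, label in ((int, "integer"), (float, "float")):
        try:
            for value in values:
                caster(value)
            return label
        except ValueError:
            continue
    return "text"


def build_data_dictionary(csv_text, descriptions=None):
    """Return one entry per column: name, inferred type, missing count, example."""
    descriptions = descriptions or {}
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for column in reader.fieldnames or []:
        present = [row[column] for row in rows if row[column] not in ("", None)]
        entries.append({
            "column": column,
            "type": infer_type(present) if present else "unknown",
            "missing": len(rows) - len(present),
            "example": present[0] if present else "",
            # Descriptions cannot be inferred; flag them for a human to write.
            "description": descriptions.get(column, "TODO: describe"),
        })
    return entries


dictionary = build_data_dictionary(CLEANED_CSV, {"county": "County name"})
for entry in dictionary:
    print(entry)
```

The inferred types are only a first draft. Units, source report, and collection method still belong in the methodology note that travels with the dataset.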
The data landscape in Kenya is not a mythic, open playground; it's a rugged terrain full of obstacles. But for the skilled, resilient data professional, it’s also an environment ripe with opportunity: not only to deliver insights but also to actively shape the future of data in our country.