Date: 2021-06-30

In March, Internews facilitated a ‘Data Scraping’ e-Workshop for the Open Development Cambodia (ODC) team. The aim of the workshop was to strengthen ODC’s technical capability to collect ISAC-relevant data from government websites, for public sharing on the Social Accountability web page now under development.

The workshop was conducted by web development firm Thibi, drawing on its prior work collecting data in the Mekong region. Participants from ODC included Data Research and GIS Specialist Loch Kalyan, Senior Data Research and GIS Officer Vong Pisith, Senior Web Developer Sam An Mardy, and Program and Partnership Officer Ourn Vimoil.

The workshop focused particularly on the challenges of extracting machine-readable information from ‘dynamic’ web pages. Before the course, Thibi collected public budgeting data, and it used a government database portal and other Cambodian sites as examples during the workshop.

The workshop took place on March 29th and 30th in two three-hour afternoon sessions, led by Yan Naung Oak of Thibi via Zoom. The first session reviewed the fundamentals of static and dynamic web pages and the formats (XML, CSV, JSON) in which data is encoded, embedded in pages, and exchanged between web servers and web browsers. The participants tested techniques for scraping data using Google Sheets and Workbench.
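As an illustration of the three formats reviewed in the first session, the sketch below encodes one record as CSV, JSON and XML and reads it back, using only the Python standard library. The record, its field names and the helper function names are hypothetical, invented for this example; they are not taken from the workshop materials or from any real budget dataset.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical record, invented for illustration only (not real budget data).
record = {"district": "Example District", "year": "2020", "revenue": "1000"}


def to_csv(rec):
    """Encode one record as CSV text: a header row plus one data row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rec), lineterminator="\n")
    writer.writeheader()
    writer.writerow(rec)
    return buf.getvalue()


def from_csv(text):
    """Read the first data row of CSV text back into a dict."""
    return dict(next(csv.DictReader(io.StringIO(text))))


def to_json(rec):
    """Encode the record as a JSON object."""
    return json.dumps(rec)


def from_json(text):
    """Decode a JSON object back into a dict."""
    return json.loads(text)


def to_xml(rec):
    """Encode the record as XML: one child element per field."""
    root = ET.Element("record")
    for key, value in rec.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Decode the XML produced by to_xml back into a dict."""
    return {child.tag: child.text for child in ET.fromstring(text)}


if __name__ == "__main__":
    for encode in (to_csv, to_json, to_xml):
        print(encode(record))
```

The same information survives a round trip through each format; what differs is how the structure is marked up, which is what a scraper has to recognise when it reads data embedded in a page.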

The second session focused on more advanced methods of collecting data from the web, using a number of browser plug-ins and apps (industry-standard tools including SelectorGadget, Web Scraper, HTTrack and SiteSucker). The participants continued their practical exploration, working directly with Cambodia’s NCDD website, and the session concluded with recommendations for ODC’s continued engagement with public websites.

Following the workshop, the ODC team updated its existing dataset, “Report of actual revenue and expenditure of municipal/district/khan administration in the Kingdom of Cambodia (2015-2020)”, and will look for commune budget data to share on the new Social Accountability page. In addition to following up on the use of these new tools, the ODC team will further explore the possibility of more advanced tools, such as custom-designed scripts.
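To give a sense of what a custom-designed script might look like, here is a minimal sketch that pulls the rows out of an HTML table using Python's built-in html.parser module. The sample HTML, the class name TableParser and the function extract_rows are assumptions made for this example; the script is not the one ODC or Thibi used. It also handles only static pages, where the table is already present in the HTML; dynamic pages that load data with scripts would need a different approach.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page; the figures are placeholders, not real data.
SAMPLE_HTML = """
<table>
  <tr><th>District</th><th>Year</th><th>Revenue</th></tr>
  <tr><td>Example District</td><td>2020</td><td>1000</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in extract_rows(SAMPLE_HTML):
        print(row)
```

The extracted rows can then be written out with the csv module, giving a machine-readable file suitable for sharing.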
