Web Scraping for Price Statistics in the Philippines
Authors: Manuel Leonard F. Albis, Sabrina O. Romasoc, Bea Andrea C. Gavira, Shushimita G. Pelayo & Jazzen Paul J. Asombrado
Abstract:
Official price statistics in the Philippines are sourced mainly from regular surveys and censuses, which are costly to conduct. As businesses move onto digital platforms, alternatives to these traditional data sources have become more available, one of which is web scraping, the automated collection of information from the web. With online platforms increasingly used for commerce, web scraping offers a way to increase the frequency of data collection while reducing its cost relative to price surveys. This paper computes an online Consumer Price Index (CPI) for the National Capital Region (NCR), specifically for Divisions 1 and 2 of the Philippine Classification of Individual Consumption According to Purpose (PCOICOP), and compares it with the official NCR CPI calculated by the Philippine Statistics Authority (PSA). In addition to the official CPI methodology, a hybrid approach is introduced for computing the online CPI. Finally, the paper presents the results of the official run of the developed web scraping programs and provides recommendations for future web scraping projects in the Philippines.
Keywords: web scraping, online prices, CPI, PCOICOP, R, RSelenium, rvest