Nicole Schwitter introduces how R can be used to collect data from web pages.
Speaker: Nicole Schwitter, PhD student, Department of Sociology, University of Warwick
Abstract: The rise of the internet and mass digitalisation have led to vast amounts of digital data in recent years. These novel digital sources of data are used to gain new insights into old and new questions of data-driven sciences: From election results to press releases, social media posts or user reviews, research now often makes use of data that is online. Many modern commercial websites await user input and/or display dynamic web content which is generated on the fly via JavaScript technologies. Standard techniques of web-scraping which are well-suited to collect information from static pages will fail in these instances. In such cases, it is necessary to automate the browser to visit websites, click buttons, and fill in forms by itself - a task the tool Selenium fulfils. This talk will give a brief overview of approaches towards web scraping and an introduction to the R package RSelenium. The presentation will highlight use cases of RSelenium, show its potential, and give a starting point to those who have never used it.