Let R browse the web for you: An introduction to web-scraping with RSelenium

Nicole Schwitter introduces how R can be used to collect data from web pages.

Categories: talks, web-scraping

Author: Nicole Schwitter
Published: December 8, 2022

Speaker: Nicole Schwitter, PhD student, Department of Sociology, University of Warwick

Abstract: The rise of the internet and mass digitalisation have produced vast amounts of digital data in recent years. These novel data sources are used to gain new insights into old and new questions in the data-driven sciences: from election results to press releases, social media posts, and user reviews, research now often relies on data that is available online. Many modern commercial websites wait for user input and/or display dynamic content that is generated on the fly with JavaScript. Standard web-scraping techniques, which are well suited to collecting information from static pages, fail in these instances. In such cases, it is necessary to automate the browser so that it visits websites, clicks buttons, and fills in forms by itself, a task that the tool Selenium fulfils. This talk will give a brief overview of approaches to web scraping and an introduction to the R package RSelenium. The presentation will highlight use cases of RSelenium, show its potential, and give a starting point to those who have never used it.
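
To make the idea of browser automation concrete, the following is a minimal sketch of what driving a browser with RSelenium can look like; it is not taken from the talk itself. It assumes that Firefox and Java are installed locally, and the function name `scrape_search_results`, the URL `https://example.com/search`, and the CSS selectors `input[name='q']` and `.result-title` are hypothetical placeholders that would need to be replaced for a real site.

```r
# Minimal sketch under stated assumptions: the URL and CSS selectors below
# are hypothetical placeholders, not a real site layout.
library(RSelenium)
library(rvest)

scrape_search_results <- function(query,
                                  url = "https://example.com/search",
                                  port = 4567L) {
  # Start a Selenium server plus a Firefox session (Firefox assumed installed).
  # chromever = NULL skips starting chromedriver, which is not needed here.
  driver <- rsDriver(browser = "firefox", port = port,
                     chromever = NULL, verbose = FALSE)
  remote <- driver$client

  # Close the browser and stop the server even if a later step fails.
  on.exit({
    remote$close()
    driver$server$stop()
  }, add = TRUE)

  # Visit the page, as a user would.
  remote$navigate(url)

  # Fill in the (hypothetical) search field and submit it with Enter.
  search_box <- remote$findElement(using = "css selector",
                                   value = "input[name='q']")
  search_box$sendKeysToElement(list(query, key = "enter"))

  # Crude fixed wait so that JavaScript has time to render the results.
  Sys.sleep(2)

  # Hand the rendered HTML to rvest and extract the (hypothetical) titles.
  page <- read_html(remote$getPageSource()[[1]])
  html_text2(html_elements(page, ".result-title"))
}
```

Calling `scrape_search_results("web scraping")` would then return a character vector of result titles, provided the placeholder URL and selectors are adapted to an actual page. The fixed `Sys.sleep()` is the simplest possible waiting strategy; polling for the target element is usually more robust on slow pages.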

Citation

BibTeX citation:
@online{schwitter2022,
  author = {Schwitter, Nicole},
  title = {Let {R} Browse the Web for You: {An} Introduction to
    Web-Scraping with {RSelenium}},
  date = {2022-12-08},
  url = {https://warwickrug.github.io/posts/2022-12-08-web-scraping-with-rselenium},
  langid = {en}
}
For attribution, please cite this work as:
Schwitter, Nicole. 2022. “Let R Browse the Web for You: An Introduction to Web-Scraping with RSelenium.” December 8, 2022. https://warwickrug.github.io/posts/2022-12-08-web-scraping-with-rselenium.