Web Scraping part 2: Digging deeper

Slides from the second web scraping through R session: Web scraping for the humanities and social sciences

Slides from the first session here

...the third session here

... and the fourth and final session here

In which we make sure we are comfortable with functions before looking at XPath queries to download data from newspaper articles. Examples include BBC News and Guardian comments
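As a taste of what the session covers, here is a minimal sketch of an XPath query in R using the XML package. It runs on an inline HTML snippet rather than a live page; the markup and the "headline" class are invented for illustration and are not taken from the slides:

```r
library(XML)

# A toy HTML fragment standing in for a downloaded news page
# (the structure and class name here are hypothetical)
html <- '<html><body>
  <h3 class="headline">First story</h3>
  <h3 class="headline">Second story</h3>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# XPath query: grab the text of every h3 with class "headline"
headlines <- xpathSApply(doc, "//h3[@class='headline']", xmlValue)
headlines
# [1] "First story"  "Second story"
```

The same pattern applies to a real page: download the HTML, parse it, then pull out the nodes you want with an XPath expression.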

Download the .Rpres file to use in Rstudio here

A regular R script with the code only can be accessed here

UPDATE March 2015:
New 2015 version of slides here
PDFs of slides available here


  1. Thank you for sharing, much appreciated.

  2. A solid set of articles you posted. It's too bad the URLs respond with "Server overloaded, please throttle your requests."

  3. The wiki one in the first set of slides overloads all the time. I was able to follow the class with some patience though. Good stuff, thanks for this.

  4. Very clear and easy instructions. Thanks a ton... now I can scrape lyrics for Eminem ;)

  5. Thank you for the very clear tutorial. I am having one problem with the
    "Example with dataframe" slide, in the unlist function: it skips all NULL
    or blank values, which gives an error when building the data frame.
    I am still new to R and can't figure out how to fix it.
    Any ideas?

    1. I have found that adding
      if (length(title) == 0) title <- NA
      will fix the issue. However, using
      if (is.null(title)) title <- NA
      as shown in Part 2 will not work in all cases.
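To expand on why the length() check is the more robust of the two: an XPath helper such as xpathSApply can return either NULL or a zero-length vector when a node is absent from the page, and is.null() only catches the first case. A short base-R sketch (the variable names are hypothetical stand-ins for scraped fields):

```r
# Two ways a scraped field can come back "missing"
title_missing_null  <- NULL         # some extractors return NULL...
title_missing_empty <- character(0) # ...others a zero-length vector

# is.null() catches only the first case:
is.null(title_missing_null)   # TRUE
is.null(title_missing_empty)  # FALSE -- slips through

# length(x) == 0 catches both, so rows stay aligned in the data frame:
fix_missing <- function(x) if (length(x) == 0) NA else x

fix_missing(title_missing_null)   # NA
fix_missing(title_missing_empty)  # NA
fix_missing("A real headline")    # returned unchanged
```

Replacing a missing field with NA keeps every record the same length, which is what data.frame() needs.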
