Web Scraping Part 2: Digging Deeper

Slides from the second web scraping through R session: Web scraping for the humanities and social sciences

Slides from the first session here

...the third session here

... and the fourth and final session here

In which we make sure we are comfortable with functions, before looking at XPath queries to download data from newspaper articles. Examples include BBC News and Guardian comments.

Download the .Rpres file to use in RStudio here

A regular R script with the code only can be accessed here

UPDATE March 2015:
New 2015 version of slides here
PDFs of slides available here


  1. Thank you for sharing, much appreciated.

  2. A solid set of articles you posted. It's too bad the URLs respond with "Server overloaded, please throttle your requests."

  3. The wiki one in the first set of slides overloads all the time. I was able to follow the class with some patience though. Good stuff, thanks for this.

  4. Very clear and easy instructions. Thanks a ton... now I can scrape lyrics for Eminem ;)

  5. Thank you for the very clear tutorial. I am having one problem with the
    unlist function on the "Example with dataframe" slide: it skips all NULL
    or blank values, which gives an error when building the data frame.
    I am still new to R and I can't figure out how to fix it.
    Any ideas?

    1. I have found that adding
      if (length(title) == 0) title <- NA
      will fix the issue. However, using
      if (is.null(title)) title <- NA
      as shown in Part 2 will not work in all cases.

  8. Your article is very good. It provides quite useful knowledge.

  9. Web scraping is an important technique these days; thanks for sharing such informative content. If you want to know more about data science and mining, please visit Learnbay.
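
The unlist problem raised in comment 5, and the length-based fix from its reply, can be sketched in base R as follows. This is a minimal illustration, not the code from the slides: the `titles` and `dates` lists are made-up stand-ins for scraped results, and `na_if_empty` is a hypothetical helper name. The idea is that `length()` is 0 for `NULL`, `character(0)` and `list()` alike, whereas `is.null()` is TRUE only for `NULL`, which may be why the `is.null` check does not catch every case.

```r
# Made-up scraped fields: one entry per article, some of them missing.
# NULL and character(0) stand in for queries that matched nothing.
titles <- list("First headline", NULL, "Third headline", character(0))
dates  <- list("2014-01-01", "2014-01-02", "2014-01-03", "2014-01-04")

# unlist() silently drops the empty entries, so the columns no longer
# line up and data.frame() would fail with differing numbers of rows.
length(unlist(titles))  # 2
length(unlist(dates))   # 4

# Hypothetical helper: replace any zero-length result with NA.
na_if_empty <- function(x) {
  if (length(x) == 0) NA_character_ else x
}

# vapply() guarantees exactly one character value per article.
titles_fixed <- vapply(titles, na_if_empty, character(1))

articles <- data.frame(title = titles_fixed,
                       date = unlist(dates),
                       stringsAsFactors = FALSE)
```

Applying the check to each field before it goes into the data frame keeps one row per article, with NA marking the values that could not be scraped.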