Web Scraping: Scaling up Digital Data Collection

The latest slides from Web Scraping through R: Web scraping for the humanities and social sciences

Slides from the first session here

Slides from the second session here


Slides from the fourth and final session here


This week we look in greater detail at scaling up digital data collection: coercing scraper output into data frames, downloading files (along with a cursory look at the state of IP law), basic text manipulation in R, and a first look at working with APIs (share counts on Facebook).
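As a rough illustration of the first and third topics, here is a minimal sketch in base R. It is not taken from the slides: the `scraped` list, its field names, and the example.com URLs are made up for the example, and it assumes the scraper returns one named list per page.

```r
# Hypothetical scraper output: one named list per scraped page.
# Field names and values are invented for illustration.
scraped <- list(
  list(title = "  First post ", url = "http://example.com/a", shares = "12"),
  list(title = "Second post",   url = "http://example.com/b", shares = "7")
)

# Coerce the list of records into a data frame, one row per record.
rows <- lapply(scraped, function(x) as.data.frame(x, stringsAsFactors = FALSE))
df <- do.call(rbind, rows)

# Basic text manipulation: trim stray whitespace, convert the share
# counts from character to numeric, and keep the last URL path segment.
df$title  <- trimws(df$title)
df$shares <- as.numeric(df$shares)
df$slug   <- sub("^.*/", "", df$url)

print(df)
```

Binding rows with `do.call(rbind, ...)` is fine for a few hundred records; for much larger scrapes it can get slow, and collecting each field as a vector first may be a better choice.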

Download the .Rpres file to use in RStudio here

A regular R script with code-snippets only can be accessed here

UPDATE March 2015:
New 2015 version of slides here
PDFs of slides available here

8 comments:

  1. This is great! This information becomes obsolete fast so it is quite useful.
    Is there any chance of a downloadable form for your slides? For whatever reason, I don't feel comfortable unless I have PDF files that I can annotate myself!
    Thanks

    1. Glad you find it useful! I've added PDF slides. Fingers crossed they work properly - a bit hard to get the formatting right

  2. This comment has been removed by a blog administrator.

  3. Web scraping, in general, means treating a webpage as a table in a database and the website as the database.

  4. I don't feel comfortable unless I have PDF files that I can annotate myself!

  5. R is best when it comes to web scraping. I am using R to develop web scrapers to my clients' requirements. Thanks for sharing this information with us.

  6. Great post! I have read through your tutorials from part I onward and they are awesome! Thanks for sharing your knowledge!

  7. Web scraping services are a boon for growing a business and taking it to new heights. Website scraping is simply the process of extracting data from websites for your business needs.
