Web Scraping for Activists and Investigative Journalists

Are you an investigative journalist or activist?

Need to download / scrape lots of data?

I may be able to offer pro bono support.

Areas of interest

I’m particularly interested if you’re investigating:

  • climate crimes
  • fossil fuel companies
  • corruption & anti-democracy
  • wealth inequality
  • tax evasion

Relevant experience

Previously I’ve acquired sensitive data for:

  • Global Witness
  • Shelter
  • BBC investigative journalism team
  • UK Government Digital Service
  • Many more commercial clients

I learned web scraping while working for ScraperWiki. We acquired and analysed data for NGOs and private clients alike.

My background

I’m a software engineer and founder of a small tech company.

I’m outraged but optimistic about the future. I want to help people working to expose climate injustice, wrongdoing and corruption.

Ethical Web Scraping

I believe in ethical web scraping, operating within a set of principles:

Principles for scrapers

  • A scraper should identify itself as such, preferably with a contact URL and/or email address (where doing so would not compromise an investigation).
  • A scraper should not disrupt the service being scraped (it should use rate limiting and backoff).
  • A scraper should not be built to gain free access to an otherwise paid-for data product.
  • A scraper should not attempt to circumvent the security of a website.
  • A scraper should not facilitate plagiarism or copyright infringement.
  • A scraper should not help spammers or bad guys.
  • A scraper should not use anonymising proxies, Tor or dirty tricks to hide its origin.
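
As a rough illustration of the first two principles (identifying the scraper, and rate limiting with backoff), here is a minimal Python sketch using only the standard library. The User-Agent string, the contact URL and email address, the two-second interval, and the names PoliteFetcher and backoff_delays are all assumptions made up for this example. It is not a description of any particular scraper, and treating only 429 and 503 responses as retryable is a simplification.

```python
import time
import urllib.error
import urllib.request

# Hypothetical identity: replace with your own project page and contact address.
USER_AGENT = (
    "example-research-scraper/0.1 "
    "(+https://example.org/about-this-scraper; scraper@example.org)"
)


def backoff_delays(base=1.0, factor=2.0, retries=4):
    """Return the waits (in seconds) to use before each retry."""
    return [base * factor ** attempt for attempt in range(retries)]


class PoliteFetcher:
    """Fetch URLs with an identifying User-Agent, a minimum gap between
    requests, and exponential backoff when the server signals overload."""

    def __init__(self, min_interval=2.0, opener=urllib.request.urlopen,
                 sleep=time.sleep, clock=time.monotonic):
        self.min_interval = min_interval
        self._open = opener      # injectable so the class can be tested offline
        self._sleep = sleep
        self._clock = clock
        self._last_request = None

    def _wait_for_turn(self):
        # Rate limiting: leave at least min_interval seconds between requests.
        if self._last_request is not None:
            elapsed = self._clock() - self._last_request
            if elapsed < self.min_interval:
                self._sleep(self.min_interval - elapsed)
        self._last_request = self._clock()

    def fetch(self, url):
        request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        delays = backoff_delays()
        for attempt in range(len(delays) + 1):
            self._wait_for_turn()
            try:
                with self._open(request, timeout=30) as response:
                    return response.read()
            except urllib.error.HTTPError as error:
                # Back off only on "too many requests" / "service unavailable";
                # re-raise anything else, or once the retries are used up.
                if error.code not in (429, 503) or attempt == len(delays):
                    raise
                self._sleep(delays[attempt])
```

Typical use would be something like PoliteFetcher(min_interval=5.0).fetch(url), with the interval chosen to suit the site being scraped.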

Principles for site owners

  • A site that welcomes human visitors should welcome well-behaved robots too.
  • A site should not discriminate against well-behaved robots; one that welcomes Google should welcome others too.

ONS policy

I try to follow the Office for National Statistics web scraping policy. This is not always fully possible in public interest investigations. Where it is not, we should document why.

If that’s all cool with you, then get in touch.