Scrapy learning

Crawling comics

Recently, I have been reading the Bleach comics online.


While reading them, I noticed that the images load slowly over my internet connection, so I wondered whether I could write a script to save the images locally for faster reading.

For this task, I used Scrapy, a Python crawling framework, to automate the crawl. However, most comic sites use lazy loading, which relies on JavaScript to load the images, so the image URLs cannot be obtained from the static HTML alone. There are two potential ways to handle this:

  1. Use a Python library to drive a real browser so that the page's JavaScript actually runs.
  2. Use a JavaScript library to load the page, then crawl the rendered result.
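
To make the first option more concrete, here is a minimal sketch of how it might look. It assumes Selenium with a headless Chrome as the browser-driving library (my choice for illustration, other tools would work too), and it assumes the lazy loader fills in the `src` attribute of each `<img>` tag once the page has been scrolled. The names `render_page`, `extract_image_urls` and `ImageSrcParser` are hypothetical helpers, not part of Scrapy or Selenium.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Images whose src has not been filled in yet are skipped.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_image_urls(html, base_url):
    """Return absolute image URLs found in already rendered HTML."""
    parser = ImageSrcParser()
    parser.feed(html)
    return [urljoin(base_url, src) for src in parser.sources]


def render_page(url, wait_seconds=2.0):
    """Load url in headless Chrome and return the HTML after JavaScript ran.

    Assumption: Selenium and a matching Chrome are installed. The fixed
    sleep is a crude stand-in for a proper wait condition.
    """
    # Imported here so the parsing helpers work without Selenium installed.
    import time
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Scroll to the bottom so the lazy loader is triggered.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(wait_seconds)
        return driver.page_source
    finally:
        driver.quit()
```

With a real chapter URL, the two helpers would be combined as `extract_image_urls(render_page(url), url)`, and the resulting list could then be handed to the spider for downloading. Some sites need several scroll steps or a longer wait before every image appears, so treat the single scroll here as a starting point only.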