wenku8_fetch_se

From NAT, 6 years ago, written in Bash, viewed 823 times. This paste is a reply to wenku8_fetch from NAT.
URL: https://code.nat.moe/view/cefa1fdc
#!/bin/bash
# A tool to download all novels on wenku8.com, second edition. Minor bugs fixed.
#
# - Use the fallback URL to fetch the real address, to ensure that all the novels are downloaded.
# - Removed the '?' after the file name.

STORE_PATH=./save
FETCH_URL='http://www.wenku8.com/modules/article/articlelist.php?page='
DOWNLOAD_TYPE="utf8"
# __K and __ID are placeholders; __K is always filled with 1 below.
DOWNLOAD_URL="http://dl.wenku8.com/txt$DOWNLOAD_TYPE/__K/__ID.txt"
DOWNLOAD_FALLBACK="http://dl.wenku8.com/down.php?type=$DOWNLOAD_TYPE&id="
TEMP=temp.tmp
FROM=1
TO=93
TIMESTAMP_FORMAT="%H:%M:%S"

# Make sure the output directory exists before writing into it.
mkdir -p "$STORE_PATH"

for page in $(seq "$FROM" "$TO")
do
        echo "[$(date +"$TIMESTAMP_FORMAT")] Starting page $page..."
        curl -s "$FETCH_URL$page" > "$TEMP"
        # Convert the listing page once and keep only the lines that carry a novel link.
        _rows="$(iconv -f gbk -t utf-8 < "$TEMP" | grep 'font-size:13px;')"
        # Each line of title$TEMP becomes "<id> <title>".
        printf '%s\n' "$_rows" | sed -e 's/.*book\///g; s/\.htm">/ /g; s/<\/a><\/b>//g;' > "title$TEMP"
        _ids=$(printf '%s\n' "$_rows" | sed -e 's/.*book\///g; s/\.htm.*//g')
        for novel in $_ids
        do
                # Anchor the match so that ID 12 does not also match ID 123.
                _title="$(grep -m1 "^$novel[^0-9]" "title$TEMP")"
                echo "[$(date +"$TIMESTAMP_FORMAT")] Downloading $_title"
                _this_url="$(echo "$DOWNLOAD_URL" | sed -e "s/__K/1/; s/__ID/$novel/;")"
                # Replace spaces and slashes, and drop the carriage return that showed up as '?'.
                _this_save="$STORE_PATH/$(echo "$_title" | tr ' /' '__' | tr -d '\r')"
                curl -s "$_this_url" > "$_this_save"
                # If the direct URL returned a 404 page, ask the fallback URL for the real location and retry.
                if grep -q '404 Not Found' "$_this_save"
                then
                        _this_url="http://dl.wenku8.com$(curl -sI "$DOWNLOAD_FALLBACK$novel" | tr -d '\r' | grep -i '^Location:' | awk -F': ' '{print $2}')"
                        curl -s "$_this_url" > "$_this_save"
                fi
        done
done
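
The fallback step in the script reads the Location header from a HEAD response and prepends the download host to it. A minimal sketch of that header parsing as a standalone function follows. The name `extract_location` is hypothetical and not part of the paste. It assumes the headers arrive on stdin in the form `curl -I` prints them, with CRLF line endings, and that the header name may be in any letter case. The path used in the usage note is made up for illustration.

```shell
# Hypothetical helper: read raw HTTP response headers on stdin and print
# the value of the first Location header.
extract_location() {
        # Strip carriage returns first, otherwise the value ends in '\r'
        # and the URL built from it is invalid.
        tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}
```

Usage, under the same assumptions: `curl -sI "$DOWNLOAD_FALLBACK$novel" | extract_location` would print a path such as `/txtutf8/1/1234.txt`, which can then be appended to `http://dl.wenku8.com`. If the response carries no Location header, the function prints nothing, so the caller can test for an empty result before retrying the download.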

Replies to wenku8_fetch_se

Title Name Language When
wenku8_fetch_te NAT bash 6 Years ago.
