BiliBili Followers crawl

From NAT, 6 years ago, written in Bash, viewed 769 times.
URL https://code.nat.moe/view/7785d1ac
#!/bin/bash

# BiliBili Analyzer: analyze the relationship between bilibili users

# Start from
START_UID=4851356
# Extension of temp files
TEMP_EXT=.tmp
# Extension of the crawled mark
CRAWLED_EXT=.crawled
# Save relation to
OUTPUT_FILE=bilibili.prolog
# Start from page
PAGE=1

function crawl {
        # If a parameter is given, use it as the start UID
        local uid=${1:-$START_UID}
        # Locals keep the state of each recursion level separate
        local page=$PAGE followers follower

        while true
        do
                echo "[$(date)] Crawling $uid (page $page)..." # Print status
                # Get the current page of followers of this user
                curl -s "http://space.bilibili.com/$uid/fans.html?page=$page" > "$uid$TEMP_EXT"
                # Parse the page and get the UIDs of the followers
                followers=$(grep http://space.bilibili.com/ "$uid$TEMP_EXT" | awk -F\" '{print $6}' | grep html | awk -F/ '{print $4}')

                [[ -z $followers ]] && return 0 # No followers? Possibly the last page. Or... we met a poor guy

                for follower in $followers
                do
                        # Already crawled? Okay, let's get away.
                        grep -qx "$follower" "$uid$CRAWLED_EXT" 2> /dev/null && return 0
                        echo "follow($follower,$uid)." >> "$OUTPUT_FILE" # Write the follower data to output
                        echo "$follower" >> "$uid$CRAWLED_EXT" # Mark as crawled.
                        crawl "$follower" # Crawl this follower
                done

                page=$(nextpage "$page") # Done? Go to the next page.
        done
}

# Print the page number that follows the given one
function nextpage {
        echo $(( $1 + 1 ))
}

# Clean up marks and temp files left over from a previous run
rm -f -- *"$CRAWLED_EXT" *"$TEMP_EXT"

# Run the command given on the command line, e.g. "crawl" or "crawl 12345"
"$@"
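
To see what the grep/awk pipeline in crawl actually pulls out of a page, it can be run against a few lines of sample HTML. This is only a sketch: the markup below is made up for illustration (the real fans.html layout is not shown in the paste), and extract_uids is a hypothetical helper name, not part of the script.

```shell
#!/bin/bash

# Hypothetical helper wrapping the same pipeline the crawler uses:
# keep lines with a space link, take the 6th quote-delimited field,
# keep only .html links, then take the UID path segment.
extract_uids() {
        grep 'http://space.bilibili.com/' \
                | awk -F'"' '{print $6}' \
                | grep html \
                | awk -F/ '{print $4}'
}

# Assumed sample markup, NOT the real page source
sample='<a target="_blank" class="face" href="http://space.bilibili.com/1001/index.html">
<a target="_blank" class="face" href="http://space.bilibili.com/1002/index.html">
<a href="http://space.bilibili.com/" class="logo">home</a>
<p>no link here</p>'

printf '%s\n' "$sample" | extract_uids
```

The pipeline depends on the link being the third quoted attribute on its line, so any change in the attribute order of the page would silently yield no UIDs.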

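The output file holds one Prolog fact per line in the form follow(FOLLOWER,FOLLOWED). As a rough sketch of how that file could be summarized without a Prolog interpreter, the snippet below counts followers per user. The function name count_followers and the sample facts are assumptions for illustration only, not crawled data.

```shell
#!/bin/bash

# Hypothetical helper: print "UID COUNT" for every followed user,
# most-followed first. Splitting on "(", "," and ")" makes $3 the followed UID.
count_followers() {
        awk -F'[(,)]' '/^follow\(/ { n[$3]++ } END { for (u in n) print u, n[u] }' "$1" \
                | sort -k2,2nr -k1,1n
}

# Made-up sample facts in the same format the crawler writes
facts=$(mktemp)
trap 'rm -f "$facts"' EXIT
cat > "$facts" <<'EOF'
follow(1001,4851356).
follow(1002,4851356).
follow(4851356,1001).
EOF

count_followers "$facts"
```

Against the crawler's real output the call would be count_followers bilibili.prolog.
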