Re: SiteLinkMap

From NAT, 6 Years ago, written in Bash. This paste is a reply to SiteLinkMap from NAT.
URL https://code.nat.moe/view/7f0fc2ee/diff
Viewing differences between SiteLinkMap and Re: SiteLinkMap
#!/bin/bash

TABS=0

temp=$(mktemp -d)
cd "$temp" || exit 1

GLOBAL_IGNORE='wordpress|twitter|facebook|baidu|google'

[[ -z $MAX_DEPTH ]] && MAX_DEPTH=50

function getLinksByURL {
        curl --max-time 10 -sL "$1" | awk -F'href=' '{print $2}' | awk -F'>| ' '{print $1}' | sort | uniq | grep 'http' | awk -F"/" '{print $3}' | tr -d "\"'" | sort | uniq
}

function showTabs {
        for i in $(seq 1 $1)
        do
                echo -n "  "
        done
}

function showSiteLinkOut {
        let TABS++
        mkdir "$1" 2> /dev/null; cd "$1"
        getLinksByURL "$1" | grep -vE "$1|$(cat "$temp/DONE")" | while read -r out
        do
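                # Record the current site as visited. The file, not the variable,
                # is authoritative because this loop runs in a pipeline subshell.
                DONE="$DONE|$1"
                echo -n "|$1" >> "$temp/DONE"
                # Recurse only if below the depth limit, not yet visited, not ignored, and not the current site.
                [[ $TABS -lt $MAX_DEPTH && -z $(grep -E "$(cat "$temp/DONE")" <<< "$out") && -z $(grep -E "$GLOBAL_IGNORE" <<< "$out") && ! "$1" == "$out" ]] && echo "[$TABS]$(showTabs $TABS)Getting into [$TABS - $(pwd | awk -F"$temp" '{print $2}')] >>> $out" && showSiteLinkOut "$out" && echo "[$TABS]$(showTabs $TABS)Getting out of $out" && let TABS-- && cd ..
                # Force a zero exit status so the caller's && chain continues.
                echo 0 > /dev/null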
        done
}

echo "working at $temp"
echo ">>> TRACE to $1 START"
DONE=$1
echo -n "$1" > "$temp/DONE"
showSiteLinkOut "$1"
cd "$temp" || exit 1
tree -d "$1" > "$(dirname "$0")"/tree.txt
