The challenge was to find all the email addresses on a given site. The site's landing page showed just one link, leading to another page. That page contained 2 links, each leading to a different page. From that point onwards, each page showed 2 links to 2 further pages, and so on. The pages at the 11th level each showed one email id.
From this layout, we could deduce that the site is arranged like a binary tree. Since the pages at the 11th level showed the email addresses, there would be 2^10, i.e. 1024, email ids. So we needed a version of a binary tree traversal algorithm to collect all of them.
So our objective is as follows:
- Send a request
- Analyse the response
- If it is an Email, append it to a file
- If it contains links, push them to a stack
- Pop the stack, and repeat these steps until the stack is empty
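The steps above can be tried out without touching the network. Here is a minimal Python 3 sketch of the same stack-based traversal; `harvest` and `fake_fetch` are names I made up for illustration, and `fake_fetch` is only a stand-in for the real site (a small full binary tree with invented addresses).

```python
def harvest(fetch, start):
    """Depth-first walk. fetch(path) returns ('email', address) or ('links', [paths])."""
    stack = [start]
    emails = []
    while stack:
        path = stack.pop()
        kind, payload = fetch(path)
        if kind == 'email':
            emails.append(payload)   # leaf page: keep the address
        else:
            stack.extend(payload)    # inner page: visit its links later
    return emails


# Hypothetical stand-in for the site: 3 levels of two-way branching
def fake_fetch(path, depth=3):
    if len(path) == depth:
        return 'email', 'user' + path + '@example.com'
    return 'links', [path + '0', path + '1']


found = harvest(fake_fetch, '')
print(len(found))  # 2**3 leaves, so 8 addresses
```

Because the pages form a tree, no page is reachable twice, so the sketch does not need a "visited" set; a site with cross-links would.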
I chose Python to write the code for this. The Requests module was used for sending requests and receiving responses. To parse the HTML response, I used a module called Beautiful Soup. Python just rocks, you know.
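To show what "parsing the response" boils down to here, this is a small Python 3 sketch that pulls the text out of every anchor tag. Note that it uses the standard library's `html.parser` instead of Beautiful Soup, only so that it runs without extra installs; the `AnchorText` class, the `anchor_texts` helper and the sample markup are all my own assumptions, not the real pages.

```python
from html.parser import HTMLParser


class AnchorText(HTMLParser):
    """Collects the text inside every <a>...</a> element."""

    def __init__(self):
        super().__init__()
        self.in_anchor = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.in_anchor = True
            self.texts.append('')

    def handle_endtag(self, tag):
        if tag == 'a':
            self.in_anchor = False

    def handle_data(self, data):
        if self.in_anchor:
            self.texts[-1] += data


def anchor_texts(html):
    parser = AnchorText()
    parser.feed(html)
    parser.close()
    return [t.strip() for t in parser.texts]


# Made-up page with two links; the actual markup of the site is not known
sample = '<html><body><a href="/Pages/abc">abc</a> <a href="/Pages/def">def</a></body></html>'
print(anchor_texts(sample))
```

Each returned string is then either an email address or the name of the next page to visit.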
 
import requests
import BeautifulSoup  # BeautifulSoup 3, so this is Python 2 code

link = "http://hackcon14.cloudapp.net:8080"
linkstack = ['/']  # paths that still have to be visited

def checkIfEmail(data):
    # Emails are recognised by their '@hackcon.com' suffix
    return data.endswith('@hackcon.com')

def emailharvest(link):
    r = requests.get(link)
    bs = BeautifulSoup.BeautifulSoup(r.text)
    # The text of each anchor is either an email or the name of a child page
    for a in bs.findAll('a'):
        text = a.contents[0].encode('ascii', 'ignore')
        if checkIfEmail(text):
            print "Found an email : " + text
            # 'with' closes the file again after every write
            with open('emails.txt', 'a') as f:
                f.write(text + '\n')
        else:
            print "Found a link : /Pages/" + text
            linkstack.append('/Pages/' + text)

def doit():
    # Keep popping until there is no page left to visit
    while linkstack:
        linkpart = linkstack.pop()
        print 'Going to visit ' + linkpart
        emailharvest(link + linkpart)

doit()
            
Well, that did the trick and gave me 1024 emails in a text file. Uploaded it and voilà, 80 points to my team, xbios.
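One thing to watch out for: the script appends to emails.txt, so running it twice would leave every address in the file twice. A quick check before uploading could look like the Python 3 sketch below; `unique_lines` is a hypothetical helper of mine, and the demo runs on a throwaway file with invented addresses rather than the real list.

```python
import os
import tempfile


def unique_lines(path):
    """Return the non-empty lines of a file, de-duplicated, in first-seen order."""
    seen = set()
    result = []
    with open(path) as f:
        for line in f:
            entry = line.strip()
            if entry and entry not in seen:
                seen.add(entry)
                result.append(entry)
    return result


# Demo file with one repeated address, as it would look after a second run
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('a@example.com\nb@example.com\na@example.com\n')
demo_path = tmp.name

emails = unique_lines(demo_path)
os.remove(demo_path)
print(len(emails))  # 2 distinct addresses
```

For the real file, `len(unique_lines('emails.txt'))` should come out at 1024.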
 