The challenge was to find all the email addresses on the given site. The site's initial page showed a single link to another page. That page contained two links, each leading to a different page. From that point onwards, each page showed two links to two further pages, and so on. The pages at the 11th level each showed one email ID.
From this structure, we could deduce that the site is arranged like a binary tree. Since the pages at the 11th level showed the email addresses, there would be 2^10, i.e. 1024, email IDs. So we needed to design a version of a binary tree traversal algorithm to collect all of them.
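The 2^10 figure can be checked with a couple of lines. This is only a sketch of the arithmetic, assuming the root counts as level 1 and every page above the email level has exactly two children; `nodes_at_level` is a made-up helper name, not part of the original script.

```python
def nodes_at_level(level, branching=2):
    """Number of nodes at a 1-indexed level of a full tree.

    Assumes the root is level 1 and every node has `branching` children.
    """
    if level < 1:
        raise ValueError("level must be >= 1")
    return branching ** (level - 1)


# Pages expected at the 11th level, one email ID each
email_pages = nodes_at_level(11)
print(email_pages)
```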
So our objective is as follows:
- Send a request
- Analyse the response
- If it is an email, append it to a file
- If it contains links, push them onto a stack
- Pop the stack and repeat these steps until the stack is empty
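The steps above can be sketched without any network access. This is a minimal Python 3 illustration, not the script used in the contest: `harvest`, `FAKE_SITE` and the two sample addresses are hypothetical, and the `visited` set is an extra safeguard that a strict tree does not need.

```python
def harvest(start, fetch, is_email):
    """Stack-based (depth-first) traversal.

    fetch(page) returns the items shown on a page; each item is either
    an email address or the name of another page to visit.
    """
    stack = [start]
    visited = set()   # assumption: guards against loops between pages
    emails = []
    while stack:
        page = stack.pop()
        if page in visited:
            continue
        visited.add(page)
        for item in fetch(page):
            if is_email(item):
                emails.append(item)   # the real script appends to a file
            else:
                stack.append(item)    # a link: visit it later
    return emails


# Hypothetical toy site: one link on the first page, then two per page
FAKE_SITE = {
    "/": ["/a"],
    "/a": ["/b", "/c"],
    "/b": ["one@hackcon.com"],
    "/c": ["two@hackcon.com"],
}

found = harvest("/", lambda page: FAKE_SITE[page],
                lambda text: text.endswith("@hackcon.com"))
print(found)
```

Because a stack is last-in, first-out, the most recently discovered link is visited first, so the emails come out in depth-first order rather than left to right.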
I chose Python to write the code for this. The Requests module was used for sending requests and receiving responses. To parse the HTML response, I used a module called Beautiful Soup. Python just rocks, you know.
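To show what the parsing step boils down to, here is a rough Python 3 sketch that collects the text of every `<a>` tag. It deliberately swaps Beautiful Soup for the standard library's `html.parser` so that it runs with no third-party packages; `AnchorTextParser`, `anchor_texts` and the `SAMPLE` markup are invented for illustration and are not taken from the contest site.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Collects the text found inside each <a>...</a> element."""

    def __init__(self):
        super().__init__()
        self._in_anchor = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self.texts.append("")   # start a new anchor text

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        if self._in_anchor:
            self.texts[-1] += data


def anchor_texts(html):
    parser = AnchorTextParser()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]


# Hypothetical page body, assumed to resemble what the site returned
SAMPLE = '<html><body><a href="/Pages/abc">abc</a> <a href="/Pages/def">def</a></body></html>'
print(anchor_texts(SAMPLE))
```

Each returned string is then either an email address or a page name to push onto the stack.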
    import requests
    import BeautifulSoup  # BeautifulSoup 3; this script is Python 2

    link = "http://hackcon14.cloudapp.net:8080"
    linkstack = ['/']  # pages still to visit, relative to link

    def checkIfEmail(data):
        # Addresses are recognised by the '@hackcon.com' suffix
        return data[-12:] == '@hackcon.com'

    def emailharvest(link):
        r = requests.get(link)
        bs = BeautifulSoup.BeautifulSoup(r.text)
        for anchor in bs.findAll('a'):
            text = anchor.contents[0].encode('ascii', 'ignore')
            if checkIfEmail(text):
                print "Found an email : " + text
                # Append, so earlier finds are kept
                with open('emails.txt', 'a') as f:
                    f.write(text + '\n')
            else:
                print "Found a link : /Pages/" + text
                linkstack.append('/Pages/' + text)

    def doit():
        # Keep visiting pages until no links are left
        while len(linkstack) != 0:
            linkpart = linkstack.pop()
            print 'Going to visit ' + linkpart
            emailharvest(link + linkpart)

    doit()

Well, that did the trick: it gave me 1024 emails in a text file. I uploaded it and voilà, 80 points to my team, xbios.