The challenge was to find all the email addresses on the given site. The site's initial page showed just one link, leading to another page. That page contained 2 links, each leading to a different page. From that point onwards, each page showed 2 links to 2 other pages, and so on. A page at the 11th level showed one email id.
From this layout, we could deduce that the arrangement is similar to that of a binary tree. Since the email address appeared on a page at the 11th level, there would be 2^10, i.e. 1024, email ids. So we needed to design a version of a binary tree traversal algorithm to collect all the email ids.
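The count above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes the root counts as level 1 and that every page above the last level has exactly two children. The helper name nodes_at_level is made up for illustration.

```python
def nodes_at_level(level, branching=2):
    """Number of nodes at a 1-indexed level of a perfect tree (root = level 1)."""
    if level < 1:
        raise ValueError("level must be >= 1")
    # Each level has `branching` times as many nodes as the one above it
    return branching ** (level - 1)

print(nodes_at_level(11))  # 2 ** 10
```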
So our objective was as follows:
- Send a request
- Analyse the response
- If it is an email, append it to a file
- If it contains links, push them to a stack
- Pop the stack, and repeat the tasks until the stack is empty
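The steps above can be sketched without any network access. This is a rough illustration, not the script used in the challenge. The SITE dictionary, the harvest function and its fetch and is_email parameters are all hypothetical stand-ins. The seen set is an extra guard against revisiting pages that the steps above do not mention.

```python
def harvest(start, fetch, is_email):
    """Stack-based walk over pages; fetch(page) returns the items on that page."""
    stack = [start]
    seen = set()      # extra guard, not part of the steps above
    emails = []
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        for item in fetch(page):
            if is_email(item):
                emails.append(item)   # stands in for "append it to a file"
            else:
                stack.append(item)    # a link: visit it later
    return emails

# Hypothetical toy site: each key is a page, each value the items shown on it
SITE = {
    "/": ["a"],
    "a": ["b", "c"],
    "b": ["one@hackcon.com"],
    "c": ["two@hackcon.com"],
}

found = harvest("/", SITE.__getitem__, lambda s: s.endswith("@hackcon.com"))
print(found)
```

Because a stack is used, the most recently pushed link is visited first, so the walk goes depth-first. Swapping the stack for a queue would give a level-by-level walk and find the same emails.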
I chose Python to write the code for this. The Requests module was used for sending requests and receiving responses. To parse the HTML response, I used a module called Beautiful Soup. Python just rocks, you know.
# Python 2, using the Requests and BeautifulSoup (version 3) modules
import requests
import BeautifulSoup

# Base URL of the challenge site; paths from the stack are appended to it
link = "http://hackcon14.cloudapp.net:8080"
# Stack of paths still to visit, seeded with the start page
linkstack = ['/']

def checkIfEmail(data):
    # Every email on the site ends with the same domain
    return data.endswith('@hackcon.com')

def emailharvest(link):
    r = requests.get(link)
    bs = BeautifulSoup.BeautifulSoup(r.text)
    for anchor in bs.findAll('a'):
        text = anchor.contents[0].encode('ascii', 'ignore')
        if checkIfEmail(text):
            print "Found an email : " + text
            # Append, so that emails from earlier pages are kept
            with open('emails.txt', 'a') as f:
                f.write(text + '\n')
        else:
            print "Found a link : /Pages/" + text
            linkstack.append('/Pages/' + text)

def doit():
    # Keep visiting pages until no unvisited links remain
    while linkstack:
        linkpart = linkstack.pop()
        print 'Going to visit ' + linkpart
        emailharvest(link + linkpart)

doit()
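For readers on Python 3, the parsing and classification step might look roughly like the sketch below. It swaps Beautiful Soup for the standard library's html.parser so that it runs without extra packages. The AnchorTextParser class, the classify function and the sample markup are made up for illustration, since the real pages' HTML is not shown here.

```python
from html.parser import HTMLParser

EMAIL_SUFFIX = "@hackcon.com"

class AnchorTextParser(HTMLParser):
    """Collects the text inside each <a>...</a> element."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._buffer = None  # None while outside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            self.texts.append("".join(self._buffer).strip())
            self._buffer = None

def classify(html):
    """Split anchor texts into emails and '/Pages/...' paths to visit next."""
    parser = AnchorTextParser()
    parser.feed(html)
    parser.close()
    emails = [t for t in parser.texts if t.endswith(EMAIL_SUFFIX)]
    links = ["/Pages/" + t for t in parser.texts if not t.endswith(EMAIL_SUFFIX)]
    return emails, links

# Assumed sample markup, not taken from the real site
sample = '<html><body><a href="/Pages/x1">x1</a> <a href="#">bob@hackcon.com</a></body></html>'
emails, links = classify(sample)
print(emails)
print(links)
```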
Well, that did the trick and gave me 1024 emails in a text file. I uploaded it and voila, 80 points to my team, xbios.