I am trying to implement the code described this post about how Reddit identifies duplicate user names (I do believe it is called binary search). Just to clarify, this is mainly for me figuring out how these things work; thanks @Graipher for pointing to
bisect). The way I understand it, the steps are:
You have a new user name you are comparing to an alphabetically sorted list of existing names
You compare that new name to the middle item in the list.
If there is no match, you check if the new name is higher or lower in the alphabet than the middle item, and focus on this side of the list, again checking the middle item.
You repeat until there is a match ("User name taken") or none ("User name available").
Below is what I have. It seems to work when I test it on the names in the list and random other names. It'd cool if someone had some feedback re:
Will this work correctly or is there some kind of bug?
whileloop the right way to go? Are there better options? I especially don't like the separate loop for a list of length 1.
Performance-wise, are there any real hiccups in here?
Any other comments are much appreciated!
def checker(new_name, old_names): """ The checker takes a string new_name and checks if it is contained in the list old_names. It prints out the results. """ print "***\n\ninitializing the checker" print "searching for ", new_name print "old names", old_names old_names= sorted(old_names) #set the match variable to false namematch= False #iterate over list while new_name does not match while not namematch and len(old_names) > 1: middle= len(old_names)/2 print "middle item is", old_names[middle], "middle index is", middle #match if old_names[middle] == new_name: namematch= True #no match else: if sorted(list((new_name, old_names[middle]))) != new_name: print "word is to the right" old_names = old_names[middle:] else: print "word is to the left" old_names = old_names[:middle] #check for list of length 1: is it new_name if (len(old_names)) == 1 and (old_names == new_name): namematch = True if not namematch: print "this name is not taken", new_name else: print "this name is taken:", new_name
Here is my test data:
names_list= [u'annie', u'max', u'chris', u'11alpha', u'zotti', u'zatti', u'zutti', u'?andy', u'getr\xfc', u'zilly'] checker('max', names_list) # True checker('fail', names_list) # False