Monday 31 March 2014

Week 11 of CSC148: Sorting and Efficiency

Here we are, the last of the mandatory SLOG posts and the last week of CSC148 as a whole. Today, I'll be sharing a little summary of what I have learnt so far about sorting and the efficiency of algorithms.

Sorting is something I became quite familiar with in CSC108. In both that course and this one, sorting has commonly referred to organizing a list of n integers from smallest to largest. For example, an unsorted list of n = 5 elements may look like this: [2, 3, 4, 1, 5], which after running through a sorting algorithm would come out looking nice and sorted like this: [1, 2, 3, 4, 5]. There's a plethora of sorting algorithms that have been created to handle this seemingly simple task, and each one of them uses a different method (or at least a variation on a pre-existing one). Bubble sort, for example, goes through each element of the list and compares it to the element on its right. If the element on the right is smaller, the two switch places. The algorithm keeps 'passing' over the list until every element is sorted. Another sorting algorithm, insertion sort, works by building a sorted sub-list starting from the beginning of the list and "inserting" each new value into its proper spot, until the sub-list contains all n elements.
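
Here's a minimal sketch of both algorithms as I've described them (the function names and details are my own, not from lecture):

def bubble_sort(lst: list) -> None:
    """Sort lst in place by repeatedly swapping adjacent out-of-order pairs."""
    for i in range(len(lst)):
        # After each pass, the largest unsorted value has bubbled to the end.
        for j in range(len(lst) - 1 - i):
            if lst[j] > lst[j + 1]:
                lst[j], lst[j + 1] = lst[j + 1], lst[j]

def insertion_sort(lst: list) -> None:
    """Sort lst in place by growing a sorted sub-list from the front."""
    for i in range(1, len(lst)):
        value = lst[i]
        j = i
        # Shift larger values right until value's proper spot is open.
        while j > 0 and lst[j - 1] > value:
            lst[j] = lst[j - 1]
            j -= 1
        lst[j] = value

So calling bubble_sort(lst) on lst = [2, 3, 4, 1, 5] leaves lst as [1, 2, 3, 4, 5].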

I remember asking myself the following question when first encountering sorting algorithms: if the task is always the same, why are there so many sorting algorithms? Well, put bluntly, the task is not always the same. Every sorting algorithm has its strengths and weaknesses; some are just plain slow, and some are just plain fast. Timsort, for example, as we saw in the lab last week, is blazing fast in virtually every case--little wonder it's the algorithm behind Python's built-in list sort. This is where efficiency comes in. On a small n, efficiency is of little concern. But when n increases to larger and larger values, which it tends to do in CS, especially when dealing with large amounts of data, efficiency becomes a critical concern. In CSC148, algorithm efficiency is commonly analysed on a worst-case basis. That is, the situation where the list is organized in a way that exposes any conspicuous inefficiencies in the algorithm. Big-oh notation is also used, which gives an upper bound on the growth rate of a function, ignoring constant factors (e.g. 3n^2 is O(n^2)). For example, a few weeks ago Danny was showing us Quicksort in lecture. In the typical case it was quite fast, running in O(n log n), but when run on an already sorted list, efficiency dropped to an abysmal O(n^2), which is the worst case for Quicksort. Any runtime near O(log n) is quite fast: the pool of candidates is halved every time the algorithm makes a pass, meaning even massive values of n are reduced quite quickly. On the other end of the spectrum, O(n^2) is quite a poor runtime, which also happens to be the worst case of bubble sort; even its best case is O(n), since, as the algorithm was described, it must look at every item in the list at least once. In the O(n^2) worst case, 1,000,000 items would require on the order of 1,000,000,000,000 steps--ouch!
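
To make the Quicksort example concrete, here's a minimal sketch of the kind of implementation I mean. Using the first element as the pivot is my assumption, but it's the choice that produces the O(n^2) behaviour on an already sorted list:

def quicksort(lst: list) -> list:
    """Return a sorted copy of lst, pivoting on the first element."""
    if len(lst) <= 1:
        return lst
    pivot = lst[0]
    smaller = [x for x in lst[1:] if x < pivot]
    larger = [x for x in lst[1:] if x >= pivot]
    # On an already sorted list, 'smaller' is always empty, so each call
    # shrinks the problem by only one element: n levels of recursion doing
    # O(n) work each is exactly where the O(n^2) worst case comes from.
    return quicksort(smaller) + [pivot] + quicksort(larger)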

The amount of time a function takes to run once is also an important factor in efficiency, since this is the constant that gets multiplied by the big-oh growth rate to estimate how long a function will take to serve a specific purpose on a specific machine. To maximize efficiency in this regard, it is important to ensure there are no redundancies at any step within the code. To bring up another example from the lab, we noted a decrease in runtime when storing the length of a list in a variable rather than calling len() on it multiple times within a loop. Given that both processor time and real time are valuable resources, efficiency is indeed an important aspect of algorithm design in programming.
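
Here's a hypothetical sketch of the pattern from the lab (len() itself runs in constant time in Python, so the saving is only the overhead of the repeated call--but inside a long loop, it adds up):

data = list(range(1000000))

# Version 1: len() is re-evaluated on every pass through the loop.
total = 0
i = 0
while i < len(data):
    total += data[i]
    i += 1

# Version 2: the length is stored once in a variable and reused.
total = 0
n = len(data)  # computed a single time
i = 0
while i < n:
    total += data[i]
    i += 1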

Hope you enjoyed this post on sorting and efficiency!

Monday 24 March 2014

Week 10 of CSC148: Assignment 2 Part 2, LinkedLists

Well, it's hard to believe we are already at the second last week of CSC148. I remember the first post on this SLOG as if it was yesterday...

The majority of my week 10 was spent on the second part of assignment 2 and, overall, I felt it was a very useful assignment to have done. I felt like I had both solidified my skills with writing recursive functions and deepened my understanding of tree abstract data types. After all, the special regex trees we were working with were basically just binary trees with a bunch of added restrictions.

All four of the functions (plus, in my case, an added helper function) each covered one aspect of the regex tree ADT: building the trees, breaking them back down, and checking their validity. Of course, not every part of the assignment explicitly had us working with the regex tree ADT, but a big part of it forced us to view a regex string like '(1.0)*' as a tree-type data structure in itself.
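
As a rough illustration of that tree view (the class and attribute names here are hypothetical, not the ones from the assignment), '(1.0)*' can be pictured as a star node sitting on top of a dot node with two leaves:

# A hypothetical node class, just to show the shape of the idea; the
# assignment's actual regex tree classes carry more restrictions.
class RegexTreeNode:
    def __init__(self, symbol: str, children: list) -> None:
        self.symbol = symbol
        self.children = children

# '(1.0)*': a star node whose only child is a dot node, whose two
# children are the leaves '1' and '0'.
tree = RegexTreeNode('*', [RegexTreeNode('.', [RegexTreeNode('1', []),
                                               RegexTreeNode('0', [])])])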

I felt the biggest challenge of the assignment was the regex_match function in particular. Not because it was particularly difficult to write, but because it had so many different "moving parts". This is the first big function in CSC148 I've written where I felt it necessary to test over and over again to see if I had every little outlier case working...and with good reason too! It was tedious, but testing this function rigorously would, every so often, turn up some kind of tiny error in the "moving parts" that had to be tweaked. Working on this function gave me a really good first-hand look at how important testing is to writing code.

Another thing that's been floating around in my mind, and not just because it's going to be on test 2, is LinkedLists. They seem simple at first glance, and appropriately so, since they are much simpler than trees in terms of their structure, but implementing them alongside other ADTs has been a bit of a hassle in the tutorials so far. So for the next week, LinkedLists will be a big priority for me.

Hope you enjoyed this post!

Saturday 15 March 2014

Week 9 of CSC148: BSTs, Runtime, Efficiency, and Exercise 3

So it seems this was the final week on tree data types, as we transitioned from binary search trees into the runtime and efficiency of algorithms, and ultimately the idea of big-oh, which I'll save for another week once we've done more work on it. Binary search trees, although a new concept, were more of a modification of something I already understood well at this point, and if anything, a simplification of those concepts. The idea is that for every parent and its children, the left sub-tree must contain only values smaller than the value of the parent, while the right sub-tree must contain only larger values. This is an expansion of something that was taught in CSC108, and it actually made the concept a lot clearer for me. I believe the main thing to take away from this ADT is that every step of a search cuts the pool of possible values roughly in half. Contrast this with a regular Tree ADT, one where there is an integer at every node without the restriction on the left and right children. In that case, the algorithm must search the entire tree in the worst case--a linear runtime. And as we learnt, linear runtime is not a runtime you want to be dealing with!
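
Here's a minimal sketch of that halving in action, using a small node class of my own (not the course's implementation):

class BSTNode:
    def __init__(self, value: int, left=None, right=None) -> None:
        self.value = value
        self.left = left    # sub-tree of values smaller than value
        self.right = right  # sub-tree of values larger than value

def bst_contains(node, n: int) -> bool:
    """Return True if n is in the BST rooted at node."""
    if node is None:
        return False
    if n == node.value:
        return True
    # Each comparison discards an entire sub-tree, which is where the
    # 'cut the pool in half' behaviour comes from (assuming the tree
    # is reasonably balanced).
    if n < node.value:
        return bst_contains(node.left, n)
    return bst_contains(node.right, n)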

As for E3, I had a little bit of difficulty with the functions we had to write, but in the end I was very happy we were tested on this material. Being forced to grind out some work on recursion really helped me get to a point where I now feel very comfortable with it. I finished the exercise fairly quickly with some clunky code, but then decided to use recursion to its full potential and ended up with some clean code I was very proud of. A big part of this was remembering a little aid I had read somewhere for writing recursive functions--I believe it was in the readings for the intro week on recursion. The trick is to write the function for each possible size of the input: start with the base case, of course, then go on to handle the case where there are two of whatever you're dealing with, and so on. Hope that tip helps anyone who may be reading!
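
To show what I mean, here's a hypothetical example of applying that tip to summing a nested list of integers:

def nested_sum(x) -> int:
    """Return the sum of an int or an arbitrarily nested list of ints."""
    if isinstance(x, int):
        # Base case: a plain number sums to itself.
        return x
    # Writing out the two-item case first, [a, b] -> nested_sum(a) +
    # nested_sum(b), made it clear that the same idea handles any length.
    return sum(nested_sum(item) for item in x)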

Now on to the second part of assignment 2...

Thursday 6 March 2014

Week 8 of CSC148: Assignment 2, LinkedLists, and Other Recursive ADTs

If I had to pick one word to summarize the last few weeks of CSC148, I would, without a doubt, pick the word "trees". All of the recent labs, assignments, exercises (E3 was just posted and, from a quick glance, looks like it involves tree traversals), and lectures have had to do with trees, some tree-related data structure, or, more specifically, nested ADTs like LinkedLists. Therefore, it seems only fitting to discuss the topic a little bit this week. Since linked lists are essentially just trees with an arity of one at every "branch", the same realm of logic applies throughout.

An important part of understanding trees (or Trees, to be more accurate, as we're talking about a Python class) is understanding tree traversals. There are pre-order, in-order, and post-order traversals: three algorithms which involve "visiting" a node before "moving" left then right, visiting it after going left but before going right, or visiting it only after going both left and right, respectively. In exercise three we are required to take an in-order and a pre-order traversal list and produce the corresponding tree, which is what I'm starting to think about now.
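
A minimal sketch of the three traversals, using a bare-bones binary tree class of my own rather than the course's Tree:

class BinaryTree:
    def __init__(self, value, left=None, right=None) -> None:
        self.value = value
        self.left = left
        self.right = right

def preorder(t) -> list:
    """Visit the node, then the left sub-tree, then the right."""
    if t is None:
        return []
    return [t.value] + preorder(t.left) + preorder(t.right)

def inorder(t) -> list:
    """Visit the left sub-tree, then the node, then the right."""
    if t is None:
        return []
    return inorder(t.left) + [t.value] + inorder(t.right)

def postorder(t) -> list:
    """Visit both sub-trees before the node itself."""
    if t is None:
        return []
    return postorder(t.left) + postorder(t.right) + [t.value]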

A part I have found interesting about nested structures while working with them so far is that their methods are often much simpler and more elegant than you might expect at first glance. I experienced this head-on in the last lab, where we were required to write a few functions for the LinkedList ADT. My partner Brian (check out his great SLOG here) and I had gone the iterative route when designing our functions, only to realize later that our clunky code was much longer and more complicated than the recursive alternative. It was definitely a good lesson, though. Another thing we realized, and something I had noticed on assignment 2, is that oftentimes when dealing with linked structures like Trees (whether binary or not) you can find yourself writing recursive functions without even realizing it. For example, if you write an __eq__ method for a Tree class, one option is writing it out like so:

def __eq__(self, other: 'Tree') -> bool:
    return (self.value == other.value) and (self.child == other.child)

I hadn't noticed it at first, but by writing out this method I had written a recursive function. Since in many cases the child will be another Tree, __eq__ will call itself again, and this will continue all the way down until child refers to None in both trees. I found this very interesting.

Back to finishing up assignment 2 and time to start exercise 3!

Sunday 23 February 2014

Week 7 of CSC148: Recursion

I can't believe we are already 7 weeks into this semester, and week 7, known to many students as Reading Week, is where I'll focus on the topic of recursion. I've touched on recursion before--it would have been difficult to avoid, since it was a big part of assignment 1 and is one of the larger overall topics of the course so far.

Put simply, recursion occurs in computer science when a function calls itself somewhere in its own definition. A function with this characteristic is aptly referred to as a recursive function. The goal is to break a seemingly large, possibly arbitrarily deep task into its most basic steps. A good way of understanding how recursion approaches a problem is to contrast it with a style of programming we became familiar with in CSC108: iteration. A simple example of an iterative function would be one that keeps appending to a list while some condition is not met. Both iteration and recursion try to break problems down into smaller tasks, but as far as my understanding goes, iterative functions tend to explicitly declare when they have reached a specific end, while recursive functions are more focused on reducing the problem to its simplest form--referred to as the base case. The base case is the true core of a recursive function, in that calling a recursive function on the base case will always lead directly to a return statement. If a more complicated case is passed into the recursive function, the goal is to break it down into the base case in however many steps are required. This 'breaking down' occurs by checking if the base case is fulfilled; if it isn't, the input is modified in some way and the function is called again, until either the base case has been reached or the input has been simplified so it is one step closer to it.
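
To make the contrast concrete, here's a small hypothetical pair of functions that sum a list both ways:

def sum_iterative(lst: list) -> int:
    """Add up lst by explicitly walking it to the end."""
    total = 0
    for value in lst:
        total += value
    return total

def sum_recursive(lst: list) -> int:
    """Add up lst by shrinking the problem toward the base case."""
    if lst == []:
        # Base case: the sum of an empty list is 0.
        return 0
    # One step closer to the base case: the first value plus the rest.
    return lst[0] + sum_recursive(lst[1:])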

As an example of a recursive function, here is one from one of the labs on recursion that my lab partner Brian (check out his entry on recursion right here) and I wrote:

def find_num(SL: 'SearchList', n: int) -> bool:
    """
    Return True if n is in SL, otherwise False.
    """
    if n == SL[0]:
        # Base case: this node holds the value we are looking for.
        return True
    elif n < SL[0] and SL[1]:
        # n is smaller, so it can only be in the left SearchList.
        return find_num(SL[1], n)
    elif n > SL[0] and SL[2]:
        # n is larger, so it can only be in the right SearchList.
        return find_num(SL[2], n)
    else:
        # There is no sub-list left to search, so n is not in SL.
        return False

The point of this function is to recursively search a SearchList, an ADT we learned about in the lab. A SearchList has some interesting characteristics. First, a SearchList is either None or a Python list of three elements. Element 0 is always an integer, element 1 is either None or a SearchList containing only integers smaller than element 0, and element 2 is either None or a SearchList containing only integers larger than element 0. For example, a SearchList can look like this: [5, [2, [1, None, None], [3, None, None]], [6, None, None]].

Using these specific characteristics, the previous function recursively scans a SearchList for a given integer n. The first if statement checks for the base case, which is the situation where the first element is the one we are looking for. A base case is a requirement for any recursive function, and this one is a good display of how a function defines its simplest possible task. For example, calling find_num(SL, 9) on a SearchList defined as [9, None, None] would evaluate to True without the function calling itself again. An example of a more complex case is the SearchList [8, [2, None, None], None]. It is immediately apparent that this is not the base case, since there is a nested SearchList as element 1. When find_num is called on this SearchList with n = 2, the base case is checked first and evaluates to False. Next, elif n < SL[0] and SL[1] is evaluated: n is less than SL[0], which is 8, and SL[1] evaluates as truthy since it is not None, meaning there exists a nested SearchList for the function to check for the intended value. This is where the recursion happens: inside this elif branch the function calls itself, this time with the nested SearchList as the argument. At this second level, the function once again checks for the base case, and this time it returns True. This return value is carried back up to the 'first level' of the function and ends the function call.

This might not be the simplest possible example, but I believe it effectively shows every key feature of a recursive function: the check for the base case, the return statements for when the function has or has not reached the base case, and the function calling itself with a modified argument. Regardless of how many nested SearchLists there are, this function will continue to call itself until a base case has been found, no matter how 'deep' it must go. Packing a lot of power into a relatively simple function seems to be recursion's biggest strength, and whenever a problem can be broken down into the same series of steps, it is definitely a viable option.

Friday 14 February 2014

Week 6 of CSC148: Assignment 1 pt. 2 and Trees

Last week I ended my post off before I tackled the recursive bits of assignment 1. I correctly predicted it would be the hardest part of the assignment, but in hindsight (like everything) it seems easy. As they say, hindsight is 20/20. Once I had figured out how to translate the mathematical equation for the minimum number of moves in a four-stool version of the Tower of Hanoi into Python code, which wasn't too bad after Danny offered some guidance in one of his lectures, I ran into a wall when it came to returning the 'i' value that corresponded to each 'n' value, 'n' being the number of disks--or in this case, cheeses--in the game. I solved this by putting my recursive function inside another function, so the tuples I was appending to weren't being wiped every time the function recursed. Looking back, I could have used another tool recently brought up in class: namespaces--or more specifically, defining the scope of the list I was using to store the tuples outside of the function (i.e. making it global, perhaps). When all of this was said and done, creating the recursive function to actually move the cheeses was a breeze, once again thanks to the guidance offered by Danny and the TAs over Piazza. Overall, the assignment went great.
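
For context, here's roughly what that equation looks like as Python--a sketch from memory of the minimum-moves recurrence, not my actual assignment code (and without the memoization you'd want for large n):

def min_moves(n: int) -> int:
    """Minimum number of moves for n cheeses on four stools."""
    if n == 1:
        # Base case: a single cheese takes one move.
        return 1
    # Move i cheeses aside using all four stools, move the remaining
    # n - i cheeses using only three stools (2**(n - i) - 1 moves),
    # then move the i cheeses back on top. Try every split i and keep
    # the cheapest.
    return min(2 * min_moves(i) + 2 ** (n - i) - 1 for i in range(1, n))

Returning the best 'i' for each 'n' is then just a matter of remembering which split produced that minimum.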

The latest concept to come up in class is the use of tree structures to help visualize recursion. The 'paths' from node to node and the idea of parent and child nodes all stemming from a root definitely redefined how I view recursion. I could see immediately why we were being taught this concept; it felt intuitive almost right away. Understanding the code and the tree ADT as a whole is the next step for me in the CSC148 course material, particularly now that my midterm studying is about to begin.

Saturday 8 February 2014

Week 5 of CSC148: Assignment 1 and Exceptions

I'm getting close to the end of assignment one, which incidentally is where it gets very challenging. The TOAHModel and ConsoleController modules weren't too hard to complete; in fact, it was easy enough to have a little fun while doing them. I went a little further than required with ConsoleController and added a function that checks if the user has won the game, rather than making it an entirely manual experience. I did this so there is some redeeming quality to playing around with the TOAH game in the console while I figure out how to complete the hardest step, number five. This is where recursion pops up in the assignment, and from what I have glanced over and heard about in class, it is no easy form of recursion. The assignment handout gives us a nice recursive formula to guide us in the right direction (picking the value of i that results in the least number of moves to complete a four-stool version of the game). I imagine I must implement the code given to us in class for the three-stool version of the game at some point.

One of the best things we have learnt so far, in my opinion, is how exceptions are raised and can subsequently be handled. No more annoying exceptions popping up left and right when we don't want them to! I've used a couple of try/except statements in the assignment so far and they've made my life a lot easier. Not only can I stop specific (or general) exceptions from showing up, but I can totally change what happens when a specific one is raised. This is definitely something I will use a lot in my programming.
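
As a quick hypothetical example of the pattern (an illustration, not code from my actual ConsoleController):

def get_stool_number() -> int:
    """Ask the player for a stool number, re-prompting on bad input."""
    while True:
        try:
            return int(input('Which stool? '))
        except ValueError:
            # Instead of crashing on input like 'abc', we decide exactly
            # what happens when the ValueError is raised.
            print('Please enter a number.')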

Now time to try and tackle that recursive formula for assignment one...