Thursday, August 30, 2007

STL Introduction


Iterators are an abstraction of pointers. They provide a means to traverse containers and access objects within containers.

The generic algorithms are a collection of routines that can be used to perform common operations on container classes, such as sorting, merging and finding. The are called generic because they are independent of the container class used and of the datatype of the objects held in those container classes. The iterators are used to link a container with an algorithm. A container class has methods to return iterators that provide access to its contents. These iterators are provided, as arguments, to an algorithm. The algorithm can then traverse, access and manipulate the contents of the container.

There are a few benefits that the container classes offer over arrays.

  • Container classes handle all sizing and memory allocation issues. You can't write past the end of a container class like you can with an array
  • Container classes are optimized for their intended use. Some allow constant time insertions at their end, some allow this anywhere. Some are better to maintain sorted lists, some better for random access retrievals. We'll learn more on this as we proceed.
  • They contain utility methods such as empty(), size() and clear() that can simplify programming.
  • They contain other methods that all insertion, deletion and copying.
  • They have equality, inequality and assignment operators. You can directly compare two containers or assign on to another.
  • They provide iterators. Iterators are an abstraction of pointers. They allow similar syntax and use, but also have some built-in methods and checks that increase their utility.

The Big-Oh

Before we begin to discuss containers further and begin to discuss the other important components of the STL, such as the generic algorithms, it is necessary to understand a little about quantifying computational costs. We need a measure to understand how an algorithm behaves. Suppose, we run an algorithm on a container containing 5 objects during testing and now we intend to run this algorithm on a container containing 50, 500 or 5 million objects. How long will it take? How do we quantify this? Suppose, we retrieve the first element in a container. How long does it take to retrieve the 79th? The same amount of time? Seventy-nine times as much time? The answers to these questions depend on the particular container or particular algorithm, but there are standard ways of expressing and quantifying these relationships. The computational cost of an algorithm or operation is generally expressed using big-Oh notation. I'm sorry to disappoint any of my readers about the sub-title. The "Oh" indicates the order of the operation. It may be constant, or increase in some other way as the size of the container or number of elements being manipulated increase. The "big" in "big-Oh" indications that we are interest in approximate or order of magnitude estimates. If it takes twice as long to retreive element 79 as element 1, this is still considered a constant time. That's close enough. We're concerned that the retrieval time does not increase quadratically (as a square or power of 2) or cubically (power of 3) for instance. If that were the case the operation would require on order of 79^2 = 6241 or 79^3 = 493039 times the number of operations as the retrieval of element 1.

  • Constant time, O(1). The cost of the operation is independent of the size of the container or number of elements. As noted above, it may not really be constant, but to a resonable approreasonable it's constant.
  • Linear time, O(N). The cost of the algorithm or operation increases linearly as the number of elements increases. For example, the cost to retrieve element 79 is roughly 79 times the cost to get element 1.
  • Quadratic time, O(N2). The cost of the algorithm or operation increases as the square of the number of elements. This is the case for some crude sorting algorithms.
  • Logarithmic time, O(log N). The cost of the algorithm or operation increases as the log of the number of elements. As a review, logbasenumber = exponent. For example, log10100 = 2 and log1000 = 3. For decimal numbers, logs are often written without explicitly showing the base. For instance, log10X is usually written as log X. You should note that for many algorithms that work by successively dividing the number of elements in half, such as many tree based algorithms, the cost actually increases as log2 N. The cost, however, is usually expressed as O(log N) = O(log10N). Why, because logarithms in different bases of the same number are related by a constant. For now, just remember that operations that increase logarithmically increase more slowly than those that increase linearly.
  • O(N log N). The cost increases on the order of N log N. This is true for some algorithms. For instance, the Fast Fourier Transform, FFT, beloved by electrical engineers world wide, has this property.

How do all these orders relate to each other? Let's look at some examples

Number of elements

O(1)

O(log N)

O(N)

O(N log N)

O(N2)

10

constant

1

10

10

100

100

constant

2

100

200

10000

1000

constant

3

1000

3000

1000000

10000

constant

4

10000

40000

100000000

As you can see, as the number of elements increases, the order of an algorithm becomes very significant.

Sequence Containers

The STL provides both sequence and associative containers. The sequence containers are meant to hold a collection of objects of one type in a linear sequence. This is similar to an array. The STL provides three sequence containers.

  • vector - The vector class provides random access retrievals of objects at any position in constant time. This is useful if your code needs to access elements in an unordered way. You need element 1, then element 5, then element 8752 and so on. The cost (cpu clocks or time needed) of getting element 8752 can be considered to be the same constant as the cost to retreive element 1, for instance. Vector is optimized for insertions and deletions at its end. These are also done in constant time. Insertions or deletions at positions other than the end require linear time.
  • deque - The deque (pronounces deck) class also provides random access retrievals of objects at any position in constant time. Insertions and deletions at both the front and end of a deque are optimized and occur in constant time. Insertions and deletions in the middle require linear time.
  • list - The list class does not support random access retrievals. Retrievals require linear time. This is because lists are implement with an underlying doubly linked-list structure. A linked-list is a series of nodes connected by previous and next pointers. The "previous" pointer gets you to the previous element; the next pointer to the next. Insertions and deletions at any point are efficient and occur in constant time. Insertion or deletion requires onlreassignmentnt of the next and previous pointers. No elements must be moved. Insertions and deletions in the middle of vectors and lists and at the beginning of vectors require other elements to be moved. This is why they require linear time.

In addition to the containers list above, ordinary arrays and the string class can be considered sequence containers. As will be shown later, they can also be used with the generic algorithms that provide much of the usefulness of the STL. In addition, most STL implementations also provide stack, queue and priority queue container classes. These classes are created using container adapters and one of the core STL sequence containers. We explore this in a later lesson.

Generic Algorithms

The generic algorithms provide functions that implement many common programming tasks in efficient manners. We'll explore these algorithms in detail later in the tutorial. For now, I'll just list them so that you can see the wide range of algorithms available. Many of the algorithms have overloaded function signatures. That is, they have versions that take different number and/or types of arguments. Here is a list of the algorithms. You can surmise their use from their names. The suffix "if" on some of the method names indicates that the method takes a function object as an argument. We'll see more on this shortly. The suffix "copy" indicates that the method copies its results into a destination container.

adjacent_find

binary_search

copy

copy_backward

count

count_if

equal

equal_range

fill

fill_n

find

find_end

find_first_of

find_if

for_each

generate

generate_n

includes

inplace_merge

iter_swap

lexicographical_compare

lower_bound

make_heap

max

max_element

merge

min

min_element

mismatch

next_permutation

nth_element

partial_sort

partition

pop_heap

prev_permutation

push_heap

random_shuffle

remove

remove_copy

remove_copy_if

replace

replace_copy

replace_copy_if

replace_if

reverse

reverse_copy

rotate

rotate_copy

search

search_n

set_difference

set_intersection

set_symmetric_difference

set_union

sort

sort_heap

stable_partition

stable_sort

swap

swap_ranges

transform

unique

unique_copy

upper_bound

The point to take away is that there are many algorithms available and that before writting code, check to see if an existing algorithm can suffice.

Function Objects

Many of the generic algorithms on the previous page perform some type of comparison or test as they execute. For instance, the count function returns the number of elements equal to some value. The find function returns an iterator containing the location of the first element equal to some value. If this were the limit to the flexibility of these algorithms they would still be useful, but the STL was designed to make the algorithms significantly more flexible and more useful. Many of the algorithms have versions that take a function or function object as an parameter. Function objects are classes that have the function operator, (), overloaded. These function or function objects usually take one or two parameters and return a boolean, or true/false value. A function that returns a boolean value is also called a predicate. Let's see how a predicate can be used to extend the functionality of the count algorithm.

#include
#include
#include
using namespace std;

bool less (const int i) {
return i < 25;
}

bool more (const int i) {
return i > 25;
}

int main( )
{

// Creates an empty vector
vector v1;

// Add numbers to vector
v1.push_back(12);
v1.push_back(25);
v1.push_back(25);
v1.push_back(35);
v1.push_back(45);

// Finds the number of elements equal to 25
int quantity = count(v1.begin(), v1.end(), 25);

cout << quantity << " elements equal to 25" << endl;

// Finds the number of elements less than 25
quantity = count_if(v1.begin(), v1.end(), less);

cout << quantity << " elements less than 25" << endl;

// Finds the number of elements greater than 25
quantity = count_if(v1.begin(), v1.end(), more);

cout << quantity << " elements greater than 25" << endl;

return 0;
}

OUTPUT:
2 elements equal to 25
1 elements less than 25
2 elements greater than 23

Notice how the behavior of count is modified when a predicate, such as the functions less or more, is provided as a parameter. Instead of a simple function as shown in this example, a class that overloads the function operator, (), could also be used. We will see the benefits of this in a later lesson. Also, please note that the STL provides built-in function objects via the functional include file. This will be covered in a later lesson

Adapters

The last piece of the STL to be introduced in this overview are adapters. Adapters modify the interface or the behavior of other STL components. Adapters can be used to modify iterators, containers or function objects. For example, an iterator can be modified into a reverse iterator that traverses a container in the opposite direction. A vector can be modified into a stack using a container adapter. The behavior and interface of function objects can be changed by function adapters. Let's redo the last example using two binary predicates that are part of the STL, less and greater. Less compares two values and returns true if the first is less than the second. Greater returns true is the second is greater than the first. On the last page we counted values in a vector less that 25 and greater than 25. We want to use these predicates with the "second" value hard coded to 25. To do this using the STL provided less and greater function objects we need to use an adapter, bind2nd, that binds the second value in these function objects to 25. For now, you only need to see how adapters can greatly extend the usefulness of other STL components. We'll handle all the details in later lessons.

#include
#include
#include
#include
using namespace std;

int main( )
{

// Creates an empty vector
vector v1;

// Add numbers to vector
v1.push_back(12);
v1.push_back(25);
v1.push_back(25);
v1.push_back(35);
v1.push_back(45);

// Finds the number of elements equal to 25
int quantity = count(v1.begin(), v1.end(), 25);

cout << quantity << " elements equal to 25" << endl;

// Finds the number of elements less than 25
quantity = count_if(v1.begin(), v1.end(), bind2nd(less(),25));

cout << quantity << " elements less than 25" << endl;

// Finds the number of elements greater than 25
quantity = count_if(v1.begin(), v1.end(), bind2nd(greater(),25));

cout << quantity << " elements greater than 25" << endl;

return 0;
}

OUTPUT:
2 elements equal to 25
1 elements less than 25
2 elements greater than 23

0 comments;Click here for request info on this topic:

Post a Comment