ivl

ivl step 2 - indexing

We have seen in ivl step 1 - arrays how to construct arrays and ranges of any data type, and use operators and functions to construct expressions. Continuing the main() function of the same source file array.cpp, we explore how expression evaluation is actually implemented, facing elementary [random access] array indexing [subscripting]. We then continue with more powerful indexing schemes, including iterators, range arrays, mask arrays, indirect arrays, and member arrays, many of which are similar or equivalent to schemes of std::valarray. Combinations are possible and even more powerful. A number of array manipulation methods are also discussed in connection to indexed arrays.

Bur first, let us begin a new source file as usual, say index.cpp.

expressions [continued]

We have seen in the previous step how to compute the density of a normal distribution by

with input a and parameters mu, sigma respectively given by

We have also seen the result,

and how inefficient a naive evaluation might be. So let us consider an alternative:

Two comments are in order. First, constructor N(a.size()) allocates space for as many elements as a has, but does not initialize them: N(a.size(), 0.0) would initialize all elements to 0.0 [similarly for any constant of any data type that can be automatically converted to the data type of N], and N(a) would copy a to N. Second, N[i], a[i] refer to the i-th element of N, a respectively [more right below].

Now, the code in lines 3-8 is much more efficient than what we did with temporary arrays: it allocates space once, computes what is to be computed, and stores it directly where it is to be stored, without any unnecessary initialization or any other overhead.

The amazing fact is that the underlying implementation of the array expression in lines 1-2 is equivalent to the code of lines 3-8! This is due to lazy evaluation: no actual evaluation takes place until the assignment operator = is encountered! The code transformation is type-safe and is carried out at compile time through template metaprogramming; no macros or any kind of pre-processing are used.

In fact, writing composite expressions instead of storing intermediate results is the preferred way to program in ivl for increased performance, while avoiding cumbersome for loops is exactly one of the main reasons to use ivl.

indexing single elements

In fact, no further example is needed apart from what we have already seen. Referring to a single array element is the most common operation in any kind of array and ivl array has the same syntax as a C array and std::vector: if a is an array<T> of length L, then a[i], for i=0,...,L-1, refers to the i-th element of a, and is of type T& [reference] permitting read or write operations, or const T& [const reference] [read-only].

i is called an index, and is of type size_t, a machine-dependent, unsigned integer; but any integer will do. Much like a C array or std::vector, ivl::array uses zero-based indexing: the first element of the array is indexed by 0.

In our example above [lines 3-8], a[i] is read-only and is used to read the i-th element of a and perform the required computation, while N[i] is read-write and is used to store the result in the i-th element of N. The difference is that we have declared a as const here [this is what we would do if we were to encapsulate $\mathcal{N}$ in a function]. This applies to an arbitrary user-defined type T as well, as long as assignment operator = is defined for T.

Bounds checking refers to enforcing i within range 0,...,L-1 and takes place in ivl only in debug mode, throwing an ivl exception; in release mode, accessing a[i] with i outside range 0,...,L-1 is like accessing an illegal pointer address. The same principle applies to all other array indexing schemes.

iterators [bad]

Let us define an array of three int elements:

ivl fully supports STL-compatible iterators. This means you would be entitled to write

to print out the elements of b squared, without using streaming operator >>

But this is not the philosophy of ivl, since it offers more convenient ways to achieve the same result. [Can you do it in 25 characters?]

iterators [good]

On the other hand, iterators open doors to dozens of existing algorithms, many of which are standard. Although ivl offers alternatives, let us sort the elements of b in ascending order by STL sort:

Ok, this was easy, but how about generating the 6 = 3! possible permutations of the 3 elements of b using STL next_permutation?

range arrays

Consider a long list of numbers, given in a C array:

It is straightforward to represent the list in an ivl array,

implying a copy of the actual data:

Now, what are the elements of c from position 3 to 11, with step 2?

[recall we are using zero-based indexing!]

Nice, now multiply the first 8 elements of c by 2:

indexed / underlying arrays

A range array is what results by applying operator [] to an array, where the index is a range rather than a scalar. The former is called the indexed array and the latter the underlying array. The indexed array is a view the underlying array: it behaves like an array itself, being able to read from the underlying array [if the latter is const] or write as well [respectively if non-const].

We may think of a range array as a range of references to elements of the underlying array. The references are const or non-const, depending on the underlying array.

By "behaving like an array", we mean in particular that indexing operations may be applied in sequence, for instance taking the last two elements of c[3,_,2,_,11] is equivalent to writing c[9,_,11] directly:

[to make this more useful in practice, we shall see that arrays acting as input or output arguments of functions can be easily replaced by range arrays or any type of indexed array as well]

These ideas extend to all types of indexed arrays below, where the index is respectively an array<bool>, an array<size_t>, or a pointer to a structure/class member.

mask arrays

Which elements of c are negative?

No, we have seen that already. I do not want a sequence of bits. I want the elements.

[we have disambiguated ivl::find() over std::find()]

Nice try. But these are the positions of the negative elements. I want the values!

Well done! Now, switching to user defined types, tell me which of the candidates on my list are better than the new candidate, Charlie.

It goes without saying, a mask array arises when the index to the underlying array is an array<bool>. The positions of true values are the positions of the indexed elements in the underlying array.

indirect arrays

This is the most flexible and maybe the most powerful indexing scheme. In fact, our attempt to use find() above was not in vain. Let us recall the positions of negative elements in c:

Recall that the index type in ivl::array is size_t. size_array is a special kind of array, a shortcut for array<size_t>, that is designed to hold positions to be used as indices to another array [of any type]. Let us try it:

Now we know, an indirect array arises when the index to the underlying array is a size_array. But does that mean I am free to choose explicitly the elements I want, even if the elements are not uniformly distributed in the array, or do not satisfy a logical condition?

idx() allows specifying an explicit list of indices much like arr(), but always yields a size_array. It is equivalent to arr<size_t>().

Great! And now that the index is unrestricted, do I still get a subsequence of the underlying array, or can elements repeat in arbitrary order?

[we shall see in multi-dimensional arrays that we can get not only the right number of elements in the right order, but also in the right dimensions]

So what happens now if we modify the indirect array? Does the modification repeat as well?

This side-effect may seem strange at first, but can be very useful e.g. in counting or computing frequencies.

assignment

In all indexing schemes so far, we have either read pieces of an array, or modified them by a compound assignment operator, e.g. adding or multiplying by a value. What else is there? Can we assign new values? Well, we can assign a single value,

or another array of appropriate size,

or an indexed array of the same or a different array, again of appropriate size,

[with every such expression being safe in terms of overlapping elements in read and write operations, although computation is in-place, in general]

or as well an empty matrix denoted by _ , effectively deleting the indexed elements:

We have chosen a range array in the previous examples, but the same applies to all indexing schemes presented so far.

all elements

What if we need to assign a single value to all elements? There is a special syntax for this:

where now _ means "all elements" [forgotten b?]. On the other hand, assigning 100 to the entire array b results in an array of one element:

If you begin confusing the different uses of _ , you need to practice because more are coming!

resize

Related to element deletion is method resize(), similar in many aspects to array construction. For instance, we can allocate more space, optionally giving a default value for the new elements,

or truncate the array keeping only the first elements:

Keep in mind that all operations that modify the size of an array involve re-allocation and hence are far less efficient than in-place operations. If your problem requires excessive manipulation and sizes are not known in advance, then a container using contiguous memory is probably not appropriate; you may consider sacrificing random access and using a different container like std::list. Or, if you can make a safe prediction, you can pre-allocate, either with a constructor [line 3] or with resize().

member arrays

Let Charlie join the company of our candidates:

A new company policy now says that candidates should be ranked according to a weighted score, where the university grade and references count 70% and 30% respectively. What are the scores of our fellows, on a scale of 0 to 100? Easy:

&cand::grade is a pointer to member [of type int] grade of structure cand and, used as in index to the underlying array [array<cand>] list, yields an array<int> where each element holds the grade of the corresponding [cand] element of the underlying array, list. That is, an array of all grades in the same order as candidates. Similarly, list[&cand::ref] is an array of all references.

[Pointers to members are a rarely used and largely unknown feature of C++, but come in very handy in this case, without the complete, weird syntax required to declare a pointer to member variable. Invoking member functions [methods] also has a very convenient syntax for an ivl array of a user-defined structure or class.]

Thinking of array elements in one dimension and candidate members in another, we can see the array of candidates as a table,

namegraderefscore
bob 71079
carol 87 77
ted 59 62
alice 83 65
charlie72 55

where score acts like a virtual column computed from the real ones. While all table rows are of the same type, columns may be of different types. Previous indexing schemes refer to row selection, while member indexing to column selection.

Great, so who is the winner now?

Member arrays differ from the previous indexing schemes in some respects. There are no indexed elements ("rows") for instance, but rather members ("columns"). And because structures and classes have fixed size, no operations like deleting and resizing apply. But we can still assign a scalar quantity or an array of appropriate size, and we can apply different types of indexing in sequence.

enough!

This completes source file index.cpp:

exercises

  1. We have not discussed ivl array construction in general, but we have seen five different constructors already. Can you identify them?
  2. Re-write the code of lines 9-12 in 25 characters.
  3. Print out the grades of candidates that are better than Charlie [according to our original definition], in 47 characters.