Sunday, February 7, 2010

Assigning Variables


It only just occurred to me that you might end up wanting to store data eventually. This should help you get through Addition in Starter Problems 1 on ORAC.

In order to manipulate data, you first have to store it somewhere. We do this by assigning the data to what's known as a variable (which is essentially a block of memory). Before we can utilise these variables we must tell the computer to allocate space. This is called declaring variables.

Declaring A Variable

Everything on a computer is stored in terms of bits and bytes. All these bytes are generally stored inside your RAM, although using things like fprintf() you can also write to files on your hard drive.
There are trade-offs with both ideas - for instance, you only have a limited amount of RAM; computers range from 128MB to roughly 2GB of RAM, and 4GB is the most that any 32-bit system can address.
On the other hand, most computers have virtually infinite hard drive space (1TB hard drives aren't worth that much anymore...), but reading from and writing to files takes far more time than reading or writing to RAM.
As such, when it comes to writing C or C++ programs, you need to create as much of a balance as is possible. Depending on what you're doing, it's fine to use 50MB or 100MB of RAM, but you shouldn't be choking down Gigabytes of the shit. It's just not healthy!

Keeping this in mind, we move to thinking about variables. The cool thing about variables is that they generally only take up a few bytes each, but before you know it this adds up.

As a bit of background knowledge, there are 8 bits to a byte. Each bit in a data type stands for a power of two, and the power goes up by one for every bit. To illustrate this, let's take a "char". A "char" is 1 byte; thus 8 bits.
Initially, it looks like this in memory:

00000000

If we want to store the letter "a" in this char (which corresponds to the number 97 in ASCII), our char would look like this:

01100001

Converting this, we walk from right-to-left, and each time we move one place to the left we increase our power of two. Thus, the right-most digit represents whether our number includes 2^0, the one to the left of this is 2^1, the one to the left again is 2^2, etc.
Thus, our number is composed of 1*(2^0) + 1*(2^5) + 1*(2^6) = 1 + 32 + 64 = 97.
(i.e. any power of two with a 0 in front of it implies 0 * that number, which is just 0).
Usually, unless it's an unsigned data type (explained below), the first bit refers to the negative sign. Thus,
11100001
signifies -31, and NOT 97 + 2^7 = 225 (for a signed type, it's -1*(2^7 - the value without the sign bit), i.e. -(128 - 97) = -31).

When it comes to C and C++, we declare our variables simply by typing a valid data type in front of them (the next section will cover these), and our variable name.
For instance, to declare a character, we would type this:

char myCharacter;

This declares a variable that takes up 1 byte in memory, and we have decided to refer to this byte of memory as "myCharacter".
This is the point of variable declaration; for us humans to be able to refer to the location of memory we want. Or something like that, anyway.

Data Types

Disclaimer: These are somewhat architecture-dependent. For most cases across any 32-bit system, these values are correct, but on a 64-bit system some of them (such as "long") may well be different. If in doubt, open up a new .cpp file and get it to do this: 'printf("%zu\n", sizeof(int));'

a "char": 1 byte.
a "bool": 1 byte, and can only take the value "true" or "false".
a "short" (or "short int", either works): 2 bytes.
an "int": 4 bytes.
a "long" (or "long int"): 4 bytes. There is usually no difference between "int" and "long", unless you're on a 64-bit system, where "long" is often 8 bytes.
a "float": 4 bytes. These hold floating point numbers - i.e. numbers that require a decimal point. Unlike an integer, a 4-byte floating point number can hold values up to roughly +/- 3.4 * 10^38, although only the first 7 or so digits are accurate.
a "long long" (or "long long int"): 8 bytes.
a "double": 8 bytes. These hold floating point numbers up to roughly +/- 1.7 * 10^308.
a "long double": at least as big as a "double". This is 8 bytes with some 32-bit compilers and 12 or 16 bytes with others, and it will probably be different again on a 64-bit system.

Also, all the whole-number types above ("char", "short", "int", "long" and "long long") can be prefixed with "unsigned".
By default, these types are "signed" (plain "char" is the one exception; whether it is signed depends on your compiler). This means the left-most bit determines whether the number you are storing is negative or positive. If you make a type "unsigned", this bit now represents part of the number instead of its sign (hence signed/unsigned).

A Brief example

#include <cstdio>
using namespace std;

int main(){
    int y;
    char BenIsNotCool;
    // sizeof gives a size in bytes, which printf prints with %zu
    printf("%zu\n", sizeof(BenIsNotCool) + sizeof(y));
    BenIsNotCool = 'A';
    y = 1337;
    printf("%zu\n", sizeof(BenIsNotCool) + sizeof(y));
    return 0;
}


The output of this example would be this:
5
5
This shows that whether or not a variable actually stores values, it will always take up the size it's allocated (although technically, these variables actually do store values; they're just garbage).

Arrays

So, it's all good and well to declare one or two variables to do one or two things. But what happens if you need ninety variables, or over nine thousand variables, and they're all the same type (or at least, multiple of them are the same type)?
This is where arrays come in handy.
Unlike one variable, which takes up, say 4 bytes, an array takes up a contiguous space in memory. If you declare an array for 10 variables, this takes up 40 bytes. If you declare an array for 100 variables, that's 400 bytes, etc. etc.

To declare an array, we declare it like so:
type myArray[size];
Say we want to declare 100 integers, in an array called JamesSucksDick. We would do this like so:
int JamesSucksDick[100];
This gives us 100 integers to play with, all allocated in a contiguous block.
However, we can't just fill them in with JamesSucksDick = 10, 11, 13, 11202029, 9001;.
That just wouldn't work.
So how?

Pointers

So, the other day we discussed pointers. Briefly.

Essentially, a pointer does exactly what you'd expect: it points. But unlike a regular variable, which only talks about one specific block of memory, a pointer can point to any block of memory of that same type. Thus, an int pointer can point to any int in memory (careful, this will come up later), but it can NOT point to a character.

So, onwards to pointers.
To declare a pointer, it's as simple as this:
type *pointerName;

This will declare a pointer, called pointerName. If we want to declare a pointer to an integer, it's as easy as:
int *myPointer;

The problem with this is that we can't really do 'myPointer = 4', like we would for a normal variable. You could try this, but a C++ compiler will refuse it, and if you force it through with a cast, using the pointer will most likely crash.
See, the thing with pointers is that the = sign talks about the address they reference. For instance, 'myPointer = 4' tries to make the pointer refer to memory address 4. And that's just going to fail.
So we need something else.
Pointers have two extra things they can use. You also have access to * and &.
When talking about pointers, * is what's called the dereference operator. In other words, * is the value of what the pointer is referring to. Note that this is different from the * used to multiply two numbers together (for instance, 4*4 or a*b) and is different from the * used to declare the pointer.
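int *myPointer;
*myPointer = 4;

This changes the value of what myPointer is pointing to, and sets it to 4. Be careful, though: as written, myPointer hasn't been told to point at anything yet, so this snippet is not safe to run on its own.
& is the address-of operator (sometimes called the reference operator). Putting & in front of a variable gives you the address of that variable, which is exactly what a pointer wants to store.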
int some_random_value, *myPointer;
myPointer = &some_random_value;
*myPointer = 4;
printf("%d\n", some_random_value);

Of course, this is probably getting a little confusing, and you're probably wondering what pointers have a use for at all.
When it comes to real-world stuff, they're mighty useful. They allow you to dynamically allocate memory (more on this later), clean that memory up again when you're done with it, and keep track of forty nine million variables with just the one pointer.
However, we started this discussion whilst talking about arrays. So let's go back to arrays!

Arrays

So just to recap, we declare an array like so:

type array[size];

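The cool thing about arrays is that they behave almost exactly like pointers! Whenever you use the name of an array, it gets treated as a pointer to the array's first element.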
"array" refers to the first block out of our size, "(array+1)" refers to the second block out of our size, "(array+2)" refers to the third block, etc. etc.
So, to write to any of these values, we can use the dereference operator (heh, remember that? :))

*array = 1;
*(array+1) = 3;
*(array+2) = 3;
*(array+3) = 7;
printf("%d%d%d%d\n", *(array), *(array+1), *(array+2), *(array+3));


But see, programmers are lazy. Why type "*(array+n)" every time you want to modify an element of an array? Instead, they use the shorthand notation array[n]. This means exactly the same thing as *(array+n).
It is for this reason that we count arrays from 0. i.e., to access the first element (which theoretically lives at *(array+0)), we use array[0] = something.

So, using this newfound knowledge:

array[0] = 9;
array[1] = 0;
array[2] = 0;
array[3] = 1;

Note that the one caveat here is that you must make sure that the element you access is always less than the size that you declare. If you declare something like so:
int array[10];
you declare an array with 10 elements. This means that you can access array[0], array[1], ..., array[9] (count from 0 using your fingers; if you count from 0 to 9, you'll get 10 numbers), but accessing array[10] is like accessing a variable that doesn't exist, with one minor difference.
If you access "someVariable", and you haven't declared this variable, your compiler will yell at you and your code won't compile. If you access array[10] (i.e. the 11th element) and you've only declared an array with 10 elements, it will compile just fine, but when it comes to runtime you could be doing some seriously dangerous stuff, like overwriting memory that belongs to something else.

Scopes

Every variable has a certain area in which it exists. These areas are referred to as 'scopes'. In total, there are five different types of scopes. However, only two of these require your immediate attention.

Global and Local



int x = 1;

int main(){
int y = 2;
}


If you look above, you'll notice that I've declared x outside of main() and y within it. 'x' is what is known as a 'global' variable. It means that it is valid throughout the entire file and is declared before the execution of main(). 'y', however, is what is known as a 'local' variable and only exists after the program has reached a point where it has been declared. Locals are also restricted to the block of code (including the blocks enclosed within said block) in which they are declared.

Note: A block of code consists of code that lies between two curly brackets ( { } ). For instance:


int main(){
    int x;
    {
        int y;
        y = 8;
    }
    x = 4;
    printf("%d %d\n", x, y);
    return 0;
}
will fail at compiling, as "y" doesn't exist after the }


There is one difference that scope makes.
For a moment, let's pretend that you have two separate parts of RAM. You have "automatic storage" (which is also commonly called "the stack") and you have "free storage" (often called "the heap"). These fulfill two different jobs. Automatic storage is designed to briefly hold variables and references - for instance, somewhere inside the Automatic storage is a reference to the "main()" function.
Free storage holds the rest of things. Large variables, dynamically allocated memory (which we'll cover at a later date) and most RAM-intensive operations should be performed here.
In terms of comparisons, free storage is slightly slower to create than automatic storage. It's just as fast to access; only the creation is slower. This has to do with the way memory gets handed out, and so we're not going to explain it here, however note that the difference is a tiny fraction of a second. The catch is that you typically only get a few megabytes of stack memory to play with (we'll assume roughly 4MB here; the exact limit depends on your system), as opposed to the heap, which is roughly RAM size minus that.
Of course, these aren't actually two separate parts of RAM; they're one and the same, but relate to the way your program uses the available memory. Thus, be aware of them.

Now, how does this affect you?
Well, any variable declared inside a function is placed on the stack (i.e. automatically stored). This includes all variables and arrays you declare there. As such, you should be very careful with this, because you're limiting yourself to roughly 4MB of data (unless you dynamically allocate, but more on this at a later date).

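Anything declared globally is kept off the stack. "Globally" refers to any variable declared outside all functions. (Strictly speaking, globals get their own fixed region called static storage rather than living on the heap proper, but for our purposes the important part is that they don't count against the stack limit.)


int myArray[1000]; // global, so it is not on the stack
int x; // global as well

int main(){ // everything declared inside main() goes on the stack
    int someInt; // declared on the stack
    char penis; // more stackwork.
    return 0;
}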


Anyway, 'til next time!
James and Ben (mostly Ben).

Compliments to my friend Georgie Brooke for the photo.
