Optimizing memory part 2, let’s play Limbo

1-25-2018
Where we last left off, I had trimmed off unwanted functions and was going to start experimenting with moving localized variables to globals. I was going to do the by creating a header file and push all of the global variables to it to keep better track of the globals visually.

My last code adjustment before moving these variables on compiling resulted in a memory footprint of 12728 bytes free.

However, before we get around to moving our variables around, I discovered that in the compiling, we can have Z88dk make a memory usage map.
Open up the fase.bat file and find the compiling string

zcc +zx -zorg=32772 -O3 -vn main.c -o build\main.bin -lndos

and replace it with

zcc +zx -zorg=32772 -O3 -vn -m main.c -o build\main.bin -lndos

Notice the -m in the string, that will create a memory map. Now, run a compile. Finally, look in your /asm directory to find a new file main.map.

Open the file and we are greeted with a huge map file that looks something like this.

ASMDISP_FPUTC_CALLEE            = 0005, G: tmpXXUobq4h
ASMHEAD                         = 8004, G: 
ASMSIZE                         = 2683, G: 
ASMTAIL                         = A687, G: 
DEFINED_ANSIstdio               = 0001, L: zx82_crt0
DEFINED_ministdio               = 0001, L: zx82_crt0
DEFINED_myzorg                  = 0001, L: zx82_crt0

Which goes on for 900 plus lines. It shows us a lot of info, almost too much at one time to comprehend. The next thing I notice is that everything is in HEX, arrgh! This is going to take forever to comprehend.

So, you may have wondered, if I have slowed down over the last 2 weeks or so, this map list is one of the reasons why.

I whipped up a quick C# program to help me with this list. The program is not yet complete but works enough for now. I created a utility to take this map and reformat the data within and convert the hex to decimal automatically.

If you are interested in this utility, you will have to send me a message through my contacts feature in WordPress.

My program looks like this

Here, we can see that the ASMHEAD, which is where your program starts, is at address 32772, the size of the program is ASMSIZE, 9859 bytes. Add those two together and we get the ASMTAIL, the end of my program is 42631. The rest of the numbers show the start of each of the routines, for example, the Joystick routine (_Joystick) is found at address 33005. All of this is important information to have. It is a bit disorganized, but it’s all there. More importantly, I can move variables to a header file and start observing the size of the program and how Z88dk is storing them in the “Stack”.

What is the “Stack” you may ask?

Excellent question, here’s a nice Wikipedia article on the subject

https://en.wikipedia.org/wiki/Call_stack

Basically, it’s a storage area for returning the addresses of your functions back to the main routine. It also seems to store all of your local variables. So the more subroutines you have and the more local variables you have, the larger your stack will grow. Having a stack is extremely useful, but as your program and routines and variables grow, so will your stack.

The other thing to worry about is the program Heap.

Here’s a nice little Wikipedia article on the Heap

https://en.wikipedia.org/wiki/Memory_management#HEAP

The heap is very similar to the stack as it is a storage area for global variables and for dynamic memory allocation.

Both the Heap and the stack are important for your program to run, however as you continue to program, both of these items can grow.

The stack and heap in Z88dk grows from the top of the memory downwards, while your program grows upwards. At some point, the two may overlap. When that happens, your program will either not run or will crash.

This is what I think is happening with my program.

If I use my little program to see what is happening, I can see a large number of items that look similar to this

i_1 9C33, L: s3v0_

That’s 39987 in decimal.

The main.map is filled with variables like this. I need to investigate on these and all of the other variables.

The best way to do this is by visiting each routine and find out what the variable needs are, create a global for each of the variables and then use those variables common between each of the functions.

Remember where we started off, we had a memory footprint of 12728 bytes free.
Let’s play Limbo and see how low we can go.

One of the first things I’m going to call into question is the verticalEdge () function. I’m not rewriting the function right now, but there seems to be a few waste areas inside it that I call into question.

The function is one that I moved early on, it was in the main loop and since it was so large, I decided to move it to a function to make maintenance easier on the routine.

It looks like this

void verticalEdge (short vertical[])
{
	short vY1 = vertical[0];//vY1 goes from -660 to 660
	short aY1 = vertical[1];//aY1 goes from -60 to 60
	short YY1 = vertical[2];//YY1 goes from -30324 to 723959

	vY1 += aY1;
	YY1 += vY1;
	
	if( vY1 + 8 >> 3 )
	{
		aY1 = -vY1 >> 3;
	}
	else
	{
		aY1 = vY1 = 0;
	}

	if( (unsigned int)YY1 > scrh < 0 )
		{
			if( mapy < maph - 1 )
			{
				YY1 = scrh << 12;
				vY1= 0;
				zx_border(3);
			}
		}

		//top
		if( vY1 < 0 )
		{
			if( mapy  scrh)
		{
			YY1 = scrh << 4;
			zx_border(6);
		}
	}
	
	//returns
	vertical[0] = vY1;
	vertical[1] = aY1;
	vertical[2] = YY1;
}

It takes an array, short int of 3 slots, sends it to the short int variables vY1,aY1 and YY1. All 3 of these are called in the function itself. There is a potential memory savings that can happen here.

Once I move the 3 variables to the Global functions header, the game size changed from 12728 bytes free to 12782 bytes free, a gain of 54 bytes. Nice, a smaller footprint.

So, let’s try another experiment, we know that the variables vy, ay and y are already global variables, let’s try replacing the filling of vY1,aY1 and YY1 from the array vertical[].

vY1 = vy;//vY1 goes from -660 to 660	
aY1 = ay;//aY1 goes from -60 to 60
YY1 = y;//YY1 goes from -30324 to 723959

And it works. Interesting.

Now comes our third question about the function, are variables vY1,aY1 and YY1 really needed at all, can we not just use vy, ay and y directly in the function and eliminate the vY1,aY1 and YY1?

Apparent, we can eliminate those variables!

Here’s our function now.

void verticalEdge (short vertical[])
{
        //mapx is char
        //mapy is char
        vy += ay;
        y += vy;
	
        if( vy + 8 >> 3 )
        {
            ay = -vy >> 3;
        }
        else
        {
            ay = vy = 0;
        }

        if( (unsigned int)y > scrh < 0 )
            {
                if( mapy < maph - 1 )
                {
                    y = scrh << 12;
                    vy= 0;
                    zx_border(3);
                }
            }

            //top
            if( vy < 0 )
            {
                if( mapy  scrh)
            {
            y = scrh << 4;
            zx_border(6);
        }
    }	
    //returns
    vertical[0] = vy;
    vertical[1] = ay;
    vertical[2] = y;
}

Not only does it work, but runs slightly faster. Fewer variable swaps which means less memory usage. When we compile, we get a memory report of 12800 bytes free, down from 12782.

Now, let’s concentrate on calling the actual function. Our code looks like this.

//vertical edge
vertical[0] = vy;
vertical[1] = ay;
vertical[2] = y;
verticalEdge(vertical);
vy = vertical[0];
ay = vertical[1];
y = vertical[2];

It’s unwieldy, but works. However, we determined earlier, that we don’t need to push vy, ay and y to an array, so we can eliminate that.

//vertical edge
//vertical[0] = vy;
//vertical[1] = ay;
//vertical[2] = y;
verticalEdge(vertical);
vy = vertical[0];
ay = vertical[1];
y = vertical[2];

Test it, works OK.

If we don’t need to push the array, we probably don’t need to pull it from the array.

//vertical edge
//vertical[0] = vy;
//vertical[1] = ay;
//vertical[2] = y;
verticalEdge(vertical);
//vy = vertical[0];
//ay = vertical[1];
//y = vertical[2];

Test it, works OK.

Then that will mean in the actual function itself, we don’t need to send it to the array at all, so we will modify our function at the end.

//returns
//vertical[0] = vy;
//vertical[1] = ay;
//vertical[2] = y;

Give it a quick test. That is indeed the case.

If we don’t need to push or pull data from that array, we can modify our calling function.

verticalEdge();

and modify the top of the function itself.

void verticalEdge ()

That now means we don’t need that array at all!

Let’s eliminate it from our Global variables

//short vertical [3] = {0,0,0};

and do a test compile, HOLY COW, it works.

Now, our memory footprint is 12919 bytes free.

If you recall at the start, we had 12728 bytes free, we just saved 191 bytes and just increased our speed. As a side effect, our code is a little cleaner and smaller.

Our calling function is now.

verticalEdge();

and the actual function is.

//Check vertical border / edges
void verticalEdge ()
{
    //mapx is char
    //mapy is char

    vy += ay;
    y += vy;
    
    if( vy + 8 >> 3 )
    {
        ay = -vy >> 3;
    }
    else
    {
        ay = vy = 0;
    }

    if( (unsigned int)y > scrh < 0 )
        {
            if( mapy < maph - 1 )
            {
                y = scrh << 12;
                vy= 0;
                zx_border(3);
            }
        }

        //top
        if( vy < 0 )
        {
            if( mapy  scrh)
        {
            y = scrh << 4;
            zx_border(6);
        }
    }
}

Since that works, let’s do the exact same for our horizontalEdge function.

And here’s the calling function now.

horizontalEdge();

And our new routine.

void horizontalEdge ()
{
    //mapx is char
    //mapy is char

    vx += ax;
    x += vx;
    if( vx + 8 >> 3 )
    {
        ax = -vx >> 3;
    }
    else
    {
        ax = vx = 0;
    }

    if( (unsigned int)x > scrw < 0 )
        {
            if( mapx < mapw - 2 )
            {
                x = scrw << 12;
            }
        }
        //screen left
        if( vx < 0 )
        {
            if( mapx  scrw)
        {
            x = scrw << 12;;
            zx_border(6);
        }
    }
}

Now our code has 13097 bytes free, we were running originally at 12728 bytes, so we just gained 369 bytes.

Please make note, the unsigned int, is just a recasting of the short int x, nothing to worry about on that, just sending to a larger variable requires recasting.

Our next routine to examine will be the readScreenTiles function.

void readScreenTiles (unsigned char x1[], unsigned char y1[], unsigned char tileAttribute[])
{
    unsigned char screenX;
    unsigned char screenY;
    unsigned char count;//count how many items are in the array    
    unsigned char tile;
    unsigned char attribute;    
    
    count = 0;
    tile = 0;
    attribute = 0;

    //clear out array from prior level
    //have no more than 89 objects
    //make everything 99, this is a trigger to stop
    //scanning the tiles array to speed up
    //collision detection
    for (tile = 0; tile < 40; tile++)
    {
        x1[tile] = 99;
        y1[tile] = 99;
        tileAttribute[tile] = 99;
    }

    count = 0;
    tile = 0;
    attribute = 0;
    
    
    //screenX 0 to 15
    //screenY 0 to 9
    for ( screenY = 0; screenY < 10; screenY++)
    {
        for (screenX = 0; screenX < 16; screenX++)
        {
            attribute = tiles[screenY * scrw + screenX];

            //if (checking the tile  0)
            //tile 0 is a blank tile to travel
            if ((attribute > 0) && (attribute < 99))
            {
                //mark the x & y position and place it 
                //in the array
                //this is where the problem lies
                //writing to array x1
                // check to make sure less than 99 as well
                
                x1[count] = screenX;
                y1[count] = screenY;                
                tileAttribute[count] = attribute;
                
                count++;//increment the counter for the next slot in array
            }
        }
    }    
}

There are 5 unsigned char variables here that can be moved to global’s that might help.

So, I move and compile and discovered an error, I am actually declaring the variables count and tile twice.

Both should be unsigned char variables, so correct the second casting and recompile the program again. Now, I have 13159 bytes free.

Our routine now looks like.

void readScreenTiles (unsigned char x1[], unsigned char y1[], unsigned char tileAttribute[])
{
    count = 0;
    tile = 0;
    attribute = 0;

    //clear out array from prior level
    //have no more than 89 objects
    //make everything 99, this is a trigger to stop
    //scanning the tiles array to speed up
    //collision detection
    for (tile = 0; tile < 40; tile++)
    {
        x1[tile] = 99;
        y1[tile] = 99;
        tileAttribute[tile] = 99;
    }

    count = 0;
    tile = 0;
    attribute = 0;
    
    
    //screenX 0 to 15
    //screenY 0 to 9
    for ( screenY = 0; screenY < 10; screenY++)
    {
        for (screenX = 0; screenX < 16; screenX++)
        {
            attribute = tiles[screenY * scrw + screenX];

            //if (checking the tile  0)
            //tile 0 is a blank tile to travel
            if ((attribute > 0) && (attribute < 99))
            {
                //mark the x & y position and place it 
                //in the array
                //this is where the problem lies
                //writing to array x1
                // check to make sure less than 99 as well
                
                x1[count] = screenX;
                y1[count] = screenY;                
                tileAttribute[count] = attribute;
                
                count++;//increment the counter for the next slot in array
            }
        }
    }    
}

Optimizing Lesson 3, moving your local variables to a global variable may save you memory.

Not trying to make this a code dump, I’ve got another revelation coming up.

Looking at the square collision detection, we’ve got.

//square detection
void obsticleCollision3 (unsigned char playerX, unsigned char playerY, unsigned char level, unsigned char x1[], unsigned char y1[])
{
    //playerX & playerY localized only
    unsigned char xx1,yy1;
    unsigned char obsticleX, obsticleY;
    unsigned char attribute;
    unsigned char differenceX,differenceY;
    unsigned char iterator;    
    unsigned char objectRadius;    

    objectRadius = 15;
    
    for (iterator = 0; iterator < 40; iterator ++)
    {
        obsticleX = x1[iterator];
        obsticleY = y1[iterator];
        attribute = tileAttribute[iterator];

        //center point of the tiles
        xx1 = (obsticleX * 16) + 8;
        yy1 = (obsticleY * 16) + 8;
        differenceX = abs(playerX - xx1);
        differenceY = abs(playerY - yy1);

        if ((differenceX < objectRadius) && (differenceY < objectRadius))
        {
            zx_border(6);
            break;
        }

        if (attribute == 99) break;// break out of the loop earlier if we encounter 99
    }
}

We’ve got 6 variables that could be moved directly to globals, however, let’s take another look first. We’ve got 6 temporary variables that are of type unsigned char, that are only going to be used internal within that function, in the prior function, we were using 5 variables of unsigned char that were just temporary being used. Then, why couldn’t we just reuse the temporary variables in this and the prior function? Excellent question, glad you asked.

Optimizing Lesson 4, sharing temporary variables will reduce your memory footprint, but make sure there is a sensible name to the variable to promote usage.

If the variable is reasonably named and followable in code, stylistically, there is no reason why you couldn’t share the variable.

We already have an unsigned char attribute from the last routine, so we can get rid of that declaration.

We’ll replace obsticeX and obsticleY with screenX and screenY from the last routine, so we can get rid of those declarations.

I’m going to have a tempx and tempy variable declared in globals, so I’ll replace xx1 and yy1 with those and reuse them as much as possible.

We already have an iterator variable in the globals, so we can get rid of the new declaration.

That just leaves unsigned char differenceX,differenceY and objectRadius, which can be moved to the globals.

Now let’s see what’s working or not.

Sure is, we just make sure to comment the variables.

//square detection
void obsticleCollision3 (unsigned char playerX, unsigned char playerY, unsigned char level, unsigned char x1[], unsigned char y1[])
{
    objectRadius = 15;
    
    for (iterator = 0; iterator < 40; iterator ++)
    {
        screenX = x1[iterator];//screenX stepping through obsticle array
        screenY = y1[iterator];//screenY stepping through obsticle array
        attribute = tileAttribute[iterator];

        //center point of the tiles
        tmpx = (screenX * 16) + 8;
        tmpy = (screenY * 16) + 8;
        differenceX = abs(playerX - tmpx);
        differenceY = abs(playerY - tmpy);

        if ((differenceX < objectRadius) && (differenceY < objectRadius))
        {
            zx_border(6);
            break;
        }

        if (attribute == 99) break;// break out of the loop earlier if we encounter 99
    }
}

Now we have 13212 bytes free.

Some of the beauty of a compiled language is all of our comments come memory free, place as many comments in the code that you want. When you compile, there is not a single byte used up for your comments! Make the code and variables as readable as you want.

Repeat the same process for circleCollisionDetection, we have to move short squaredRadius, squaredDifferenceX, squaredDifferenceY to the globals, otherwise, the function is nearly identical.

Now we have 13288 bytes free.

That’s our new lower memory footprint and that’s where we will stay for now. We just gained 560 bytes, which I am certain that we use up again, but now we should be able to add that one extra routine I have been wanting to add.

Advertisements

Author: andydansby

I'm a hobbyist coder working with the ZX Spectrum. Living in New York state near the Syracuse area. I grew up in Virgina. The first computer my parents bought for me was a Timex Sinclair 2068.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s