juha3141.info | The "Microkernel" Operating System

Today I optimized the memset() and memcpy() functions, because screen refreshing and newline processing were taking too long.

The current memory functions work one byte at a time, which means every memory transaction moves a single byte. That's really slow. Instead, I am going to implement memset() and memcpy() so that they work in units of a word, the largest unit of memory transaction the architecture supports (so the size depends on the architecture).

The code is simple. First, you align the destination. If the number of bytes to be written is no larger than the size of a word, we just write them byte by byte.

void *memset(void *dest , int c , size_t n) {
    unsigned char *dest_ptr_char = (unsigned char *)dest;
    max_t aligned_dest = alignto((max_t)dest , WORD_SIZE);
    
    if(n <= WORD_SIZE) { // no optimization
        for(size_t i = 0; i < n; i++) { *dest_ptr_char++ = (unsigned char)c; }
        return dest;
    }

Since it's much faster to move memory that's aligned to the word size, we fill the few bytes of padding before the alignment boundary byte by byte.

    // fill out the front area
    max_t dest_remaining = aligned_dest-(max_t)dest;
    for(size_t i = 0; i < dest_remaining; i++) { 
        *dest_ptr_char++ = (unsigned char)c; 
    }
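The snippets in this entry use max_t, WORD_SIZE and alignto() without showing their definitions. A minimal sketch of what they might look like, assuming max_t is a pointer-sized unsigned integer, WORD_SIZE is its size in bytes, and alignto() rounds up to the next multiple of a power-of-two alignment (all three are assumptions, not the kernel's actual definitions):

```c
#include <stdint.h>

/* Assumed definitions; the journal entry does not show the real ones. */
typedef uintptr_t max_t;           /* assumption: unsigned, pointer-sized */
#define WORD_SIZE sizeof(max_t)    /* assumption: 8 on a 64-bit target */

/* Hypothetical alignto(): round `value` up to the next multiple of `align`.
   Only valid when `align` is a power of two. */
static inline max_t alignto(max_t value, max_t align) {
    return (value + align - 1) & ~(align - 1);
}
```

With this rounding-up behavior, an already aligned address is returned unchanged, so dest_remaining comes out as 0 and the front loop is skipped.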

After that, we calculate how many WORD-sized chunks we have to write, using division and modulo operations.

    max_t size = n-dest_remaining;
    size = size-(size%WORD_SIZE);
    max_t dest_remaining_rear = n-size-dest_remaining;

    max_t *dest_ptr = (max_t *)((max_t)dest+dest_remaining);
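To make the arithmetic concrete, here is a small standalone sketch that computes the same three-way split (front bytes, whole words, rear bytes). The names split_fill and struct fill_split are made up for illustration, and WORD_SIZE is hard-coded to 8 on the assumption of a 64-bit target:

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t max_t;   /* assumption: stand-in for the kernel's max_t */
#define WORD_SIZE 8        /* assumption: 64-bit architecture */

/* Hypothetical helper: how a fill of n bytes at address dest is divided. */
struct fill_split {
    size_t front;  /* bytes written one at a time before the aligned area */
    size_t words;  /* number of whole WORD_SIZE chunks */
    size_t rear;   /* bytes written one at a time after the aligned area */
};

static struct fill_split split_fill(max_t dest, size_t n) {
    struct fill_split s;
    max_t aligned = (dest + WORD_SIZE - 1) & ~(max_t)(WORD_SIZE - 1);

    s.front = (size_t)(aligned - dest);
    if (s.front > n) s.front = n;      /* tiny fills never reach alignment */

    size_t rest = n - s.front;
    s.words = rest / WORD_SIZE;        /* division: whole words */
    s.rear  = rest % WORD_SIZE;        /* modulo: leftover bytes */
    return s;
}
```

For dest = 0x1003 and n = 30, this gives 5 front bytes, 3 words (24 bytes) and 1 rear byte, which adds up to 30.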

Since the fill value is a single byte, we build a WORD-sized value that repeats that byte in every byte position:

    max_t fill_data = 0x00; 
    for(int i = 0; i < WORD_SIZE; i++) {
        fill_data |= (((max_t)c & 0xff) << i*8);
    }
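The same pattern can also be built without a loop. This is only a sketch of an alternative, assuming a 64-bit word; repeat_byte is a hypothetical name. Multiplying the byte by 0x0101010101010101 places a copy of it in each of the eight byte positions, and no carries occur because the byte is below 0x100:

```c
#include <stdint.h>

/* Hypothetical helper: replicate the low byte of c into all 8 bytes
   of a 64-bit word (assumes WORD_SIZE == 8). */
static inline uint64_t repeat_byte(int c) {
    return (uint64_t)(unsigned char)c * 0x0101010101010101ULL;
}
```

For example, a fill byte of 0xAB becomes 0xABABABABABABABAB, the same value the loop produces on a 64-bit target.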

And finally, we write the fill word across the aligned middle region, then fill the remaining bytes at the rear of the memory chunk byte by byte.

    for(size_t k = 0; k < size/WORD_SIZE; k++) { 
        dest_ptr[k] = fill_data;
    }

    // the rear area starts after the front padding and the word-filled middle
    dest_ptr_char = (unsigned char *)dest+dest_remaining+size;
    for(size_t i = 0; i < dest_remaining_rear; i++) {
        *dest_ptr_char++ = (unsigned char)c;
    }

    return dest;
}

This is actually how Mint64OS implemented its memset() and memcpy() functions. It's not an obscure way to optimize memory transactions (I believe it is very common). Since this version works in WORD-sized chunks, which are eight times larger than a byte on a 64-bit architecture, the bulk of the work needs about one eighth as many memory transactions, so we can expect it to be up to roughly 8x faster than the previous one. And it really is faster! Once I applied this implementation, screen refreshing got SIGNIFICANTLY faster, like visibly faster. That's good.
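The memcpy() side is not shown in this entry, so here is a minimal sketch of how the same front/middle/rear idea might carry over. It is not the kernel's actual code: memcpy_words and word_t are hypothetical names, and it assumes the kernel is built with strict aliasing disabled (for example with -fno-strict-aliasing), as is common for kernel code. It only takes the word path when dest and src share the same offset within a word, so both become aligned together; otherwise it falls back to bytes.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t word_t;  /* hypothetical stand-in for max_t */

/* Sketch of a word-wise forward copy. Regions must not overlap. */
void *memcpy_words(void *dest, const void *src, size_t n) {
    unsigned char *d = (unsigned char *)dest;
    const unsigned char *s = (const unsigned char *)src;

    /* Use word copies only if aligning d also aligns s. */
    if (((uintptr_t)d % sizeof(word_t)) == ((uintptr_t)s % sizeof(word_t))) {
        /* Front: bytes until d (and therefore s) is word-aligned. */
        while (n > 0 && ((uintptr_t)d % sizeof(word_t)) != 0) {
            *d++ = *s++;
            n--;
        }
        /* Middle: one word per iteration. */
        word_t *dw = (word_t *)d;
        const word_t *sw = (const word_t *)s;
        while (n >= sizeof(word_t)) {
            *dw++ = *sw++;
            n -= sizeof(word_t);
        }
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }

    /* Rear (or the whole copy, if the alignments differ): bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dest;
}
```

Because it always copies from low addresses to high addresses, a copy like this is also safe for overlapping regions where dest is below src.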

One more thing before I close this journal: when you're implementing memmove(), be sure to check the memory locations of dest and src, because if the two regions overlap (like in the case of moving video memory contents), you have to pick the copy direction so that source bytes aren't overwritten before they are read. If dest is below src, a normal forward copy is safe; otherwise you have to copy in reverse. Here's the implementation:

void *memmove(void *dest , const void *src , size_t n) {
    if(dest == src) return dest;

    if(dest < src) { return memcpy(dest , src , n); }
    return memcpy_reverse(dest , src , n);
}
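The memcpy_reverse() used above isn't shown in this entry. A minimal byte-wise sketch of what it might look like, assuming it has the same signature as memcpy() and simply walks from the last byte down to the first (the real one may well copy by words):

```c
#include <stddef.h>

/* Hypothetical memcpy_reverse(): copy n bytes starting from the end,
   so an overlapping region with dest above src is not clobbered. */
void *memcpy_reverse(void *dest, const void *src, size_t n) {
    unsigned char *d = (unsigned char *)dest + n;
    const unsigned char *s = (const unsigned char *)src + n;

    while (n-- > 0) {
        *--d = *--s;   /* the highest addresses are written first */
    }
    return dest;
}
```

For example, shifting the first six bytes of "abcdefgh" up by two positions this way leaves "ababcdef", whereas a forward byte copy would have overwritten 'c' through 'f' before reading them.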

Another minor improvement to my beloved kernel!

Project : The "Microkernel" Operating System

Journal Entry Date : 2025.01.26