Dealing with buffers in libflux

2025-04-19

In C there are a few ways to handle buffers of data. The C standard library provides two of them.

Terminal/sentinel values: This is most commonly seen in “C strings”, where the string is terminated with a special NUL (0x00) character. This makes it impossible to use any function that expects such a terminator with binary data, which might contain a NUL byte early or lack one entirely.
Pointer+Size pairs: This is seen in read(), write(), fread(), fwrite(), memcpy(), and similar functions. You provide a pair of arguments: a pointer and the size/length of the data it points to. Sometimes you need to provide a third value: the size of the element type (when the length is a count of array elements).

However, there are a few other possible means:

“Pascal Strings”: So-called Pascal strings store a size (usually a 16-bit integer) inline, immediately before the actual data. This can be seen throughout 9p and other protocols. It allows for passing binary data with less worry, though sanity checking possibly malicious or erroneous lengths is still necessary. This is also used in certain dynamic array implementations.
Start+End Pointer pairs: This can be seen in parts of various STL implementations (via .begin() and .end()). The end pointer can either point to the final valid value or to one past the final valid value.
Start+End Index: Similar to the prior example, these are simply indexes into a buffer or array, which must be passed alongside the indexes. This can be seen in the jsmn JSON parser library. Again, the end index could either be one past the final valid value or the final valid value.
Complicated State Objects: Seen in some STL objects, where the object or struct is expected to be passed around instead of giving direct access to the array or buffer. This is unavoidable in certain cases, but sometimes done needlessly.

Pros and Cons

Let’s look at what is good and bad about each option.

Sentinel Values

The sentinel value can be seen throughout the core C standard library (AKA stdlib). The argv and env values passed to main() are NULL-terminated arrays of NUL-terminated strings. Every string literal is NUL-terminated. All functions expected to deal with ASCII or UTF-8 strings expect NUL-terminated strings. Using NUL-terminated strings is the path of least resistance: there is no easy means to use anything else, and nearly every kind of function you’d want for handling them is provided in <string.h>.

On the other hand, it’s very easy to forget that you need to allocate space for the NUL terminator, or to pass data which has an early terminator or is missing one. Some of the provided functions can also produce data that is not NUL-terminated when you would expect it to be (this is the entire reason strlcpy exists even though strncpy exists and is standard). What about when you’re receiving data that is a mix of “string” data and binary data (eg: network protocols)? You have a buffer of “string” data that is very likely not NUL-terminated, and thus you either have to copy it out, destroy the source buffer’s integrity (by inserting a NUL, possibly corrupting something else in that packet), or forgo all the normal string functions. Finally, unless you cache the size, calculating the size of the data in such a buffer or array requires a loop over it.
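To make the strncpy pitfall concrete, here is a minimal sketch. The helper names copy_terminated and has_terminator are hypothetical, made up for this illustration; they are not part of any standard library or of libflux.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity cap) and always
 * NUL-terminate. Returns 1 if the whole string fit, 0 if it was truncated. */
static int copy_terminated(char *dst, size_t cap, const char *src)
{
	size_t len = strlen(src);

	if (cap == 0)
		return 0;
	if (len >= cap) {
		/* Not enough room: keep what fits and terminate it. */
		memcpy(dst, src, cap - 1);
		dst[cap - 1] = '\0';
		return 0;
	}
	memcpy(dst, src, len + 1); /* len + 1 copies the terminator too */
	return 1;
}

/* Hypothetical helper: returns 1 if the first cap bytes of buf contain a NUL. */
static int has_terminator(const char *buf, size_t cap)
{
	return memchr(buf, '\0', cap) != NULL;
}
```

Calling strncpy(a, "hello", 5) on a five-byte array fills all five bytes and leaves no room for a terminator, which is exactly the case has_terminator detects.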

Pointer+Size pairing

This is also available throughout the C standard library and the Berkeley socket API (IE: the one everyone uses, because it’s in POSIX). It is less accessible, but still very common throughout stdlib. Some very, very basic operations (memset, bzero, memcpy) are available, but not much in the way of parsing or constructing the data meant to be in those buffers. This is the means of transfer for any binary data, or anything that could possibly produce binary data (eg: read()). So long as you keep both parts of the data together, and update the length when passing a further index into the buffer (eg: you’re using repeated memcpy calls to construct data in the buffer), this works very well. It provides everything needed to properly avoid buffer overruns, and by using indexes (instead of advancing a pointer) you avoid having to adjust the length at all.

Pascal Strings (inline size)

Very similar to the Pointer and Size pairing, this can trivially be passed to functions requiring those arguments (and trivial macros could be provided to assist). If working in C, functions that make use of this technique have to be written, as the standard library does not provide them. Removing elements from the start of such a buffer or array commonly requires making a copy with the new size inserted before the first element once more, which means additional copying that might not otherwise be required. However, this does mean the user of such an API need only provide a single value representing a single buffer, which can make code more readable (not that this is a huge boon in most cases).
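As a rough sketch of reading an inline-size string with the sanity check mentioned earlier, assuming a 16-bit little-endian length prefix (the layout 9p uses for its strings). The function name read_prefixed and its return convention are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reads a 16-bit little-endian length prefix from buf
 * and validates it against the number of bytes actually available.
 * On success, stores the payload pointer and length and returns the number
 * of bytes consumed (prefix + payload). Returns 0 if the input is malformed. */
static size_t read_prefixed(const uint8_t *buf, size_t avail,
                            const uint8_t **data, size_t *len)
{
	size_t n;

	if (avail < 2)
		return 0; /* not even room for the prefix */

	n = (size_t)buf[0] | ((size_t)buf[1] << 8);
	if (n > avail - 2)
		return 0; /* claimed length runs past the end of the buffer */

	*data = buf + 2;
	*len  = n;
	return 2 + n;
}
```

Note that the payload may contain NUL bytes without causing any trouble, and that the return value tells the caller where the next field starts.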

Start and End Pointers or Indexes

These are easily convertible between each other (buffer + index -> pointer and pointer - buffer -> index). It takes care to note whether the end value is the final valid value or one beyond it (this affects whether end checks use > or >=). A pair can also easily be converted into a buffer+size pair (buffer = start, size = end - start) to make use of APIs which use that convention, and when doing so, advancing the start pointer means one doesn’t need to continually update the size (as it’s a simple subtraction). One could also have a list of start/end pairs all pointing into the same backing buffer, and provided that one keeps operations within those start/end bounds (and there are no overlaps), one could safely mutate data in the buffer for one pair without corrupting another value within the buffer (eg: the list of start/end pairs are tokenized values from the input buffer). However, there aren’t convenient APIs that use start/end pairs, so code has to be littered with conversions to the appropriate arguments for functions. Likewise, providing arguments to functions that take such pairs can be very awkward (such as when wanting to pass a string literal).
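The conversions described here can be sketched as follows. The type and function names (PtrRange, IdxRange, idx_to_ptr, ptr_to_idx, range_size) are hypothetical, and the sketch assumes the convention where end is one past the final valid byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; end is one past the final valid byte. */
struct PtrRange { uint8_t *start, *end; };
struct IdxRange { size_t start, end; };

/* buffer + index -> pointer */
static struct PtrRange idx_to_ptr(uint8_t *base, struct IdxRange r)
{
	struct PtrRange p = { base + r.start, base + r.end };
	return p;
}

/* pointer - buffer -> index */
static struct IdxRange ptr_to_idx(const uint8_t *base, struct PtrRange p)
{
	struct IdxRange r = { (size_t)(p.start - base), (size_t)(p.end - base) };
	return r;
}

/* The size for a pointer+size API is just a subtraction, so advancing
 * start never requires a separate length to be kept in sync. */
static size_t range_size(struct PtrRange p)
{
	return (size_t)(p.end - p.start);
}
```

With these, a call such as memcpy(dst, p.start, range_size(p)) bridges a start/end pair to a pointer+size API.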

What I do in libflux

Because I deal with a large amount of binary data which contains textual data:

I need functions that can easily handle an ambiguous case (partially binary, fully binary, fully textual with optional NUL terminator)
I need to take a NUL-terminated string and copy it into a buffer without the NUL terminator (and sometimes with)
I need to be able to store textual data and pass it around, and cannot rely on the source having a NUL-terminator. If I need to add a NUL-terminator I can copy the string at that point.
I need to be able to know if a buffer or range was fully or partially consumed (eg: by strcmp, atol, or other parsing/extraction methods)
I would like to reduce the amount of needless copies of all or part of a buffer (eg: for purposes of adding a NUL-terminator)
I would like to be able to parse and tokenize a buffer:
- When I can’t guarantee a NUL terminator
- Without making a copy of every token
- Without needing to insert a NUL terminator for each token (there are situations where the tokens are fully adjacent to each other)

Because of this, I feel the best option is to have start and end pointers. I can create sub-ranges of the backing data trivially without needing to modify the data, possibly corrupting other segments. For the purposes of the API I wrote to solve those issues, I use a special struct:

struct FluxBuffer {
	uint8_t *start, *end;
};

This struct uses the start and end pointer method, and I have rewritten versions of most of the normal <string.h> functions to work with these structures. There’s bufcpy to copy data (and because size checks are trivial, it checks that the destination is large enough to hold the data in the source), buflcpy (which adds a NUL-terminator to the destination after the copy), bufcmp (which can optionally be provided a pointer in which to store where the comparison ended), and bufindex (which returns a pointer to the first instance of a character in the buffer).
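To give a feel for how small such functions can be, here is a sketch of two of them. This is not the libflux source: the sketch_ names, the signatures, and the return conventions are assumptions made for illustration, as is the convention that end is one past the last byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct FluxBuffer {
	uint8_t *start, *end; /* assumed: end is one past the last valid byte */
};

/* Sketch of a bufindex-like function: returns a pointer to the first
 * occurrence of c in buf, or buf.end when c is absent. */
static uint8_t *sketch_bufindex(struct FluxBuffer buf, uint8_t c)
{
	size_t len = (size_t)(buf.end - buf.start);
	uint8_t *hit = len ? memchr(buf.start, c, len) : NULL;

	return hit ? hit : buf.end;
}

/* Sketch of a bufcpy-like function: copies src into dst only if it fits.
 * Returns the number of bytes copied, or 0 when dst is too small
 * (a hypothetical convention, chosen to keep the example short). */
static size_t sketch_bufcpy(struct FluxBuffer dst, struct FluxBuffer src)
{
	size_t need = (size_t)(src.end - src.start);
	size_t room = (size_t)(dst.end - dst.start);

	if (need > room)
		return 0;
	memcpy(dst.start, src.start, need);
	return need;
}
```

Because both sizes fall out of a pointer subtraction, the bounds check costs almost nothing, which is the point made about bufcpy.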

I also provide a (FLUX_)BUFLIT (Buffer Literal) macro which makes use of the fact that string literals with the same contents can share an address (this breaks if they don’t, but I have a unit-test to sanity check that they do). Due to how it is implemented, you can also provide a static buffer and call BUFLIT on it to get a valid buffer.

This means I can write code like:

uint8_t buffer[1 << 14], *p;
Buffer buf = BUFLIT(buffer);
ssize_t sz;

sz = recv(sock, buffer, sizeof(buffer), 0);

// check sz for -1 error

buf.end = buffer + sz;

if (0 == bufeq(buf, BUFLIT("start"), &p)) {
	// `bufeq` is a more lenient `bufcmp`, and will not fail on a comparison of the final NUL-terminator character in the second argument (arguments are otherwise exactly the same)
	// `p` points to the point in `buf` after the last good match
	//     This means that if we received exactly five characters ('s', 't', 'a', 'r', 't') then `p == buf.end` will be true
	//     otherwise, we could use this behavior to parse out more of the received data by setting `buf.start` to `p`
}

With more functions provided, and some more helper macros (possibly one to fill both pointer and size arguments of common functions), we could keep code both safe and readable.

In the event one receives a NUL-terminated string into a buffer:

uint8_t buffer[1 << 14];
Buffer buf = BUFLIT(buffer);

functionThatWritesNULTerminatedString(buffer);

buf.end = bufindex(buf, 0); // Note: this means that end points to the NUL rather than one past it.  This is more useful when wanting to then serialize this string for transmission or storage.

If we need to process multiple substrings based on some kind of sentinel (eg: reading a buffer line-by-line):

Buffer buf = getBufferToProcess(), line;

line.start = buf.start;
line.end	= bufindex(buf, '\n'); // Returns either buf.end or a pointer to a '\n'

while (line.start < buf.end) {
	processLine(line);

	line.start = line.end + 1; // skip the '\n'
	line.end	= buf.end;
	line.end	= bufindex(line, '\n');
}

… or using the bufadvance macro:

Buffer buf = getBufferToProcess(), line;

line.start = buf.start;
line.end	= buf.start;

while (line.start < buf.end) {
	bufadvance(line.end, buf.end, '\n' != *line.end);

	processLine(line);

	line.end++;
	line.start = line.end;
}

Because the functions to deal with text data and binary data are the same, one doesn’t need to convert between the differing requirements of two sets of functions. A substring can easily be produced without needing to create a new copy of the data.

How to get sed to pull out a block

2024-06-02

For a good long while, sed has been a great mystery to me. After all, it is styled after the infamous ed editor. Another thing that has eluded me fairly often is how I can, from a shell prompt, print out a block of code, and only that block of code.

; sed '/cfg_services/,/)/p;d' /etc/rc.conf
cfg_services+=(
	eudev
	'mount' 'sysctl'
	'@lo.iface' '@ntp' '@scron'
	alsa
	@agetty-tty{2..6}
	sshd
	@rc.local runit
	nginx
)

Great. We’re looking for ‘cfg_services’, then we’re looking for the next ‘)’, printing those, then deleting all other output. But what if we had nested blocks of code? Assuming one has sane indentation, one can rely on that:

int main(void) {
	if (foo) {
		printf("Hello, World!\n");
	}

	if (bar) {
		printf("Goodbye, World!\n");
	}
}

Given that code, we want to print the foo check:

; sed '/if .foo/,/\t}/p;d' test.c
	if (foo) {
		printf("Hello, World!\n");
	}

If we want both blocks, we can simplify the first regex:

; sed '/if (/,/\t}/p;d' test.c
	if (foo) {
		printf("Hello, World!\n");
	}
	if (bar) {
		printf("Goodbye, World!\n");
	}

Though this might not be as useful, since we can’t quite operate on each block individually.

Now, what if we wanted to add something to the block with sed? How would we do that? sed has an -i flag that allows it to modify the file it was given, however, we have to operate on the file differently.

; sed -i '/if (foo/,/\t}/ { /}/ i \\t\tfoo();'\n'}' test.c
; cat test.c
int main(void) {
	if (foo) {
		printf("Hello, World!\n");
		foo();
	}

	if (bar) {
		printf("Goodbye, World!\n");
	}
}

A few things to note here. First, if you’re wanting to test your arguments to sed like this, then omit the -i and let it print out the entire file, or pipe that into another sed that is only printing the block you’re wanting to modify. The second thing to note is that -i suppresses sed’s usual output to stdout; it writes back into the file it read from instead. Thus to see the results, we have to print the file out again with cat.

But how does it work? We’re using the same pattern matching as before, but instead of having the address affect the p command, we’re having it affect a command block in sed, which in this example is: '{ /}/ i \\t\tfoo();'\n'}'. Here, we’re giving a second address to work on, but this address only applies within the block we found: we’re looking for the closing brace, and running the i command on it. The i and a commands tell sed to insert and append respectively. If we did not give these commands a second address, they would run on all of the lines found with the first address range. Both commands are terminated with a newline, which my shell natively supports, but if you are using bash you will need to use $'\n' instead of the raw \n. We also have to provide an extra backslash as an escape, since both a and i will consume one.

Nice and simple. Hope everyone has a good week!

PTY allocation issue: strace to the rescue

2024-03-16

Quick little post today. I was having a bit of a frustrating issue where my user account could not spawn certain PTY-allocating programs, but could spawn others. Ultimately I ended up using strace to try and figure out where it was failing.

; strace -f abduco -c x trash
...
[pid 26646] openat(AT_FDCWD, "/dev/ptmx", O_RDWR <unfinished ...>
[pid 26644] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=26645, si_uid=10000, si_status=0, si_utime=0, si_stime=0} ---
[pid 26646] <... openat resumed>)       = -1 EACCES (Permission denied)
[pid 26644] read(3,  <unfinished ...>
[pid 26646] write(4, "server-forkpty: Permission denie"..., 34) = 34
[pid 26644] <... read resumed>"server-forkpty: Permission denie"..., 255) = 34
[pid 26646] close(4 <unfinished ...>
[pid 26644] read(3,  <unfinished ...>
[pid 26646] <... close resumed>)        = 0
[pid 26644] <... read resumed>"", 221)  = 0
[pid 26646] close(3 <unfinished ...>
[pid 26644] write(2, "server-forkpty: Permission denie"..., 34 <unfinished ...>
server-forkpty: Permission denied
[pid 26646] <... close resumed>)        = 0
[pid 26644] <... write resumed>)        = 34
[pid 26646] close(6 <unfinished ...>
[pid 26644] unlink("/home/R/.abduco/x@workstation4k" <unfinished ...>
[pid 26646] <... close resumed>)        = 0
[pid 26646] exit_group(1)               = ?
[pid 26644] <... unlink resumed>)       = 0
[pid 26644] exit_group(1 <unfinished ...>
[pid 26646] +++ exited with 1 +++
<... exit_group resumed>)               = ?
+++ exited with 1 +++

First thing was that abduco forks and the child attempts the allocation, so strace’s -f option to follow children was needed; the [pid 26646] lines are all output of syscalls the child makes. The issue is the attempt to open the /dev/ptmx file, so I check the permissions:

; ls -l /dev/ptmx
crw--w---- 1 root tty 5, 2 Mar 16 18:07 /dev/ptmx

Okay… a user with the tty group can write to it, but root can read/write it? That might make sense? But you can see the program is trying to use the O_RDWR flag, it’s trying to open it read/write. So I check a different system, /dev/ptmx has read-write permissions across the board. A quick sudo chmod a+rw /dev/ptmx fixes the issue.

What’s concerning is why eudevd had those permissions in the first place. A restart might have fixed the issue, but I’d rather not needlessly restart a system.

Drawing with Plan 9 (Pt 0)

2024-03-11

I found a neat little code repository the other day, and decided to play around with it. Now, I’m not totally unfamiliar with Plan 9’s drawing routines, having grazed the manuals a few times. However those manual pages run into the problem I have with a number of Plan 9’s manual pages, but that’s a post for another day. The manual pages are broadly split into three parts, draw(3), graphics(3), and window(3). Those three man pages contain the bulk of the function documentation for the drawing system, and also the definitions for the three main graphical structures: Image, Display, and Screen.

Overall, at first glance this is all sane. Until you look at the function prototypes. The names are not that bad, but if you’re glancing through them looking for some kind of rectangle primitive, you’re not going to see one. Want to draw a rectangle or fill an Image or window? You want the draw() function. Okay, drawing rectangles is the implicit default. That’s sane. Let’s take a closer look: void draw(Image *dst, Rectangle r, Image *src, Image *mask, Point p); and we immediately have a problem. There are three Image pointers as parameters, and no color. This led me down a little rabbit-hole where I was trying to find out how to color an Image. Surely there’s some means to access the pixel data for an Image right? … right? Let’s take a look:

typedef
struct Image
{
     Display   *display; /* display holding data */
     int       id;       /* id of system-held Image */
     Rectangle r;        /* rectangle in data area, local coords */
     Rectangle clipr;    /* clipping region */
     ulong     chan;     /* pixel channel format descriptor */
     int       depth;    /* number of bits per pixel */
     int       repl;     /* flag: data replicates to tile clipr */
     Screen    *screen;  /* 0 if not a window */
     Image     *next;    /* next in list of windows */
} Image;

Okay, so it’s implicitly a part of a linked list, for some reason, and it has pointers to a Screen but only if there’s a window, and a Display which is noted as “display holding data”. Surely, that means that a Display holds the pixel data right?

typedef
struct Display
{
      ...
      void      (*error)(Display*, char*);
      ...
      Image     *black;
      Image     *white;
      Image     *opaque;
      Image     *transparent;
      Image     *image;
      Font      *defaultfont;
      Subfont   *defaultsubfont;
      ...
};

Nope. It contains a bunch of Images. I guess that makes sense as a “Display”, but what is with those? We have white, black, opaque? What? Maybe looking at Screen will enlighten us.

typedef
struct Screen
{
     Display   *display; /* display holding data */
     int       id;       /* id of system-held Screen */
     Image     *image;   /* unused; for reference only */
     Image     *fill;    /* color to paint behind windows */
} Screen;

Okay, that’s reasonable, but not actually what we needed, it would be weird if it was to be honest. Okay, I don’t have any leads, so I may as well start poking around. My objective is to get the gui-menu.c program to not OLED-flashbang me every time it runs. This is actually pretty simple to solve: draw(screen, screen->r, display->black, nil, ZP); Here we’re working with two globals, display which is a Display * and screen which is an Image *. The latter bit tripped me up for a bit when I needed to access the Screen * because I wanted to muck with its fill property. There are two things that I learned here. First, the fill property doesn’t seem to do anything, doesn’t color the background at all in any situation I could try it on. Secondly, the draw() call I made needs to be in the event loop. Not sure why, there might be a paint event I should be listening to, but at the moment I’m just calling it every loop. Thankfully the event loop does block itself when empty.

So! Mission complete right? Yes. But I would really like to be able to draw my own color, and maybe figure out how to work with the pixel data somehow. Now, it’s not referenced often, but there is a man page for allocimage(3) which has the following prototype: Image *allocimage(Display *d, Rectangle r, ulong chan, int repl, int col). This allocates a new Image that is filled with one single color. None of the arguments are too difficult to come by either. I wanted something like display->black but in my own color, so: allocimage(display, display->black->r, display->black->chan, 1, 0x004040FF); Nice and simple, duplicate the properties from display->black (though I manually set repl to 1, which is also the value that black and white have from display), compile, and… flashbang. A pure white screen. Now, at first I thought I was encoding the color incorrectly, but that wasn’t the case. The problem is the chan parameter, which indicates the color channel format. Naturally the black and white images don’t need to store a very complicated color, so that value is 0x31 (not sure what macro that expands to), which is apparently a low color depth. Given I was providing a 32-bit RGBA value, I needed to give RGBA32 as the channel value, and it worked. I have a nice dark non-black window.
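For reference, the color value itself is just four 8-bit channels packed into 32 bits with red in the most significant byte and alpha in the least. A minimal sketch in portable C follows; pack_rgba is a hypothetical helper and not part of the Plan 9 draw library.

```c
#include <stdint.h>

/* Hypothetical helper: packs four 8-bit channels as 0xRRGGBBAA,
 * the layout of the color value used above. */
static uint32_t pack_rgba(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
	return ((uint32_t)r << 24)
	     | ((uint32_t)g << 16)
	     | ((uint32_t)b << 8)
	     |  (uint32_t)a;
}
```

So 0x004040FF is red 0x00, green 0x40, blue 0x40, and fully opaque alpha: a dark teal. The encoding was fine all along; it was the channel format of the destination Image that could not represent it.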

Overall, this was an interesting adventure, but definitely not one that endeared the man pages to me. I feel like I’ll want to be taking notes and referencing those instead.

scp considered harmful

2023-12-15

For whatever reason scp does not have a means to copy a symlink as a symlink. This makes scp absolutely terrible if you want to backup or archive a directory structure that contains a symlink cycle. If the directory you are wanting to copy contains a .wine directory, it very likely contains a symlink cycle.

As I have been burned by this in the past, I often argue against the usage of scp altogether, instead suggesting alternate means of transferring files. Either use sftp (via sftp, lftp, or filezilla), or just use tar | tar. I find tar piped into another instance of tar to be quite useful, and quite expressive once you get used to it.

# Copy directories from HOST
ssh HOST tar c dir1 dir2 dir3 | tar x

# Copy directories to HOST
tar c dir1 dir2 dir3 | ssh HOST tar x

# Copy directories to a specific directory on HOST
tar c dir1 dir2 dir3 | ssh HOST tar x -C ~/target/directory/

You can also put pv in between the tar transfers to get a progress bar.

The main thing to note with this technique is that tar c (note the lack of -f) produces a tarball to stdout, and tar x consumes a tarball from stdin. On occasion I find it helpful to copy files like this without ssh, generally I want to preserve a specific directory structure in these instances. Or I simply want the benefit of pv to get a sense of the progress of a larger file transfer.

Now, some of you might ask “why not rsync?” Which is a fair question. If you can rely on rsync being present on both sides (as it requires this), rsync is in itself a very expressive tool. But that’s not something I can rely on. If there’s ssh on a host, there is also very likely to be tar.

XS List Operations (shift, unshift, push, pop)

2023-12-14

All variables in XS are lists. This means that any kind of data that can be represented as an array of simple types (IE: ints or strings) can be handled by XS in a very simple manner.

shift

One of the list operations someone might need is to shift an item out of the front of the list, we can use the “multiple assignment” feature of XS to perform this operation:

; list = 1 2 3 4 5 6 7 8
; (a list) = $list
; var a list
a = 1
list = 2 3 4 5 6 7 8

Note: the var command simply prints the name and value of variables.

unshift

Prepending items to an array is commonly known as unshifting, XS will flatten all lists, making this operation trivial:

; list = 0 1 $list
; var list
list = 0 1 2 3 4 5 6 7 8

push

Appending items to an array is exactly as simple as unshift is, and this is often called pushing onto the array.

; list = $list 9 10
; var list
list = 0 1 2 3 4 5 6 7 8 9 10

pop

Removing the last item of an array is no longer trivial, so we have a helper that takes the last item and puts it at the front, letting us shift the final value out.

; (a list) = <={ %shift $list }
; var a list
a = 10
list = 0 1 2 3 4 5 6 7 8 9

The code for the %shift fragment can be seen here:

fn %shift {|l|
    let (n = $#l; m) {
        if {~ $n 0 1} {
            # nothing to do, do nothing
            result $l
        } else if {~ $n 2} {
            result $l(2 1)
        } else {
            m = `($n - 1)
            result $l($n 1 ... $m)
        }
    }
}

Distributed FS Research

2023-12-12

These are just some notes of things I want to look into:

Full-fledged file-systems:

CAS:

Venti
BlobIt (Built-on BookKeeper)

Parts or unknown:

AnyBlob
BookKeeper (Built-on ZooKeeper)
ZooKeeper
Bob
Helia (JS IPFS)
Kubo (Go IPFS)
Storj
BlobFS
eblob
HDFS (required for Hadoop)
Hadoop
AndrewFS

Not interested:

GlusterFS – Needs too much initial setup and configuration, want a JBOD system
CephFS – See GlusterFS
Lizard – Looks like a poorly maintained fork of MooseFS

Functions in XS that need two lists

2023-12-11

One of the more curious things about XS is how nice it is to use when working with a list of things. Bash’s awkward syntax for its arrays is one of the reasons I stopped using it.
However, that is not to say that XS is without fault.

; fn twoLists {|A B|
    echo A \= $A
    echo B \= $B
}
; twoLists (1 2 3) (A B C)
A = 1
B = 2 3 A B C
; X = 1 2 3; Y = A B C
; twoLists $X $Y
A = 1
B = 2 3 A B C

When you give an XS fragment two lists, it passes those as a single list, and XS does not allow nested lists, so the separation between the two is completely removed.
However, XS is at least inspired by functional languages, and you can pass a fragment around, which is effectively a function. You can even have a fragment return a fragment, and closure behavior occurs.

My initial idea on how to solve this was to create a fragment that returns a closure that returns the list. Clojure has such a function, it calls it constantly.

fn %constantly {|list|
    result { result $list }
}

This does mean that the fragment that would need to use such behavior must always use it, so you have behavior like the following:

fn twoLists {|fnA fnB|
    let (A = <=fnA; B = <=fnB) {
        echo A \= $A
        echo B \= $B
    }
}

But at least that behavior is correct:

; twoLists <={ %constantly 1 2 3 } <={ %constantly A B C }
A = 1 2 3
B = A B C

It is however fairly unwieldy to make use of. Functional languages also offer something called currying, which is something I’ve struggled to see the use of (XS and Javascript being the only languages where I do anything “functional”). So what did I actually need this for? List comparisons. I have two lists, and I want to get a list of what’s common between them. Using %constantly the fragment ends up looking like:

fn %common {|fn-p1 fn-p2|
    let (list = <=p1; res = ()) {
        for i <=p2 {
            if {~ $i $list} {
                res = $res $i
            }
        }
        result $res
    }
}

And calling it is the ugly echo <={ %common <={ %constantly 1 2 3 } <={ %constantly 2 3 4 } }, but if we rewrite it in a way that implements an emulated currying (I am aware that one could argue this isn’t currying):

fn %common {|listA|
    result {|listB|
        let (res = ()) {
            for i $listA {
                if {~ $i $listB} {
                    res = $res $i
                }
            }
            result $res
        }
    }
}

This new version looks a little cleaner in my opinion, and the calling behavior is both cleaner and potentially more useful!

echo <={ <={ %common 1 2 3 } 2 3 4 }

Obviously there’s a limit to how clean it can be with XS, but the fact that %common now returns a closure that is effectively “return what matches with my stored list” means I can save the fragment, then call it against multiple lists if needed.

Shell Rosetta: looping through a list

2023-11-24

bash

$ list=(a 'b c' d e)
$ for x in "${list[@]}"; do echo $x; done
a
b c
d
e

rc

% list=(a 'b c' d e)
% for (x in $list) { echo $x }
a
b c
d
e

xs

; list = (a 'b c' d e)
; for x $list { echo $x }
a
b c
d
e

MusicPD Soft Ramping Alarm Clock

2023-10-27

For a long while, I’ve had a disdain for the abrupt awakening caused by traditional alarm clocks. I wanted something that was still going to wake me, but wouldn’t cause my heart to be racing first thing in the morning. So I wrote some crontab entries to help:

0	5	*	*	*	mpc vol 10
0	5	*	*	*	mpc play
5	5	*	*	*	mpc vol 20
5	5	*	*	*	mpc play
10	5	*	*	*	mpc vol 30
10	5	*	*	*	mpc play
15	5	*	*	*	mpc vol 40
15	5	*	*	*	mpc play
20	5	*	*	*	mpc vol 50
20	5	*	*	*	mpc play
25	5	*	*	*	mpc vol 60
25	5	*	*	*	mpc play
30	5	*	*	*	mpc vol 70
30	5	*	*	*	mpc play
35	5	*	*	*	mpc vol 80
35	5	*	*	*	mpc play
40	5	*	*	*	mpc vol 90
40	5	*	*	*	mpc play

For those not used to reading crontabs, at 0500 set the volume to 10%, then start playing music, and every five minutes increase the volume by 10%, play the music again (as it might have been turned off), and continue to the normal listening volume (90% in this case).

Of course, writing that by hand is a pain, so I have an xs script generate that chunk of my crontab for me:

# User configurable values
max_vol	= 90
min_vol	= 0
hour	= 5
min_step	= 5

#####

# State variables
vol	= $min_vol
min	= 0

while {$vol :lt $max_vol} {
    # Increment volume
    vol	= `($vol + 10)
    cron	= $min $hour \* \* \*

    # %flatten takes a string and a list, it joins the list with the string
    echo <={ %flatten \t $cron 'mpc vol '^$vol }
    echo <={ %flatten \t $cron 'mpc play' }

    # Increment minutes
    min = `($min + $min_step)
}

FOSS Unleashed