Planet Lisp

Zach Beane: Want to write Common Lisp for RavenPack? | R. Matthew Emerson

· 6 days ago

Daniel Vedder: Silicon Valley Syndrome

· 8 days ago

- - -

Apology to Planet Lisp readers: This post is not about Lisp and was erroneously included in the PL feed. We're working to resolve the issue.

- - -

“Silicon Valley Syndrome” is the name I give to a widespread myth that is frequently found in affluent, tech-savvy circles. It is the belief that “Every social problem can be solved if you just throw enough technology at it”. This belief lies at the heart of many, many attempts to make the world a better place. Their proponents will say things like: “We can save democracy by combating fake news with algorithms”, or “We can solve Third World hunger using satellite imagery”, or “We can improve education in poor areas by giving every kid an iPad”. These are all laudable attempts, and yet their fundamental assumption is all too often sadly misguided. Why is that?

* * *

The crux of the matter is that social problems are never just technical, but always at least in part human. Societies are made up of humans, and anybody who ignores this human aspect is doomed to (at least partial) failure.

Let us start with a personal example. All students and freelancers know the problem of procrastination. There are thousands of tools devoted to helping poor souls better manage their time (most of them based on some variation of the humble to-do list). But no to-do list, no matter how technologically advanced, has ever forced anybody to do anything. In the end, it is still the individual's choice whether or not to follow his or her plan, or to procrastinate. To-do lists can help keep us organised, but they cannot of themselves change our behaviour. Thus, they do not solve the underlying problem. In the end, procrastination remains a human issue.

We see the same pattern when we turn our attention to the higher level of social problems. One area that seems to be particularly susceptible to the Silicon Valley Syndrome is development aid. Growing up in Africa, I have had a fair bit of exposure to this, so let us take a closer look at some examples from this field.

Sometime in the last two years, friends of mine in Zambia received a visitor from the United States. A well-to-do philanthropist, he wanted to donate a set of laptop computers to a rural school. Ignoring all advice of people on the ground, he looked around for a few days and chose a school. When it was pointed out to him that the school didn't have electricity, he just shrugged and said the laptops could be taken to the next village to charge. That that village also had a school that could have taken the laptops didn't seem relevant. Several months later he returned and was very upset to discover his laptops hadn't even been unpacked yet. The reason? None of the teachers at the school knew how to work with computers…

There were two big problems with this work of charity. First, and more obviously, the utter cluelessness of the philanthropist. Although his motivation was noble, his ignorance of the local situation and his unwillingness to listen to advice meant that a lot of money ended up being wasted. But secondly, note his assumption that a set of computers was the single best investment for a rural school in Africa. Why not invest in text books instead? Or even better, teacher education?

Education is a complex thing, but one thing pretty much all the experts agree on (along with most school children) is the supreme importance of good teachers. Yes, good teachers profit from good tools. But pupils will learn a lot more from a good teacher with bad tools than from a bad teacher with the best tools money can buy. Why is that? Because teaching is a fundamentally human enterprise. Far from being a mere transfer of knowledge, teaching is just as much about relationships and how a teacher interacts with his or her pupils. And that is something no computer can replace.

(Some will argue now that teaching is not necessary for learning to take place; and that given the right resources, students can teach themselves even in the absence of a capable teacher. To this I would reply that while that is indeed true, the percentage of school pupils able and willing to do so is negligible. And even so, they would still learn a lot more effectively if they did have a proper teacher. In short, one shouldn't base the national education strategy on this observation.)

So instead of being invested in technology (the computers), the philanthropist's money might have been better spent training teachers. This isn't easy, it isn't quick, it isn't sexy – but it works. And instead of computers that die from the dust of the African dry season or are dropped or stolen, the output of this investment is a cohort of teachers with careers spanning decades.

One project that has taken this route is FCE Masaiti, a Christian training college in Zambia. Their courses include diplomas in community development, agriculture, and teaching. Their whole approach is based around the tenet of “serving the community”. Their students live in a rural setting and are taught the skills they need for their probable future situation right from the start. When they leave, they will be government certified teachers ready to teach in any primary school in the country; and they will have been given the vision of serving their community. (Having worked alongside one of their graduates, and knowing several current students, I can attest to the quality of their training.) FCE is investing into people to help solve social problems, and their influence will be felt across the country for years to come.

* * *

“Give me a place to stand, and with a lever I will move the whole world”, said Archimedes. He had recognised the power of even a simple piece of technology to change the world, and his remark remains valid to this day. Technology, from the first papyrus to the printing press and the smart phone, has changed and will continue to change the way we communicate, travel, and live. What it has never once changed is human nature. It is a tremendous tool that can be leveraged for spectacular good (and just as spectacular evil). But it can never, ever, be a self-sufficient solution to a social problem.

The lever that was to move the world still needed Archimedes to operate it. As long as humans are involved – and in a social problem, they are so by definition – a technical solution will only ever be half the story. Perhaps this is most evident when working in an intercultural setting, such as development aid. Developing viable, sustainable solutions in this context requires an intimate understanding of culture: how individuals think and how communities function. Any “solutions” that pay no attention to these atechnical, human factors will never get off the ground. But it doesn't matter whether we are looking at aid work, crime reduction, or the defence of democracy; the bottom line remains the same. Societies are composed of humans, and so any workable solution to social problems – though it may also include technology – must consider the human perspective.

Paul Khuong: Preemption Is GC for Memory Reordering

· 12 days ago

I previously noted how preemption makes lock-free programming harder in userspace than in the kernel. I now believe that preemption ought to be treated as a sunk cost, like garbage collection: we’re already paying for it, so we might as well use it. Interrupt processing (returning from an interrupt handler, actually) is fully serialising on x86, and on other platforms, no doubt: any userspace instruction either fully executes before the interrupt, or is (re-)executed from scratch some time after the return back to userspace. That’s something we can abuse to guarantee ordering between memory accesses, without explicit barriers.

This abuse of interrupts is complementary to Bounded TSO. Bounded TSO measures the hardware limit on the number of store instructions that may concurrently be in-flight (and combines that with the knowledge that instructions are retired in order) to guarantee liveness without explicit barriers, with no overhead, and usually marginal latency. However, without worst-case execution time information, it’s hard to map instruction counts to real time. Tracking interrupts lets us determine when enough real time has elapsed that earlier writes have definitely retired, albeit after a more conservative delay than Bounded TSO’s typical case.

I reached this position after working on two lock-free synchronisation primitives—event counts, and asymmetric flag flips as used in hazard pointers and epoch reclamation—that are similar in that a slow path waits for a sign of life from a fast path, but differ in the way they handle “stuck” fast paths. I’ll cover the event count and flag flip implementations that I came to on Linux/x86[-64], which both rely on interrupts for ordering. Hopefully that will convince you too that preemption is a useful source of pre-paid barriers for lock-free code in userspace.

I’m writing this for readers who are already familiar with lock-free programming, safe memory reclamation techniques in particular, and have some experience reasoning with formal memory models. For more references, Samy’s overview in the ACM Queue is a good resource. I already committed the code for event counts in Concurrency Kit, and for interrupt-based reverse barriers in my barrierd project.

Event counts with x86-TSO and futexes

An event count is essentially a version counter that lets threads wait until the current version differs from an arbitrary prior version. A trivial “wait” implementation could spin on the version counter. However, the value of event counts is that they let lock-free code integrate with OS-level blocking: waiters can grab the event count’s current version v0, do what they want with the versioned data, and wait for new data by sleeping rather than burning cycles until the event count’s version differs from v0. The event count is a common synchronisation primitive that is often reinvented and goes by many names (e.g., blockpoints); what matters is that writers can update the version counter, and waiters can read the version, run arbitrary code, then efficiently wait while the version counter is still equal to that previous version.
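
To make that interface concrete, here is a minimal sketch of an event count API in C; the names are illustrative rather than taken from any particular library, and the implementations are the subject of the rest of this section.

#include <stdint.h>

/* Event count: a version counter with one bit reserved for the
   "sleepers are present" flag (illustrative sketch). */
struct event_count;

uint64_t ec_read(struct event_count *ec);  /* current version            */
void ec_inc(struct event_count *ec);       /* writer: bump the version   */
void ec_wait(struct event_count *ec,       /* block while the version    */
             uint64_t prev);               /* still equals prev          */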

The explicit version counter solves the lost wake-up issue associated with misused condition variables, as in the pseudocode below.

bad condition waiter:

while True:
    atomically read data
    if need to wait:
        WaitOnConditionVariable(cv)
    else:
        break

In order to work correctly, condition variables require waiters to acquire a mutex that protects both data and the condition variable, before checking that the wait condition still holds and then waiting on the condition variable.

good condition waiter:

while True:
    with(mutex):
        read data
        if need to wait:
            WaitOnConditionVariable(cv, mutex)
        else:
            break

Waiters must prevent writers from making changes to the data, otherwise the data change (and associated condition variable wake-up) could occur between checking the wait condition, and starting to wait on the condition variable. The waiter would then have missed a wake-up and could end up sleeping forever, waiting for something that has already happened.

good condition waker:

with(mutex):
    update data
    SignalConditionVariable(cv)

The six diagrams below show the possible interleavings between the signaler (writer) making changes to the data and waking waiters, and a waiter observing the data and entering the queue to wait for changes. The two left-most diagrams don’t interleave anything; these are the only scenarios allowed by correct locking. The remaining four actually interleave the waiter and signaler, and show that, while three are accidentally correct (lucky), there is one case, WSSW, where the waiter misses its wake-up.

If any waiter can prevent writers from making progress, we don’t have a lock-free protocol. Event counts let waiters detect when they would have been woken up (the event count’s version counter has changed), and thus patch up this window where waiters can miss wake-ups for data changes they have yet to observe. Crucially, waiters detect lost wake-ups, rather than preventing them by locking writers out. Event counts thus preserve lock-freedom (and even wait-freedom!).

We could, for example, use an event count in a lock-free ring buffer: rather than making consumers spin on the write pointer, the write pointer could be encoded in an event count, and consumers would then efficiently block on that, without burning CPU cycles to wait for new messages.
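
As a hedged sketch of that usage, assuming the illustrative ec_* API above and invented ring helpers, a consumer loop might look like:

/* Consumer side of a single-producer ring whose write pointer is
   encoded in an event count; ring_item and process are hypothetical. */
void consume(struct ring *r, struct event_count *writes)
{
    uint64_t read_ptr = 0;

    for (;;) {
        uint64_t wptr = ec_read(writes);  /* version == write pointer */
        while (read_ptr == wptr) {
            ec_wait(writes, wptr);        /* sleep instead of spinning */
            wptr = ec_read(writes);
        }
        process(ring_item(r, read_ptr++));
    }
}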

The challenging part about implementing event counts isn't waking up sleepers, but doing so only when there are sleepers to wake. For some use cases, we don't need any active wake-up, because exponential backoff is good enough: if version updates signal the arrival of a response in a request/response communication pattern, exponential backoff, e.g., with a 1.1x backoff factor, could bound the increase in response latency caused by the blind sleep during backoff, e.g., to 10%.

Unfortunately, that's not always applicable. In general, we can't assume that signals correspond to responses for prior requests, and we must support the case where progress is usually fast enough that waiters only spin for a short while before grabbing more work. The latter expectation means we can't “just” unconditionally execute a syscall to wake up sleepers whenever we increment the version counter: that would be too slow. This problem isn't new, and has a solution similar to the one deployed in adaptive spin locks.

The solution pattern for adaptive locks relies on tight integration with an OS primitive, e.g., futexes. The control word, the machine word on which waiters spin, encodes its usual data (in our case, a version counter), as well as a new flag to denote that there are sleepers waiting to be woken up with an OS syscall. Every write to the control word uses atomic read-modify-write instructions, and before sleeping, waiters ensure the “sleepers are present” flag is set, then make a syscall to sleep only if the control word is still what they expect, with the sleepers flag set.

OpenBSD's compatibility shim for Linux's futexes is about as simple an implementation of the futex calls as it gets. The OS code for futex wake and wait is identical to what userspace would do with mutexes and condition variables (waitqueues). Waiters lock out wakers for the futex word or a coarser superset, check that the futex word's value is as expected, and enter the futex's waitqueue. Wakers acquire the futex word for writes, and wake up the waitqueue. The difference is that all of this happens in the kernel, which, unlike userspace, can force the scheduler to be helpful. Futex code can run in the kernel because, unlike arbitrary mutex/condition variable pairs, the protected data is always a single machine integer, and the wait condition an equality test. This setup is simple enough to fully implement in the kernel, yet general enough to be useful.
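
For concreteness, the userspace side of those calls is just the raw futex syscall (glibc offers no wrapper); a minimal sketch:

#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sleep while *uaddr == expected; may wake spuriously, so callers
   re-check their condition in a loop. */
static long futex_wait(uint32_t *uaddr, uint32_t expected)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/* Wake every thread blocked on uaddr. */
static long futex_wake(uint32_t *uaddr)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}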

OS-assisted conditional blocking is straightforward enough to adapt to event counts. The control word is the event count’s version counter, with one bit stolen for the “sleepers are present” flag (sleepers flag).

Incrementing the version counter can use a regular atomic increment; we only need to make sure we can tell whether the sleepers flag might have been set before the increment. If the sleepers flag was set, we clear it (with an atomic bit reset), and wake up any OS thread blocked on the control word.

increment event count:

old <- fetch_and_add(event_count.counter, 2)  # flag is in the low bit
if (old & 1):
    atomic_and(event_count.counter, -2)
    signal waiters on event_count.counter

Waiters can spin for a while, waiting for the version counter to change. At some point, a waiter determines that it's time to stop wasting CPU time. The waiter then sets the sleepers flag with a compare-and-swap: the CAS (compare-and-swap) can only fail because the counter's value has changed or because the flag is already set. In the former failure case, it's finally time to stop waiting. In the latter failure case, or if the CAS succeeded, the flag is now set. The waiter can then make a syscall to block on the control word, but only if the control word still has the sleepers flag set and contains the same expected (old) version counter.

wait until event count differs from prev:

repeat k times:
    if (event_count.counter / 2) != prev:  # flag is in low bit.
        return
compare_and_swap(event_count.counter, prev * 2, prev * 2 + 1)
if cas_failed and cas_old_value != (prev * 2 + 1):
    return
repeat k times:
    if (event_count.counter / 2) != prev:
        return
sleep_if(event_count.counter == prev * 2 + 1)

This scheme works, and offers decent performance. In fact, it’s good enough for Facebook’s Folly.
I certainly don’t see how we can improve on that if there are concurrent writers (incrementing threads).

However, if we go back to the ring buffer example, there is often only one writer per ring. Enqueueing an item in a single-producer ring buffer incurs no atomic operation, only a release store: the write pointer increment only has to be visible after the data write, which is always the case under the TSO memory model (including x86). Replacing the write pointer in a single-producer ring buffer with an event count where each increment incurs an atomic operation is far from a no-brainer. Can we do better, when there is only one incrementer?

On x86 (or any of the zero other architectures with non-atomic read-modify-write instructions and TSO), we can... but we must accept some weirdness.

The operation that must really be fast is incrementing the event counter, especially when the sleepers flag is not set. Setting the sleepers flag, on the other hand, may be slower and use atomic instructions, since it only happens when the executing thread is waiting for fresh data.

I suggest that we perform the former, the increment on the fast path, with a non-atomic read-modify-write instruction, either inc mem or xadd mem, reg. If the sleepers flag is in the sign bit, we can detect it (modulo a false positive on wrap-around) in the condition codes computed by inc; otherwise, we must use xadd (fetch-and-add) and look at the flag bit in the fetched value.
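
In GCC-style inline assembly, that fast path is simply xadd without the LOCK prefix; a sketch for x86-64, safe only under the single-writer assumption:

#include <stdint.h>

/* Non-atomic fetch-and-add: xadd *without* a LOCK prefix. Correct
   only because exactly one thread ever increments this word. */
static inline uint64_t nonatomic_xadd(uint64_t *p, uint64_t inc)
{
    asm volatile("xaddq %0, %1" : "+r"(inc), "+m"(*p) : : "cc");
    return inc;  /* previous value; the sleepers flag is its low bit */
}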

The usual ordering-based arguments are no help in this kind of asymmetric synchronisation pattern. Instead, we must go directly to the x86-TSO memory model. All atomic (LOCK prefixed) instructions conceptually flush the executing core’s store buffer, grab an exclusive lock on memory, and perform the read-modify-write operation with that lock held. Thus, manipulating the sleepers flag can’t lose updates that are already visible in memory, or on their way from the store buffer. The RMW increment will also always see the latest version update (either in global memory, or in the only incrementer’s store buffer), so won’t lose version updates either. Finally, scheduling and thread migration must always guarantee that the incrementer thread sees its own writes, so that won’t lose version updates.

increment event count without atomics in the common case:

old <- non_atomic_fetch_and_add(event_count.counter, 2)
if (old & 1):
    atomic_and(event_count.counter, -2)
    signal waiters on event_count.counter

The only thing that might be silently overwritten is the sleepers flag: a waiter might set that flag in memory just after the increment’s load from memory, or while the increment reads a value with the flag unset from the local store buffer. The question is then how long waiters must spin before either observing an increment, or knowing that the flag flip will be observed by the next increment. That question can’t be answered with the memory model, and worst-case execution time bounds are a joke on contemporary x86.

I found an answer by remembering that IRET, the instruction used to return from interrupt handlers, is a full barrier.1 We also know that interrupts happen at frequent and regular intervals, if only for the preemption timer (every 4-10ms on stock Linux/x86oid).

Regardless of the bound on store visibility, a waiter can flip the sleepers-are-present flag, spin on the control word for a while, and then start sleeping for short amounts of time (e.g., a millisecond or two at first, then 10 ms, etc.): the spin time is long enough in the vast majority of cases, but could still, very rarely, be too short.

At some point, we’d like to know for sure that, since we have yet to observe a silent overwrite of the sleepers flag or any activity on the counter, the flag will always be observed and it is now safe to sleep forever. Again, I don’t think x86 offers any strict bound on this sort of thing. However, one second seems reasonable. Even if a core could stall for that long, interrupts fire on every core several times a second, and returning from interrupt handlers acts as a full barrier. No write can remain in the store buffer across interrupts, interrupts that occur at least once per second. It seems safe to assume that, once no activity has been observed on the event count for one second, the sleepers flag will be visible to the next increment.

That assumption is only safe if interrupts do fire at regular intervals. Some latency sensitive systems dedicate cores to specific userspace threads, and move all interrupt processing and preemption away from those cores. A correctly isolated core running Linux in tickless mode, with a single runnable process, might not process interrupts frequently enough. However, this kind of configuration does not happen by accident. I expect that even a half-second stall in such a system would be treated as a system error, and hopefully trigger a watchdog. When we can’t count on interrupts to get us barriers for free, we can instead rely on practical performance requirements to enforce a hard bound on execution time.

Either way, waiters set the sleepers flag, but can’t rely on it being observed until, very conservatively, one second later. Until that time has passed, waiters spin on the control word, then block for short, but growing, amounts of time. Finally, if the control word (event count version and sleepers flag) has not changed in one second, we assume the incrementer has no write in flight, and will observe the sleepers flag; it is safe to block on the control word forever.

wait until event count differs from prev:

repeat k times:
    if (event_count.counter / 2) != prev:
        return
compare_and_swap(event_count.counter, 2 * prev, 2 * prev + 1)
if cas_failed and cas_old_value != 2 * prev + 1:
    return
repeat k times:
    if event_count.counter != 2 * prev + 1:
        return
repeat for 1 second:
    sleep_if_until(event_count.counter == 2 * prev + 1,
                   $exponential_backoff)
    if event_count.counter != 2 * prev + 1:
        return
sleep_if(event_count.counter == prev * 2 + 1)

That's the solution I implemented in this pull request for SPMC and MPMC event counts in concurrency kit. The MP (multiple producer) implementation is the regular adaptive logic, and matches Folly's strategy. It needs about 30 cycles for an uncontended increment with no waiter, and waking up sleepers adds another 700 cycles on my E5-46xx (Linux 4.16). The single producer implementation is identical for the slow path, but only takes ~8 cycles per increment with no waiter, and, eschewing atomic instructions, does not flush the pipeline (i.e., the out-of-order execution engine is free to maximise throughput). The additional overhead for an increment without waiter, compared to a regular ring buffer pointer update, is 3-4 cycles for a single predictable conditional branch or fused test and branch, and the RMW's load instead of a regular add/store. That's closer to zero overhead, which makes it much easier for coders to offer OS-assisted blocking in their lock-free algorithms, without agonising over the penalty when no one needs to block.

Asymmetric flag flip with interrupts on Linux

Hazard pointers and epoch reclamation are two different memory reclamation techniques whose fundamental complexity stems from nearly identical synchronisation requirements: rarely, a cold code path (which is allowed to be very slow) writes to memory, and must know when another, much hotter, code path is guaranteed to observe the slow path's last write.

For hazard pointers, the cold code path waits until, having overwritten an object’s last persistent reference in memory, it is safe to destroy the pointee. The hot path is the reader:

1. read pointer value *(T **)x.
2. write pointer value to hazard pointer table
3. check that pointer value *(T **)x has not changed
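
As a minimal C11 sketch of that read protocol (single hazard pointer slot, illustrative names), with the explicit store-load fence that the rest of this section tries to move off the fast path:

#include <stdatomic.h>

void *hp_protect(_Atomic(void *) *slot, _Atomic(void *) *src)
{
    void *p = atomic_load(src);                    /* 1. read pointer   */
    for (;;) {
        atomic_store(slot, p);                     /* 2. publish hazard */
        atomic_thread_fence(memory_order_seq_cst); /* the costly barrier */
        void *q = atomic_load(src);                /* 3. re-check       */
        if (q == p)
            return p;  /* reclaimers will now see our slot */
        p = q;         /* pointer changed under us: retry  */
    }
}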

Similarly, for epoch reclamation, a read-side section will grab the current epoch value, mark itself as reading in that epoch, then confirm that the epoch hasn’t become stale.

1. $epoch <- current epoch
2. publish self as entering a read-side section under $epoch
3. check that $epoch is still current, otherwise retry
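
The epoch entry sequence looks the same under the same assumptions (illustrative names, explicit fence standing in for the barrier we would like to elide):

#include <stdatomic.h>
#include <stdint.h>

void epoch_enter(_Atomic uint64_t *global_epoch,
                 _Atomic uint64_t *self_epoch)
{
    uint64_t e = atomic_load(global_epoch);        /* 1. grab epoch      */
    for (;;) {
        atomic_store(self_epoch, e);               /* 2. publish section */
        atomic_thread_fence(memory_order_seq_cst);
        uint64_t cur = atomic_load(global_epoch);  /* 3. still current?  */
        if (cur == e)
            return;
        e = cur;                                   /* stale epoch: retry */
    }
}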

Under a sequentially consistent (SC) memory model, the two sequences are valid with regular (atomic) loads and stores. The slow path can always make its write, then scan every other thread’s single-writer data to see if any thread has published something that proves it executed step 2 before the slow path’s store (i.e., by publishing the old pointer or epoch value).

The diagrams below show all possible interleavings. In all cases, once there is no evidence that a thread has failed to observe the slow path’s new write, we can correctly assume that all threads will observe the write. I simplified the diagrams by not interleaving the first read in step 1: its role is to provide a guess for the value that will be re-read in step 3, so, at least with respect to correctness, that initial read might as well be generating random values. I also kept the second “scan” step in the slow path abstract. In practice, it’s a non-snapshot read of all the epoch or hazard pointer tables for threads that execute the fast path: the slow path can assume an epoch or pointer will not be resurrected once the epoch or pointer is absent from the scan.

No one implements SC in hardware. X86 and SPARC offer the strongest practical memory model, Total Store Ordering, and that's still not enough to correctly execute the read-side critical sections above without special annotations. Under TSO, reads (e.g., step 3) are allowed to execute before writes (e.g., step 2). X86-TSO models that as a buffer in which stores may be delayed, and that's what the scenarios below show, with steps 2 and 3 of the fast path reversed (the slow path can always be instrumented to recover sequential order; it's meant to be slow). The TSO interleavings only differ from the SC ones when the fast path's steps 2 and 3 are separated by one of the slow path's steps: when the two steps are adjacent, their order relative to the slow path's steps is unaffected by TSO's delayed stores. TSO is so strong that we only have to fix one case, FSSF, where the slow path executes in the middle of the fast path, with the reversal of store and load order allowed by TSO.

Simple implementations plug this hole with a store-load barrier between the second and third steps, or implement the store with an atomic read-modify-write instruction that doubles as a barrier. Both modifications are safe and recover SC semantics, but incur a non-negligible overhead (the barrier forces the out of order execution engine to flush before accepting more work) which is only necessary a minority of the time.

The pattern here is similar to the event count, where the slow path signals the fast path that the latter should do something different. However, where the slow path for event counts wants to wait forever if the fast path never makes progress, hazard pointer and epoch reclamation must detect that case and ignore sleeping threads (that are not in the middle of a read-side SMR critical section).

In this kind of asymmetric synchronisation pattern, we wish to move as much of the overhead to the slow (cold) path. Linux 4.3 gained the membarrier syscall for exactly this use case. The slow path can execute its write(s) before making a membarrier syscall. Once the syscall returns, any fast path write that has yet to be visible (hasn’t retired yet), along with every subsequent instruction in program order, started in a state where the slow path’s writes were visible. As the next diagram shows, this global barrier lets us rule out the one anomalous execution possible under TSO, without adding any special barrier to the fast path.
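
A sketch of that slow path, using the raw syscall (MEMBARRIER_CMD_SHARED is the original unexpedited command from Linux 4.3; the function name is illustrative):

#include <linux/membarrier.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Cold path: publish the new pointer (or epoch), then force a global
   barrier before scanning the fast paths' published state. */
static void flip_and_fence(_Atomic(void *) *shared, void *new_value)
{
    atomic_store(shared, new_value);
    syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0);
    /* ... now scan all hazard pointer / epoch tables ... */
}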

The problem with membarrier is that it comes in two flavours: slow, or not scalable. The initial, unexpedited version waits for kernel RCU to run its callback, which, on my machine, takes anywhere between 25 and 50 milliseconds. The reason it's so slow is that the conditions for an RCU grace period to elapse are more demanding than those of a global barrier, and may even require multiple such barriers. For example, if we used the same scheme to nest epoch reclamation ten deep, the outermost reclaimer would be 1024 times slower than the innermost one. In reaction to this slowness, potential users of membarrier went back to triggering IPIs, e.g., by mprotecting a dummy page. mprotect isn't guaranteed to act as a barrier, and does not do so on AArch64, so Linux 4.16 added an “expedited” mode to membarrier. In that expedited mode, each membarrier syscall sends an IPI to every other core... when I look at machines with hundreds of cores, \(n - 1\) IPIs per syscall, a couple of times per second on each of \(n\) cores, start to sound like a bad idea.

Let’s go back to the observation we made for event count: any interrupt acts as a barrier for us, in that any instruction that retires after the interrupt must observe writes made before the interrupt. Once the hazard pointer slow path has overwritten a pointer, or the epoch slow path advanced the current epoch, we can simply look at the current time, and wait until an interrupt has been handled at a later time on all cores. The slow path can then scan all the fast path state for evidence that they are still using the overwritten pointer or the previous epoch: any fast path that has not published that fact before the interrupt will eventually execute the second and third steps after the interrupt, and that last step will notice the slow path’s update.

There’s a lot of information in /proc that lets us conservatively determine when a new interrupt has been handled on every core. However, it’s either too granular (/proc/stat) or extremely slow to generate (/proc/schedstat). More importantly, even with ftrace, we can’t easily ask to be woken up when something interesting happens, and are forced to poll files for updates (never mind the weirdly hard to productionalise kernel interface).

What we need is a way to read, for each core, the last time it was definitely processing an interrupt. Ideally, we could also block and let the OS wake up our waiter on changes to the oldest “last interrupt” timestamp, across all cores. On x86, that’s enough to get us the asymmetric barriers we need for hazard pointers and epoch reclamation, even if only IRET is serialising, and not interrupt handler entry. Once a core’s update to its “last interrupt” timestamp is visible, any write prior to the update, and thus any write prior to the interrupt is also globally visible: we can only observe the timestamp update from a different core than the updater, in which case TSO saves us, or after the handler has returned with a serialising IRET.

We can bundle all that logic in a short eBPF program.2 The program has a map of thread-local arrays (of 1 CLOCK_MONOTONIC timestamp each), a map of perf event queues (one per CPU), and an array of 1 “watermark” timestamp. Whenever the program runs, it gets the current time. That time will go in the thread-local array of interrupt timestamps. Before storing a new value in that array, the program first reads the previous interrupt time: if that time is less than or equal to the watermark, we should wake up userspace by enqueueing an event in perf. The enqueueing is conditional because perf has more overhead than a thread-local array, and because we want to minimise spurious wake-ups. A high signal-to-noise ratio lets userspace set up the read end of the perf queue to wake up on every event and thus minimise update latency.
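
The post ships that program as raw eBPF; as a hedged approximation, the same logic in restricted C would read roughly as below (map names, sizes, and the tracepoint are invented for illustration):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct bpf_map_def SEC("maps") last_interrupt = {  /* per-CPU timestamp */
    .type = BPF_MAP_TYPE_PERCPU_ARRAY,
    .key_size = sizeof(__u32), .value_size = sizeof(__u64),
    .max_entries = 1,
};
struct bpf_map_def SEC("maps") watermark = {       /* waiters' deadline */
    .type = BPF_MAP_TYPE_ARRAY,
    .key_size = sizeof(__u32), .value_size = sizeof(__u64),
    .max_entries = 1,
};
struct bpf_map_def SEC("maps") events = {          /* perf wake-up queue */
    .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
    .key_size = sizeof(__u32), .value_size = sizeof(__u32),
    .max_entries = 128,  /* >= number of CPUs */
};

SEC("tracepoint/irq/softirq_entry")
int on_interrupt(void *ctx)
{
    __u32 zero = 0;
    __u64 now = bpf_ktime_get_ns();
    __u64 *last = bpf_map_lookup_elem(&last_interrupt, &zero);
    __u64 *mark = bpf_map_lookup_elem(&watermark, &zero);

    if (!last || !mark)
        return 0;
    /* Wake userspace only when a waiter could be blocked on this CPU's
       previous timestamp: keeps the perf queue high signal-to-noise. */
    if (*last <= *mark)
        bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
                              &now, sizeof(now));
    *last = now;
    return 0;
}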

We now need a single global daemon to attach the eBPF program to an arbitrary set of software tracepoints triggered by interrupts (or PMU events that trigger interrupts), to hook the perf fds to epoll, and to re-read the map of interrupt timestamps whenever epoll detects a new perf event. That's what the rest of the code handles: setting up tracepoints, attaching the eBPF program, convincing perf to wake us up, and hooking it all up to epoll. On my fully loaded 24-core E5-46xx running Linux 4.18 with security patches, the daemon uses ~1-2% of a core (much less on 4.16) to read the map of timestamps each time it's woken up, roughly every 4 milliseconds. perf shows the non-JITted eBPF program itself uses ~0.1-0.2% of every core.

Amusingly enough, while eBPF offers maps that are safe for concurrent access in eBPF programs, the same maps come with no guarantee when accessed from userspace, via the syscall interface. However, the implementation uses a hand-rolled long-by-long copy loop, and, on x86-64, our data all fit in longs. I'll hope that the kernel's compilation flags (e.g., -ffreestanding) suffice to prevent GCC from recognising memcpy or memmove, and that we thus get atomic stores and loads on x86-64. Given the quality of eBPF documentation, I'll bet that this implementation accident is actually part of the API. Every BPF map is single writer (either per-CPU in the kernel, or single-threaded in userspace), so this should work.

Once the barrierd daemon is running, any program can mmap its data file to find out the last time we definitely know each core had interrupted userspace, without making any further syscall or incurring any IPI. We can also use regular synchronisation to let the daemon wake up threads waiting for interrupts as soon as the oldest interrupt timestamp is updated. Applications don’t even need to call clock_gettime to get the current time: the daemon also works in terms of a virtual time that it updates in the mmaped data file.

The barrierd data file also includes an array of per-CPU structs with each core’s timestamps (both from CLOCK_MONOTONIC and in virtual time). A client that knows it will only execute on a subset of CPUs, e.g., cores 2-6, can compute its own “last interrupt” timestamp by only looking at entries 2 to 6 in the array. The daemon even wakes up any futex waiter on the per-CPU values whenever they change. The convenience interface is pessimistic, and assumes that client code might run on every configured core. However, anyone can mmap the same file and implement tighter logic.
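
barrierd's actual file layout isn't spelled out here, so the client-side sketch below is loudly hypothetical; it only shows the intended computation: the oldest "last interrupt" timestamp over the cores a client can run on.

#include <stdint.h>

/* Hypothetical layout of one per-CPU entry in the mmaped data file. */
struct percpu_stamp { uint64_t monotonic_ns; uint64_t vtime; };

/* Every core in [first, last] has handled an interrupt at or after
   the returned CLOCK_MONOTONIC time. */
static uint64_t last_interrupt_since(const struct percpu_stamp *stamps,
                                     int first, int last)
{
    uint64_t oldest = UINT64_MAX;
    for (int cpu = first; cpu <= last; cpu++)
        if (stamps[cpu].monotonic_ns < oldest)
            oldest = stamps[cpu].monotonic_ns;
    return oldest;
}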

Again, there’s a snag with tickless kernels. In the default configuration already, a fully idle core might not process timer interrupts. The barrierd daemon detects when a core is falling behind, and starts looking for changes to /proc/stat. This backup path is slower and coarser grained, but always works with idle cores. More generally, the daemon might be running on a system with dedicated cores. I thought about causing interrupts by re-affining RT threads, but that seems counterproductive. Instead, I think the right approach is for users of barrierd to treat dedicated cores specially. Dedicated threads can’t (shouldn’t) be interrupted, so they can regularly increment a watchdog counter with a serialising instruction. Waiters will quickly observe a change in the counters for dedicated threads, and may use barrierd to wait for barriers on preemptively shared cores. Maybe dedicated threads should be able to hook into barrierd and check-in from time to time. That would break the isolation between users of barrierd, but threads on dedicated cores are already in a privileged position.

I quickly compared the barrier latency on an unloaded 4-way E5-46xx running Linux 4.16, with a sample size of 20000 observations per method (I had to remove one outlier at 300ms). The synchronous methods, mprotect (which abuses mprotect to send IPIs by removing and restoring permissions on a dummy page) and explicit IPI via expedited membarrier, are much faster than the others (unexpedited membarrier with kernel RCU, and barrierd, which counts interrupts). We can zoom in on the IPI-based methods, and see that an expedited membarrier (IPI) is usually slightly faster than mprotect; IPI via expedited membarrier hits a worst-case of 0.041 ms, versus 0.046 for mprotect.

The performance of IPI-based barriers should be roughly independent of system load. However, we did observe a slowdown for expedited membarrier (between \(68.4-73.0\%\) of the time, \(p < 10\sp{-12}\) according to a binomial test3) on the same 4-way system, when all CPUs were running CPU-intensive code at low priority. In this second experiment, we have a sample size of one million observations for each method, and the worst case for IPI via expedited membarrier was 0.076 ms (0.041 ms on an unloaded system), compared to a more stable 0.047 ms for mprotect.

Now for non-IPI methods: they should be slower than methods that trigger synchronous IPIs, but hopefully have lower overhead and scale better, while offering usable latencies.

On an unloaded system, the interrupts that drive barrierd are less frequent, sometimes outright absent, so unexpedited membarrier achieves faster response times. We can even observe barrierd’s fallback logic, which scans /proc/stat for evidence of idle CPUs after 10 ms of inaction: that’s the spike at 20ms. The values for vtime show the additional slowdown we can expect if we wait on barrierd’s virtual time, rather than directly reading CLOCK_MONOTONIC. Overall, the worst case latencies for barrierd (53.7 ms) and membarrier (39.9 ms) aren’t that different, but I should add another fallback mechanism based on membarrier to improve barrierd’s performance on lightly loaded machines.

When the same 4-way, 24-core, system is under load, interrupts are fired much more frequently and reliably, so barrierd shines, but everything has a longer tail, simply because of preemption of the benchmark process. Out of the one million observations we have for each of unexpedited membarrier, barrierd, and barrierd with virtual time on this loaded system, I eliminated 54 values over 100 ms (18 for membarrier, 29 for barrierd, and 7 for virtual time). The rest is shown below. barrierd is consistently much faster than membarrier, with a geometric mean speedup of 23.8x. In fact, not only can we expect barrierd to finish before an unexpedited membarrier \(99.99\%\) of the time (\(p<10\sp{-12}\) according to a binomial test), but we can even expect barrierd to be 10 times as fast \(98.3-98.5\%\) of the time (\(p<10\sp{-12}\)). The gap is so wide that even the opportunistic virtual-time approach is faster than membarrier (geometric mean of 5.6x), but this time with a mere three 9s (as fast as membarrier \(99.91-99.96\%\) of the time, \(p<10\sp{-12}\)).

With barrierd, we get implicit barriers with worse overhead than unexpedited membarrier (which is essentially free since it piggybacks on kernel RCU, another sunk cost), but 1/10th the latency (0-4 ms instead of 25-50 ms). In addition, interrupt tracking is per-CPU, not per-thread, so it only has to happen in a global single-threaded daemon; the rest of userspace can obtain the information it needs without causing additional system overhead. More importantly, threads don’t have to block if they use barrierd to wait for a system-wide barrier. That’s useful when, e.g., a thread pool worker is waiting for a reverse barrier before sleeping on a futex. When that worker blocks in membarrier for 25ms or 50ms, there’s a potential hiccup where a work unit could sit in the worker’s queue for that amount of time before it gets processed. With barrierd (or the event count described earlier), the worker can spin and wait for work units to show up until enough time has passed to sleep on the futex.

While I believe that information about interrupt times should be made available without tracepoint hacks, I don’t know if a syscall like membarrier is really preferable to a shared daemon like barrierd. The one salient downside is that barrierd slows down when some CPUs are idle; that’s something we can fix by including a membarrier fallback, or by sacrificing power consumption and forcing kernel ticks, even for idle cores.

Preemption can be an asset

When we write lock-free code in userspace, we always have preemption in mind. In fact, the primary reason for lock-free code in userspace is to ensure consistent latency despite potentially adversarial scheduling. We spend so much effort to make our algorithms work despite interrupts and scheduling that we can fail to see how interrupts can help us. Obviously, there's a cost to making our code preemption-safe, but preemption isn't optional. Much like garbage collection in managed languages, preemption is a feature we can't turn off. Unlike GC, it's not obvious how to make use of preemption in lock-free code, but this post shows it's not impossible.

We can use preemption to get asymmetric barriers, nearly for free, with a daemon like barrierd. I see a duality between preemption-driven barriers and techniques like Bounded TSO: the former are relatively slow, but offer hard bounds, while the latter guarantee liveness, usually with negligible latency, but without any time bound.

I used preemption to make single-writer event counts faster (comparable to a regular non-atomic counter), and to provide a lower-latency alternative to membarrier’s asymmetric barrier. In a similar vein, SPeCK uses time bounds to ensure scalability, at the expense of a bit of latency, by enforcing periodic TLB reloads instead of relying on synchronous shootdowns. What else can we do with interrupts, timer or otherwise?

Thank you Samy, Gabe, and Hanes for discussions on an earlier draft. Thank you Ruchir for improving this final version.

P.S. event count without non-atomic RMW?

The single-producer event count specialisation relies on non-atomic read-modify-write instructions, which are hard to find outside x86. I think the flag flip pattern in epoch and hazard pointer reclamation shows that’s not the only option.

We need two control words, one for the version counter, and another for the sleepers flag. The version counter is only written by the incrementer, with regular non-atomic instructions, while the flag word is written to by multiple threads, always with atomic instructions.

The challenge is that OS blocking primitives like futex only let us conditionalise the sleep on a single word. We could try to pack a pair of 16-bit shorts in a 32-bit int, but that doesn’t give us a lot of room to avoid wrap-around. Otherwise, we can guarantee that the sleepers flag is only cleared immediately before incrementing the version counter. That suffices to let sleepers only conditionalise on the version counter... but we still need to trigger a wake-up if the sleepers flag was flipped between the last clearing and the increment.

On the increment side, the logic looks like

must_wake = false
if sleepers flag is set:
    must_wake = true
    clear sleepers flag
increment version
if must_wake or sleepers flag is set:
    wake up waiters

and, on the waiter side, we find

if version has changed
    return
set sleepers flag
sleep if version has not changed

The separate “sleepers flag” word doubles the space usage, compared to the single flag bit in the x86 single-producer version. Composite OS uses that two-word solution in blockpoints, and the advantages seem to be simplicity and additional flexibility in data layout. I don’t know that we can implement this scheme more efficiently in the single producer case, under other memory models than TSO. If this two-word solution is only useful for non-x86 TSO, that’s essentially SPARC, and I’m not sure that platform still warrants the maintenance burden.

But, we’ll see, maybe we can make the above work on AArch64 or POWER.


  1. I actually prefer another, more intuitive, explanation that isn't backed by official documentation. The store buffer in x86-TSO doesn't actually exist in silicon: it represents the instructions waiting to be retired in the out-of-order execution engine. Precise interrupts seem to imply that even entering the interrupt handler flushes the OOE engine's state, and thus acts as a full barrier that flushes the conceptual store buffer.

  2. I used raw eBPF instead of the C frontend because that frontend relies on a ridiculous amount of runtime code that parses an ELF file when loading the eBPF snippet to know what eBPF maps to setup and where to backpatch their fd number. I also find there’s little advantage to the C frontend for the scale of eBPF programs (at most 4096 instructions, usually much fewer). I did use clang to generate a starting point, but it’s not that hard to tighten 30 instructions in ways that a compiler can’t without knowing what part of the program’s semantics is essential. The bpf syscall can also populate a string buffer with additional information when loading a program. That’s helpful to know that something was assembled wrong, or to understand why the verifier is rejecting your program.

  3. I computed these extreme confidence intervals with my old code to test statistical SLOs.

Zach Beane: Converter of maps from Reflex Arena to QuakeWorld

· 14 days ago

Quicklisp news: January 2019 Quicklisp dist update now available

· 14 days ago
New projects: 
  • cl-markless — A parser implementation for Markless — Artistic
  • data-lens — Utilities for building data transformations from composable functions, modeled on lenses and transducers — MIT
  • iso-8601-date — Miscellaneous date routines based around ISO-8601 representation. — LLGPL
  • literate-lisp — a literate programming tool to write common lisp codes in org file. — MIT
  • magicl — Matrix Algebra proGrams In Common Lisp — BSD 3-Clause (See LICENSE.txt)
  • nodgui — LTK — LLGPL
  • petri — An implementation of Petri nets — MIT
  • phoe-toolbox — A personal utility library — BSD 2-clause
  • ql-checkout — ql-checkout is a library intended to check out Quicklisp-maintained libraries with VCS. — MIT
  • qtools-commons — Qtools utilities and functions — Artistic License 2.0
  • replic — A framework to build readline applications out of existing code. — MIT
  • slk-581 — Generate Australian Government SLK-581 codes. — LLGPL
  • sucle — Cube Demo Game — MIT
  • water — An ES6-compatible class definition for Parenscript — MIT
  • winhttp — FFI wrapper to WINHTTP — MIT
Updated projects: 3d-matrices, 3d-vectors, april, asd-generator, chirp, chronicity, cl-async, cl-batis, cl-collider, cl-dbi, cl-dbi-connection-pool, cl-enumeration, cl-forms, cl-hamcrest, cl-hash-util, cl-las, cl-libevent2, cl-libuv, cl-mixed, cl-neovim, cl-opengl, cl-patterns, cl-punch, cl-sat, cl-sat.glucose, cl-sat.minisat, cl-syslog, cl-unification, clad, clazy, climacs, clip, closer-mop, croatoan, dbus, deeds, defenum, definitions, dufy, easy-bind, easy-routes, eclector, esrap, f2cl, flare, flexi-streams, flow, gendl, glsl-toolkit, harmony, helambdap, hu.dwim.debug, humbler, inquisitor, lake, legit, lichat-protocol, lisp-binary, lisp-chat, log4cl, lquery, ltk, mcclim, new-op, omer-count, ook, overlord, petalisp, pjlink, plump, postmodern, protest, qtools, query-fs, ratify, read-number, rpcq, safety-params, sc-extensions, serapeum, slime, sly, specialization-store, spinneret, staple, static-dispatch, stumpwm, sxql, tooter, trivial-clipboard, trivial-sockets, utilities.print-items, vernacular, websocket-driver, wild-package-inferred-system, xhtmlambda.

Removed projects: cl-llvm, cl-skkserv

The removed projects no longer build for me.

To get this update, use (ql:update-dist "quicklisp"). Enjoy!

Zach Beane: ASCII Art Animations in Lisp

· 14 days ago

Daniel Vedder: ASCII Art Animations in Lisp

· 15 days ago

ASCII art may have fallen out of popular favour a couple of decades ago with the rise of “proper” computer graphics, but it is still fun to create. Having made a few pieces myself, I always had the itch to not just create a static ASCII image, but to try my hand at an ASCII animation. Well, I finally did it. In this post I will show you how to create a very simple animation using Common Lisp and the classic Unix text-user-interface library, ncurses.

ASCII Art Animations

ASCII banner, created with Figlet.

Setting up

To follow along with this tutorial, you will need Linux and ncurses (if you have the former, you probably have the latter as well), a Common Lisp implementation such as SBCL, and Quicklisp.

You will also need to grab a copy of croatoan, as this is the wrapper library we will use to call ncurses from Lisp. More specifically, you will need a development version of croatoan. I recently wrote an extension for the package that provides a basic shape-drawing API, which we will be using in the following. It has not yet been pushed to the Quicklisp repo, but you can get it by going to your $QUICKLISP_DIR/local-projects directory and doing git clone https://github.com/veddox/croatoan. ¹

You can find the complete code for this tutorial here.

To the drawing table

Our first block of code looks like this: ²

(ql:quickload :croatoan)
(in-package :croatoan)

(defun display-shape (shape &optional (squarify T))
    "Open a screen display and draw the shape."
    (with-screen (scr :input-blocking T :enable-colors T :input-echoing NIL
                     :cursor-visibility NIL :input-reading :unbuffered)
        (clear scr)
        (draw-shape shape scr squarify)
        (event-case (scr event)
            (otherwise (return-from event-case)))))

First of all, we load the croatoan package and change into it. The next function is one we don't actually need, strictly speaking, but that is useful for development. It takes a shape object (basically a list of character coordinates) and plots it on the current terminal screen. Let's have a closer look:

The macro with-screen is the entry point for croatoan. It hands control of the user's terminal window over to ncurses, changing it from line-mode into character-mode, and back again when the application terminates. We then clear the screen and draw the given shape. The event-case macro is the main program loop provided by croatoan that responds to key presses and even mouse clicks. We simply want to exit the shape display whenever any key is pressed, though, so we just use the catch-all otherwise clause. (See the croatoan documentation for more details.)

Note: The option squarify is necessary because terminal fonts are rectangular, not square – thus, you cannot simply treat a character-cell display as the geometric equivalent of a pixel screen. (If you plot a mathematically correct circle on a character-cell display, it will still look like an oval.) To correct this, draw-shape inserts a horizontal space after every character when squarify is true.

We can get a first taste of our shape drawing abilities by calling the following in our REPL (assuming you've typed in the above):

* (display-shape (circle 12 20 8 :filled T))

The circle function creates a shape object in the form of a circle; this is passed to our display-shape, and hey-presto, we have a neat circle drawn in our terminal window. Note that for some silly reason, ncurses coordinates are given in the reverse order to what we're used to. Thus, the 12 above is the Y/row coordinate, while the 20 is the X/column coordinate, counting from 0/0, the upper left hand corner. (8 is the circle radius.)

Reinventing the wheel

Our actual animation is going to feature a cart driving across the screen. To do this, we're first going to write a function to create a wheel shape, complete with spokes and all:

(defun wheel (y0 x0 radius &optional (rotation 0) (spokes 4))
    "Create a wheel shape with variable rotation and number of spokes"
    (let ((wheel (circle y0 x0 radius :char (make-instance 'complex-char :simple-char #\O))))
        (dotimes (s spokes wheel)
            (setf wheel (merge-shapes wheel
                            (angle-line y0 x0 (+ rotation (* s (/ 360 spokes))) radius))))))

A few things to point out here. All shape functions take a :char keyword option, which lets you specify how the shape is to be plotted. croatoan's complex-char class includes a character, a colour scheme and an optional effect (italics, bold, etc.). So our wheels will be drawn using a white-on-black (the default colour scheme) O character.

The function angle-line takes an origin, a bearing, and a distance; then returns the corresponding line shape. And lastly, merge-shapes does just what it says: it takes multiple shapes and merges them into a single new shape.

Let's test our wheel function using display-shape:

* (setf w1 (wheel 12 10 8 0 4))

* (setf w2 (wheel 12 30 8 0 8))

* (display-shape (merge-shapes w1 w2))

This should give us the following:

wheels

Building a cart

Obviously, a cart consists of more than just a wheel or two. Here's the rest:

(defun draw-cart (scr x)
    "Draw a cart on the screen at the given X coordinate"
    (clear scr)
    (let* ((h (.height scr)) (w (.width scr))
              (ground (line (1- h) 0 (1- h) (1- w)
                          :char (make-instance 'complex-char :simple-char #\i
                                    :color-pair '(:green :black))))
              (w1 (wheel (- h 8) (- x 12) 6 (* x 45)))
              (w2 (wheel (- h 8) (- x 36) 6 (+ 45 (* x 45))))
              (cart (quadrilateral (- h 16) x (- h 8) (- x 4)
                        (- h 8) (- x 46) (- h 16) (- x 46) :filled T
                        :char (make-instance 'complex-char :simple-char #\#
                                  :color-pair '(:red :black)))))
        (draw-shape ground scr)
        (draw-shape cart scr T)
        (draw-shape w1 scr T)
        (draw-shape w2 scr T)))

This function draws straight to the screen, so we cannot use it in conjunction with display-shape anymore. (We'll take care of that in the next section.) It uses the size information of the screen object it has been passed to calculate the position of the cart's components, then initializes these and finally draws them. The new functions here, line and quadrilateral, should be fairly self-explanatory.

Hitting the road

Finally, we are ready to animate our pretty ASCII art! Here's the code to do so:

(defun animate (&optional (fps 10))
    "Show the animation of the moving cart"
    (let ((running T) (current-x 0))
        (with-screen (scr :input-blocking (round (/ 1000 fps)) :enable-colors T
                         :input-echoing NIL :cursor-visibility NIL
                         :input-reading :unbuffered)
            (clear scr)
            (event-case (scr event)
                (#\space (setf running (not running)))
                ((NIL) (when running
                           (incf current-x)
                           (when (= current-x (+ 46 (round (/ (.width scr) 2))))
                               (setf current-x 0))
                           (draw-cart scr current-x)))
                (otherwise (return-from event-case))))))

What have we changed here, compared to display-shape? First of all, the :input-blocking option is now set to a number, instead of T. This has the effect of setting an “update frequency” for the screen. Now, if there is no user input to generate events, the (NIL) event will be generated instead at the specified frequency (in milliseconds). We use this to advance our animation by one frame, wrapping around when the cart drives off the screen. Secondly, instead of just terminating on any key press, we use the space bar to pause/unpause the animation.

And this is what our cart looks like in action:

We're on the road!

Wrapping it up

Of course, our little ASCII art animations won't make the next AAA game (mind the pun), but they could be used as the basis for a jump-and-run or somesuch. Not that you have to find a use for them. Some things are self-justifying, after all, for the simple reason that they are fun :-)

- - -

Updates

1) My shapes extension has been merged (with minimal changes) into the croatoan master branch and should be available in the February version of Quicklisp.

2) In the newest version of croatoan, :input-reading :unbuffered has been changed to :input-buffering nil. Also, draw-shape now takes the window object (scr in this example) as its first, not second argument.

TurtleWare: My everlasting Common Lisp TODO list

· 17 days ago

We have minds capable of dreaming up almost infinitely ambitious plans and only time to realize a pathetic fraction of them. If God exists, this is a proof of his cruelty.

This quote is a paraphrase of something I read in the past; I couldn't find its source. If you know where it comes from, please contact me!

I've hinted a few times that I have a "lengthy" list of things to do. The new year is a good opportunity to publish a blog post about it. I'm going to skip some entries which seem too far-fetched (though some may have slipped in anyway) and some ideas I don't want to share yet.

Please note that these entries and estimates are neither declarations nor project road maps - this is my personal to-do list, which may change at any time. Most notably, I am aware that the estimates are too ambitious and it is unlikely that all of them will be met.

ECL improvements

In its early days ECL had both green threads and locatives. They were removed in favor of native threads. I think that both are very valuable constructs which may function independently or even work together (i.e. native threads having a pool of specialized green threads sharing data local to them). I want to add locatives too, since I'm already at adding new built-in classes.

ETA: first quarter of 2019 (before 16.2.0 release).

There might be better interfaces for the same goal, but there are already libraries which benefit from the APIs defined in CLtL2 that didn't make it into the ANSI standard. They mostly revolve around environment access and better control over the compiler's workings (querying declarations, writing a code walker without gross hacks, etc.).

ETA: first quarter of 2019 (before 16.2.0 release).
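
To give a flavor of what this API enables, here is a minimal sketch using SBCL's sb-cltl2 contrib (the point of the entry above being that ECL should offer the same interface); the macro name and the exact shape of the printed declaration alist are illustrative assumptions:

(require :sb-cltl2)

(defmacro binding-info (var &environment env)
  ;; CLtL2's VARIABLE-INFORMATION lets a macro inspect declarations
  ;; visible in its lexical environment.
  (multiple-value-bind (kind local-p decls)
      (sb-cltl2:variable-information var env)
    `'(,kind ,local-p ,decls)))

(let ((x 42))
  (declare (fixnum x))
  (binding-info x))
;; => something like (:LEXICAL T ((TYPE . FIXNUM)))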

ECL has two major performance bottlenecks. One is compilation time (which is actually GCC's fault); the second is generic function dispatch. In a world where many libraries embrace the CLOS programming paradigm, this is a very important area of improvement.

Professor Robert Strandh paved the way by inventing a method to greatly improve generic function dispatch speed. The team of Clasp developers implemented it, proving that the idea is a practical one for ECL (Clasp was forked from ECL and they still share big chunks of architecture - it is not rare for us to share bug reports and fixes across our code bases). We want to adopt this dispatch strategy.

ETA: second quarter of 2019 (after 16.2.0 release).

I am thinking about adding optional modules for SSL and a file-based SQL database. Both libraries may be statically compiled, which makes them good candidates that could work even for ECL builds without FASL loading support.

ETA: third quarter of 2019.

  • Compiler modernization

Admittedly, I have already had three tries at this task (each ended in failure - the changes were too radical to propose for ECL). I believe the fourth time's the charm. Currently ECL has two passes (which are tightly coupled): the first compiles Common Lisp to an intermediate representation, and the second transpiles that to C/C++ code and compiles it with GCC.

The idea is to decouple these passes and have a frontend pass, numerous optimization passes (for the sake of maintainability), and a code generation pass which could have numerous backends (C, C++, bytecodes, LLVM etc).

ETA: third quarter of 2019.

CDR contains many valuable proposals I want to implement (some are already implemented in ECL). The functions compiled-file-p and abi-version are a very useful addition from the implementation and build system point of view. Currently ECL will "eat" any FASL it can (if it meets certain criteria; most notably, it will try to load FASL files compiled with an incompatible ECL build). ABI validation should depend on a hash of the symbol table entries, the CPU architecture, and the types used by the compiler.

ETA: (this task is a sub-task of compiler modernization).

  • Replacing ecl_min with EuLisp Level 0 implementation

ecl_min is a small lisp interpreter used to bootstrap the implementation (it is a binary written in C). Replacing this custom lisp with one that has a standard draft would be a big step forward.

I expect some blockers along the way - most notably, EuLisp has one namespace for functions and variables. Overcoming that will be a step towards a language-agnostic runtime.

ETA: fourth quarter of 2019.

McCLIM improvements

  • Thread safety and refactored event processing loop

standard-extended-input-stream has a quirk - the event queue is mixed with the input buffer. That leads to inconsistent event processing between input streams and all other panes. According to my analysis of the system and the specification, this may be fixed. This task requires some refactoring and careful documentation.

Thread safety is about using CLIM streams from other threads, for example to draw on a canvas that is part of a CLIM frame from inside an external REPL. This ostensibly works, but it is not thread-safe - output records may get corrupted during concurrent access.

ETA: first quarter of 2019.

  • Better use of XRender extension

We already use the XRender extension for drawing fonts and rectangles with a solid background. We want the clx backend to use it to its fullest - most notably to gain semi-transparency and accelerated transformations. Some proof-of-concept code is stashed in my source tree.

ETA: second quarter of 2019.

  • Extensive tests and documentation improvements

I've mentioned this in the last progress report. We want to spend the whole mid-release period on testing, bug fixes and documentation improvements. Most notably, we want to write documentation on writing backends - a frequent request from users.

ETA: third quarter of 2019

This task involves writing a console-based backend (whose units are very big compared to a pixel, and are not square). That will help me identify and fix invalid assumptions in the McCLIM codebase. The idea is for untyped coordinates to be density-independent pixels which have approximately the same size on any type of screen. A natural consequence of that will be writing examples of specialized sheets with different default units.

ETA: fourth quarter of 2019

Other tasks

  • Finish writing a CLIM frontend to ansi-test (tested lisps are run as external processes).
  • Create a test suite for Common Lisp pathnames which goes beyond the standard.
  • Contribute UNIX domain sockets portability layer to usocket.
  • Explore the idea of making CLX understand (and compile) XCB xml protocol definitions.
  • Write a blog post about debugging and profiling ECL applications.
  • Resume a project with custom gadgets for McCLIM.
  • Do more research and write POC code for having animations in McCLIM.
  • Resume a project for writing a Matroska build system with an ASDF compatibility layer.
  • Use cl-test-grid to identify the most common library dependencies in Quicklisp which don't support ECL, and contribute such support.

Conclusions

This list could be much longer, but suggesting even more entries as something scheduled for 2019 would be ridiculous - I'll stop here. A day has only 24 hours, and I need to balance my time between family, friends, duties, commercial work, free software, and the communities I'm part of. I find each and every task on this list worthwhile, so I will pursue them whenever I can.

Zach BeaneWriting a natural language date and time parser

· 20 days ago

McCLIM"Yule" progress report

· 22 days ago

Dear Community,

Winter solstice is a special time of year when we gather together with people dear to us. In pagan tradition this event is called "Yule". I thought it a good time to write a progress report and a summary of the changes made since the last release. I apologise for the infrequent updates; on the other hand, we have been busy improving McCLIM, and many important (and exciting!) improvements have been made in the meantime. I'd love to declare this a new release with the code name "Yule", but we still have some regressions to fix and pending improvements to apply. We hope, though, that release 0.9.8 will happen soon.

We are very excited that we have managed to resurrect interest in McCLIM among Common Lisp developers, and it is thanks to the help of you all - every contributor to the project. Some of you provide important improvement suggestions, issue reports and regression tests on our tracker. Others develop applications with McCLIM, which helps us identify the parts that need improving. By creating pull requests you go out of your comfort zone to help improve the project, and by doing peer reviews you prevent serious regressions and make the code better than it would otherwise be. Perpetual testing and frequent discussions on #clim help keep the project in shape. Financial supporters allow us to create bounties and thereby attract new contributors.

Finances and bounties

Speaking of finances: our fundraiser receives a steady stream of funds of approximately $300/month. We are grateful for that. Right now all money is destined for bounties. A few times a bounty was not claimed by the bounty hunter who solved the issue - in those cases I returned the money to the pool after talking with the people involved. Currently our funds available for future activities are $3,785, and the active bounties on issues waiting to be solved total $2,850 (split between 7 issues). We have already paid out $2,450 in total for solved issues.

Active bounties:

  • [$600] drawing-tests: improve and refactor (new!).
  • [$600] streams: make SEOS access thread-safe (new!).
  • [$500] Windows Backend.
  • [$450] clx: input: english layout.
  • [$300] listener: repl blocks when long process runs.
  • [$150] UPDATING-OUTPUT not usable with custom gadgets.
  • [$150] When flowing text in a FORMATTING-TABLE, the pane size is used instead of the column size.

Claimed bounties (since last time):

  • [$100] No applicable method for REGION-CONTAINS-POSITION-P -- fixed by Cyrus Harmon and re-added to the pool.
  • [$200] Text rotation is not supported -- fixed by Daniel Kochmański.
  • [$400] Fix Beagle backend -- cancelled and re-added to the pool.
  • [$100] with-room-for-graphics does not obey height for graphics not starting at 0,0 -- fixed by Nisar Ahmad.
  • [$100] Enter doesn't cause input acceptance in the Listener if you hit Alt first -- fixed by Charles Zhang.
  • [$100] Listener commands with "list" arguments, such as Run, cannot be executed from command history -- fixed by Nisar Ahmad.
  • [$200] add PDF file generation (PDF backend) -- fixed by Cyrus Harmon; This bounty will be re-added to the pool when the other backer Ingo Marks accepts the solution.

Improvements

I'm sure you've been waiting for this part the most. The list of current mid-release improvements and fixes is vast. I'll list only the changes I find most worth highlighting, but there are more, and most of them are very useful! The whole list of commits and contributors may be found in the git log. There were also many changes related to the CLX library which are not listed here.

  • Listener UX improvements by Nisar Ahmad.
  • Mirrored sheet implementation refactor by Daniel Kochmański.
  • New demo applications and improvements to existing ones,
  • Font rendering refactor and new features:

This part is a joint effort of many people. As a result we now have two quite performant and good-looking font renderers. Elias Mårtenson resurrected the alternative FFI Freetype text renderer, which uses the Harfbuzz and fontconfig libraries from the foreign world. Daniel Kochmański, inspired by Freetype's features, implemented kerning, tracking, multi-line rendering and arbitrary text transformations for the native TTF renderer. This resulted in a major refactor of the font rendering abstraction. Features still missing from the TTF renderer are font shaping and bidirectional text.

  • Experiments with xrender scrolling and transformations by Elias Mårtenson,
  • Image and pattern rendering refactor and improvements by Daniel Kochmański.

Both the xrender experiments and the pattern rendering work were a direct inspiration for the work-in-progress migration to xrender as the default rendering mechanism.

Patterns now have much better support coverage than they used to. We may treat a pattern as any other design. Moreover, it is possible to transform patterns in arbitrary ways (and to use other patterns as inks inside parent ones). This has come at the expense of a performance regression, which we plan to address before the release.

  • CLX-fb refactor by Daniel Kochmański:

Most of the work was related to simplifying the macrology and class hierarchy. This caused a small performance regression in this backend (it may, however, be fixed within the abstraction now present there).

  • Performance and clean code fixes by Jan Moringen:

Jan wrote a very useful tool called clim-flamegraph (it currently works only on SBCL). It helped us recognize many performance bottlenecks which would have been hard to spot otherwise. His contributions to the code base were small (LOC-wise) and hard to pin down to a specific feature, but very important from the maintenance, correctness and performance points of view.

  • Text-size example for responsiveness and UX by Jan Moringen,
  • Various gadget improvements by Jan Moringen,
  • Box adjuster gadget rewrite by Jan Moringen:

clim-extensions:box-adjuster-gadget deserves a separate mention due to its usefulness and relatively small mind share. It allows resizing adjacent panes by dragging the boundary between them.
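
For illustration, a minimal sketch of how such a gadget is used; the frame and pane names here are invented, only box-adjuster-gadget itself is McCLIM's:

;; (ql:quickload "mcclim")
(in-package :clim-user)

;; The adjuster sits between two panes in a layout; dragging it
;; resizes both. ADJUSTER-DEMO, TOP and BOTTOM are made-up names.
(define-application-frame adjuster-demo () ()
  (:panes (top    :application)
          (bottom :interactor))
  (:layouts (default
              (vertically ()
                top
                (make-pane 'clim-extensions:box-adjuster-gadget)
                bottom))))

;; (run-frame-top-level (make-application-frame 'adjuster-demo))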

  • New example for output recording with custom record types by Robert Strandh,
  • PostScript and PDF renderer improvements by Cyrus Harmon,
  • Scigraph and other example improvements by Cyrus Harmon,
  • Multiple regression tests added to drawing-tests by Cyrus Harmon,
  • Ellipse drawing testing and fixes by Cyrus Harmon,
  • Better contrasting inks support by Jan Moringen,
  • Output recording and graphics-state cleanup by Daniel Kochmański,
  • WITH-OUTPUT-TO-RASTER-IMAGE-FILE macro fixed by Jan Moringen,
  • Regions may be printed readably (with #. hack) by Cyrus Harmon,
  • event-queue processing rewrite by Nisar Ahmad and Daniel Kochmański:

This solves a long-standing regression - McCLIM didn't run correctly on implementations without threading support. The rewrite cleaned up a few input processing abstractions and provided thread-safe code. SCHEDULE-EVENT (which had bitrotted) now works as expected.

  • Extensive testing and peer reviews by Nisar Ahmad:

This role is easy to overlook when one looks at commits, but it is hard to overemphasize - that's how important testing is. The code would be much worse if Nisar hadn't put as much effort into it as he did.

Plans

Before the next release we want to refactor input processing in McCLIM and make all stream operations thread-safe. Refactoring the input processing loop will allow better support for native McCLIM gadgets and streams (right now they do not work well together) and make the model much more intuitive for new developers. We hope to get rid of various kludges thanks to that as well. Thread-safe stream operations, on the other hand, are important if we want to access a CLIM application from a REPL running in a process other than the application frame's (right now, drawing from another process may cause output recording corruption, for instance). This is important for interactive development from Emacs. When both tasks are finished, we will release version 0.9.8.

After that, our efforts will focus on improving X11 backend support. Most notably, we want to increase the use of clx's xrender extension and address a long-standing issue with non-English layouts. When both tasks are accomplished (some other features may land as well, but these two will be my main focus), we will release version 0.9.9.

That will mark a special time in McCLIM development: the next release will be 1.0.0, which is a significant number. The idea is to devote this time explicitly to testing, cleanup and documentation under a feature freeze (i.e. no new functionality will be added). What comes after that, nobody knows. Animations? New backends? Interactive documentation? If you have a specific vision of the direction in which McCLIM should move, all you have to do is take action and implement the necessary changes :-).

Merry Yule and Happy New Year 2019

This year was very fruitful for McCLIM development. We'd like to thank all contributors once more, and we wish you all (and ourselves) that the next year will be at least as good as this one - a lot of joy, creativity, and Happy Hacking!

Sincerely yours,
McCLIM Development Team

Zach BeanePersonal Notes on Corman Lisp 3.1 Release

· 22 days ago

Daniel Vedder"Die einzig wahre Religion?!"

· 29 days ago

Is it presumptuous to believe that one holds the "only true faith"? Christians often face this accusation. Most recently it came up in a debate on faith and reason which Spektrum der Wissenschaft published in October: Volker Sommer, an agnostic primatologist, levelled it twice against the believing physicist Barbara Drossel. Yet it is a charge that stands on very shaky logical legs.

Here are the relevant excerpts from the debate:

Drossel: I use my reason in matters of faith as well.
Sommer: Then you are not religious, merely unwilling to give up the God hypothesis. What about believers who practise voodoo or mutilate genitals in order to appease gods? Are they mistaken, and only Christians know the truth? [...]
Sommer: [...] Moreover, the fact that we can hold a discourse about the Easter Bunny, Santa Claus or reincarnation is no evidence whatsoever of their existence.
Drossel: But surely one can meaningfully discuss whether the existence of these things is plausible!
Sommer: But if you then fixate on the risen Christ, you must claim that only Christians have a revealed access to reality. Or how do you account for children being sacrificed in pre-Columbian Andean cultures in order to appease gods?

In both passages Prof. Sommer implies that claiming the "only true faith" is an arrogance unworthy of a thinking person. For who can really be sure of having recognised the truth?

At first glance this seems a legitimate point of view. After all, we are all only human, with limited information at our disposal, and prone to errors of reasoning often enough. Is it not wise and humble not to insist on the absolute truth of one's own worldview?

But let us examine the argument more closely. Ultimately it rests on one of two assumptions:

  1. We fundamentally cannot recognise the truth, or

  2. All religions are irrational and therefore incapable of defending their claim to truth in a logical, factual manner.

The first form is frequently encountered in philosophical scepticism and is at the core of a millennia-old debate in epistemology. Although it cannot be objectively refuted, it is nevertheless self-contradictory - for if it were true, we could not know it. That makes it unusable as the foundation of a rational debate.

In the context of the quoted discussion, however, the second form conceals a circular argument: it presupposes the very thing that ought to be under discussion - namely the claim that religion is irrational. (Incidentally, Sommer is already guilty of this circularity in the first quote, when he declares a priori that the use of reason is "not religious".)

If we remove this circularity, we must consider the possibility that at least some religions have a rational basis. As soon as we allow that (and this is precisely what Prof. Drossel, along with many others, argues for), we gain a new view of the religions. Instead of being mere mystical social constructs, they become theories about the nature of the world, to be judged critically and factually. Just as we subject scientific hypotheses to close scrutiny, we can now examine the religions on the basis of their rational claims.

In doing so, we notice that the various religions often make very contradictory claims which cannot possibly be brought into logical agreement. Is there endless reincarnation? How many gods are there? Was Jesus a prophet, a heretic, or God himself? So not all religions can be true at the same time. In fact, the core claims of the major religions differ so greatly that one must say: either exactly one of them is true, or none at all.

This brings us back to where we started. The exclusive truth claims of the religions (and thus also of the Christian faith) may sound presumptuous, but they are logically inescapable. That does not yet mean that any one of them is true. But as soon as one acknowledges that a faith has rational roots, it becomes a testable worldview whose truth cannot be ruled out a priori. Christianity claims this rationality for itself - and throughout history, thousands of thinkers and scientists testify that it is possible to be a Christian with both heart and mind. Whether they are right is the next question.

Wimpie NortjeImplementing Hunchentoot custom sessions.

· 34 days ago

Hunchentoot has a built-in session handling mechanism for dealing with web server sessions. The implementation appears to be quite thorough, and it complies with most of the OWASP recommendations for secure session handling. Many of its properties are easily customisable, and it should be sufficient for most situations, but some properties can only be customised by completely replacing the session mechanism.

Hunchentoot stores the session data in RAM and only sends a random token to the client. As per OWASP recommendations, the token is a meaningless number and contains no session data. It only serves as a database lookup key with some embedded security features.

The implication of keeping session information on the server and only in RAM is that a session is only accessible on the server where it was created, and that all session information will be lost when the application exits. When this happens, all users will have to log in again.

This may be an acceptable situation if the website is served from a single server and loss of the session data is not a problem. When sessions do contain important data, it is better to keep them in persistent storage.

When the website is served from multiple servers, the same session information must be accessible from all the server instances; otherwise a user will need to log in each time his request is handled by a different server.

Addressing these two issues requires changing Hunchentoot's session storage behaviour. This is also one of the properties which cannot be customised without replacing session handling completely.

The documentation makes some passing remarks about replacing the session mechanism, and there are some comments in the GitHub issues too.

Hunchentoot is implemented in CLOS and getting from the documentation to your own customised session implementation requires more than a superficial knowledge of CLOS. Without that extra bit of CLOS knowledge it is far from clear how one should approach this customisation.

Below is some code describing the beginning of what such a session replacement looks like.

(defclass custom-session ()
  () ; implement session slots here
  (:documentation "Custom session class; it does not have to descend from hunchentoot:session."))

(defclass custom-session-request (hunchentoot:request)
  () ; this class does not need any additional code
  (:documentation "Subclass of hunchentoot:request that allows using our own session class."))

(defmethod hunchentoot:session-cookie-value ((session custom-session))
  ;; Implementation code
  )

(defmethod hunchentoot:session-verify ((request custom-session-request))
  ;; Implementation code
  )

;; Instantiate the acceptor to use the custom request (and thus session) class
(setf *acceptor*
      (make-instance 'hunchentoot:easy-acceptor
                     :request-class 'custom-session-request))

The documentation states that the only requirements for replacing the session mechanism are implementations of hunchentoot:session-verify and hunchentoot:session-cookie-value, specialised on the custom session class.

At a high level this is true, but on a more practical level it omits some crucial information.

hunchentoot:session-cookie-value is specialised on the session class, and it returns the value to be sent as a cookie to the user.

hunchentoot:session-verify returns the session object associated with the current request. It specialises on the request rather than the session class, so you also need a custom request class in addition to the session class.

Even with these two functions and classes implemented, Hunchentoot will still not use your custom sessions. To achieve that, you must instantiate the acceptor object with your custom request class as a parameter.
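
To make this concrete, here is a rough in-memory sketch of what the implementation code might look like. It assumes the custom-session class above was given a token slot with a session-token reader, and it keeps sessions in a global hash table; both are illustrative assumptions, not part of Hunchentoot's API.

(defvar *session-store* (make-hash-table :test #'equal)
  "Maps cookie tokens to CUSTOM-SESSION objects. An assumed helper, not Hunchentoot API.")

(defmethod hunchentoot:session-cookie-value ((session custom-session))
  ;; The opaque token is the only thing ever sent to the client.
  (session-token session))

(defmethod hunchentoot:session-verify ((request custom-session-request))
  ;; Read the token from the request's session cookie and look the
  ;; session up in the store; return NIL when there is none.
  (let ((token (hunchentoot:cookie-in
                (hunchentoot:session-cookie-name hunchentoot:*acceptor*)
                request)))
    (and token (gethash token *session-store*))))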

These steps are sufficient to make Hunchentoot use a custom session class, but that is still a long way from a working implementation, much less a secure one.

Replacing the session mechanism is not a trivial project. If you are forced down this path by the limitations of the built-in mechanism, the easiest approach is to copy Hunchentoot's session code and modify it to resolve your issue.

Daniel VedderMapping with QGIS

· 34 days ago

In our modern lives, we take readily available maps for granted. The likes of Google Earth, Apple Maps or Open Street Map mean that high quality, high resolution maps are usually just a mouse click or finger swipe away. But sometimes you may find yourself in a place where the existing maps are just not good enough. Or you require a specialised map for a specific purpose. In such cases, you might end up having to create your own. This post shows you how to do that, using the open source software QGIS.

What is QGIS?

GIS (Geographic Information System) programs are a class of software for working with, analysing and displaying spatial data. Think “Google Earth on steroids”. They are powerful and complex tools used by geographers, field ecologists, historians and many others.

The industry standard GIS application is ArcGIS. Unfortunately, as industry standards tend to, this comes with a hefty price tag. Fortunately, there are good open source alternatives available. Chief among these is the aforementioned QGIS. (If you are interested in other options, look here.)

Once you have downloaded and installed QGIS (either via the link above or from the Ubuntu repositories with sudo apt install qgis) you are good to go. In the following I will guide you through the basic steps to create a map with QGIS, using as an example the map I made of my friends' farm.

Creating the initial map

[Screenshot: the QGIS main window]

A QGIS project consists of a stack of layers, shown in the left-hand pane in the screenshot above. Their visibility can be turned on and off as desired. A layer may represent raster or vector data – in this example, we will be working exclusively with the latter.

When making a map, I use separate layers for each kind of object I want to add, such as streams, roads or buildings. Basically, there are three types of vector layer: points, lines, and polygons. To create a new layer, click the New Shapefile Layer button at the bottom of the left toolbar. Choose what type of layer you want and confirm (don't worry about the rest of the dialog for now). After you have specified a file name and location, the new layer will appear in your layer stack and may be moved up or down at will.

To add new features, you need to enable editing for this layer. Select the layer and click the pencil icon (Toggle Editing) in the second-from-top toolbar. You can now add features (Add Feature button in the same toolbar) or edit them (using the Node Tool). When Add Feature is on, left-click anywhere on the map to set the first node, then continue setting nodes until the shape is complete. Right-click, enter an ID number for this feature, and you're done. Don't forget to save your layer using the Save Layer Edits button next to the pencil icon. (Every layer is an individual file and must be saved separately from the whole project.) If you want to see which features you have added already in a given layer, clicking the Open Attribute Table button in the top toolbar will display a list. The appearance of a layer may be changed by right-clicking on it and going to Properties → Style.

When starting on a new map, I find it useful to take a look at a satellite image of the place. You can do this from within QGIS by using the OpenLayers plugin. (Install this using the Plugins menu.) Once installed, go to the menu Web → OpenLayers plugin → Google Maps → Google Satellite. This will add a new special layer to your project that displays a satellite view in your current map window. Although this view cannot be printed or saved, it is very useful for creating a first rough draft for the map. Simply “trace over” it by adding new vector features on top of the satellite image.

Adding GPS data with Map Plus

As useful as the OpenLayers plugin is, it probably won't get you all the way. Satellite images may be out of date, or the features you want may not be visible on them. (For example, paths in a forest are often hidden under the canopy.) So to get an accurate, complete map, you will have to import GPS data from elsewhere.

Proper GPS devices (such as those by Garmin) are good, but expensive. I have found a smartphone to be a perfectly acceptable replacement for my requirements. To record and export GPS data on iOS, I use an app called Map Plus. It provides all the features I need for mapping, even in the free version. (As I don't use Android, I don't know what an equivalent app would be, but I'm sure there's one around.)

When you've opened Map Plus, touch the arrow button to turn on live location and tap it again to start recording. Now, simply walk the route you want to add to your map. At the end, save the recording and export it to your computer as a KML, GPX, or SHP file (which one doesn't matter, but only KML is available in the free version).

Back in QGIS, click the Add Vector Layer button on the top left and select your GPS file to open it. Trace over the new line with the appropriate layer(s). Then lather, rinse, repeat…

Pro Tip: If you install the QTiles plugin, you can create a .mbtiles file of your map. This can then be back-imported into Map Plus – allowing you to use your own map on your phone!

Preparing for print

When you have added everything you want to your QGIS project, it is time to export your map. QGIS provides a basic desktop publishing tool for doing the necessary layout work. To open this, click Project → New Print Composer.

The print composer will show you a blank sheet of paper that you can add your map to. You can select the zoom level, move it around, add a legend and labels, and do a whole host of other things. (For example, I find it useful to have a grid in the map background to better estimate distances.) Admittedly, this tool is a bit fiddly, but when you've played around with it a little you shouldn't have too many problems with it.

And in the end, you get to export your masterpiece to PDF or JPG for printing or publishing:

[Image: the final map]

Conclusion

Maps are beautiful things. Aside from helping us to not get lost, a map gives us a mental image of a place that is hard to get any other way. It shows the big picture, the proportions and angles that are so easily lost when we see a place from the ground perspective. Maps awaken our curiosity and our wanderlust. They help us plan and can even help defend our rights.

Creating an accurate map used to be hard - a difficult and arduous skill reserved for trained specialists. (Read the story of Ordnance Survey for more insight into historical mapmaking.) Nowadays, anybody with a smartphone and a working computer can do it. I found I very much enjoy mapmaking: it combines lots of outdoor exploring with fun work at the computer, and a useful print product to show for it at the end.

And of course, as T.S. Eliot wrote:

We shall not cease from exploration
And the end of all our exploring
Will be to arrive where we started
And know the place for the first time.

Happy mapmaking!

Quicklisp newsDecember 2018 Quicklisp dist update now available

· 41 days ago
New projects:
  • agutil — A collection of utility functions not found in other utility libraries. — MIT
  • aserve — AllegroServe, a web server written in Common Lisp — LLGPL 
  • cl-batis — SQL Mapping Framework for Common Lisp — MIT
  • cl-dbi-connection-pool — CL-DBI-Connection-Pool - connection pool for CL-DBI — LLGPL
  • cl-json-pointer — A JSON Pointer (RFC6901) implementation for Common Lisp. — MIT
  • cl-punch — Scala-like anonymous lambda literal — MIT
  • definitions-systems — Provides a simple unified extensible way of processing named definitions. — Public Domain
  • easy-bind — Easy-bind - easy local binding for Common Lisp — MIT
  • first-time-value — Returns the result of evaluating a form in the current lexical and dynamic context the first time it's encountered, and the cached result of that computation on subsequent evaluations. — Public Domain
  • hyperspec — A simple library for looking up common-lisp symbols in the hyperspec. — LLGPLv3+
  • its — Provides convenient access to multiple values of an object in a concise, explicit and efficient way. — Public Domain
  • mra-wavelet-plot — Plot MRA-based wavelets (scaling function and mother wavelet) with given coefficients of the dilation equation — 2-clause BSD
  • openid-key — Get OpenID keys from issuer. — MIT
  • pjlink — A library for communicating with PJLink-compatible projectors over TCP/IP. see https://pjlink.jbmia.or.jp/english/ for information on PJLink and compatible devices. — CC0 1.0 Universal
  • poler — Infix notation macro generator — LLGPL
  • rpcq — Message and RPC specifications for Rigetti Quantum Cloud Services. — Apache 2
  • shadowed-bindings — Establishes a new lexical context within which specified bindings are explicitly shadowed, making it clear that they are not referenced within, thereby reducing cognitive load. — Public Domain
  • static-dispatch — Static generic function dispatch for Common Lisp. — MIT
  • trivial-jumptables — Provides efficient O(1) jumptables on supported Common Lisp implementations and falls back to O(log(n)) on others. — Public Domain
  • trivial-sockets — trivial-sockets — MIT
  • utility — A collection of useful functions and macros. — MIT
  • wild-package-inferred-system — Introduces the wildcards `*' and `**' into package-inferred-system — MIT
Updated projects: alexandria, april, architecture.builder-protocol, architecture.hooks, asdf-viz, bst, cambl, cari3s, carrier, caveman, cffi, chronicity, cl-ana, cl-bibtex, cl-cffi-gtk, cl-charms, cl-cognito, cl-collider, cl-conllu, cl-dbi, cl-digraph, cl-environments, cl-epoch, cl-hamcrest, cl-json-helper, cl-ledger, cl-markdown, cl-patterns, cl-python, cl-quickcheck, cl-str, cl-tetris3d, cl-tiled, cl-toml, cl-unification, clazy, clip, closer-mop, clx, codex, cover, croatoan, dbus, de.setf.wilbur, definitions, docparser, dufy, eclector, event-emitter, f2cl, femlisp, fiasco, flare, float-features, function-cache, fxml, gamebox-math, gendl, genhash, glsl-toolkit, golden-utils, harmony, helambdap, http-body, hu.dwim.web-server, ip-interfaces, ironclad, jonathan, jsonrpc, lack, lisp-binary, lisp-chat, local-time, maiden, mcclim, mmap, opticl, overlord, parachute, parenscript, parser.common-rules, petalisp, pgloader, plexippus-xpath, plump, plump-sexp, postmodern, protest, protobuf, qbase64, qlot, quri, racer, regular-type-expression, safety-params, sc-extensions, serapeum, shadow, simple-tasks, sly, snakes, snooze, staple, stealth-mixin, stefil, stumpwm, the-cost-of-nothing, time-interval, trivial-benchmark, trivial-utilities, umbra, utilities.binary-dump, vgplot, websocket-driver, with-c-syntax, woo, zacl.

To get this update, use (ql:update-dist "quicklisp")

Enjoy!

Nicolas HafnerAbout Making Games in Lisp - Gamedev

· 44 days ago

Recently there's been a bit of a storm brewing about a rather opinionated article about game development with Lisp. After reading Chris Bagley's very well done response, I thought I'd share my perspective on what it's like to actually make games with Lisp. I'm not writing this with the intent on convincing you of any particular arguments, but rather to give some insight into my process and what the difficulties and advantages are.

I'll start this off by saying that I've been working with games in some capacity as long as I can remember. My programming career started out when I was still a young lad and played freeware games on a dinky Compaq laptop with Windows 95. Making really terrible games is almost all I did in terms of programming all throughout primary school. I branched out into other software after that, but making games is something that has always kept sticking around in my mind.

Naturally, once I had learnt a new programming language, it didn't take too long before I wanted to make games again. And of course, because I'm a stubborn idiot, I decided to build an engine from scratch - it wasn't my first one, either. This is what led to Shirakumo's Trial engine.

Since then, the team and I have built a couple of "games" with Trial:

  • LD35 Supposed to be a sort of farming game; due to massive engine problems it ended up being just a test for animations and basic 3D rendering.
  • LD36 A very basic survival game that lets you build fire places and eat stuff. Based on the tech from the previous game.
  • LD38 An experiment in non-linear storytelling. The idea was to have a dialog based mystery game, but we ran out of time.
  • Rush A 2D platformer with a lighting mechanic. This one is actually a game that can be played for some time.
  • Shootman An excuse to stream some gamedev. Mostly modelled after "Enter the Gungeon," it's an isometric bullet hell shooter.

None of these are big, none of these are great. They're all more experiments to see what can be done. What I've learned most of all throughout all my time working on games is that I'm not good at making games. I'm decent at making engines, which is a very, very different thing.

If you're good at making games, you can make an engaging game with nothing more than format, read-line, and some logic thrown in. If you're bad at making games like I am, you build a large engine for all the features you imagine your game might need, and then you don't know how to proceed and the project dies. You may notice that this also has a bit of an implication, namely that for making the game part of a game, the choice of language matters very little. It matters a lot for the engine, because that's a software engineering problem.

I'm writing this because this is, to me, an important disclaimer: I don't well know how to make games. I can write code, program mechanics, make monkeys jump up and down on your screen, but that's not the meat of a game and usually not why people play games either. Thus my primary difficulty making games has absolutely nothing to do with the technology involved. Even if I were using Unity or Unreal, this problem would not go away. It was the same when I was last writing games in Java, and it was the same when I was using GameMaker.

Now, why am I not using a large, well made engine to make games? Is it because I've been tainted by Lisp and don't want to use other languages in my free time anymore? Is it because the game making problem would persist anyway so what's the point? Is it because I like making engines? Is it because I'm stupid? Well, the answers are yes, yes, yes, and yes.

Alright, so here we are: Lisp is the only choice left, I like making engines and don't know how to make games, so what are the difficulties and advantages of doing that?

As you might know, I'm currently working on a game, so I have a lot of immediate thoughts on the matter. What seems to bother me the most is that currently I don't have a built-in, usable scene editor in Trial. For every game so far we had to either build an editor from scratch, or place things manually in code. Both of these things suck, and making an editor that isn't a huge pain to use takes a long, long time. Part of the issue with that is that Trial currently does not have a UI toolkit to offer. You can use it with the Qt backend and use that to offer a UI, but I really don't want to force using Qt just for an editor. Not to mention that we need in-game UI capabilities anyway.

All of the UI toolkits I've seen out there are either a giant blob of foreign code that I really don't want to bind to, or they're McCLIM which won't work with OpenGL in what I project to be the next decade or more. So, gotta do it myself again. I have some nice and good ideas for making a design that's different and actually very amenable towards games and their unique resolution constraints, but making a UI toolkit is a daunting effort that I have so far not felt the energy to tackle.

Aside from the lack of an editor and UI toolkit, I actually have very few complaints with the current state of Trial for the purposes of my game. It handles asset management, shaders and effects pipelines, input and event delivery, and so forth. A lot of the base stuff that makes OpenGL a pain in the neck has been taken care of.

That said, there are a lot of things I had to implement myself as well that could be seen as something the engine should do for you: map save and load, save states, collision detection and resolution, efficient tile maps. Some of the implementations I intend to backport into Trial, but other things that might seem simple at first glance, like maps and save states, are actually incredibly domain-specific, and I'm currently unconvinced that I can build a good, generic system to handle them.

One thing that I think was a very good decision for Trial that I still stand by is the idea to keep things as modular and separate as possible. This is so that, as much as possible, you won't be forced to use any particular feature of the engine and can replace them if your needs demand such. If you know anything at all about architecture, this is a very difficult thing to do, and something that I believe would be a huge lot more difficult if it weren't implemented in Lisp. Modularity, re-usability, and extensibility are where Lisp truly shines.

Unfortunately for us, games tend to need a lot of non-reusable, very problem-specific solutions and implementations. Sure, there's components that are re-usable, like a rendering engine, physics simulations, and so forth. But even within those you have a tremendous effort in implementing game-specific mechanics and features that can't be ported elsewhere.

But, that's also great for me because it means I can spend a ton of time implementing engine parts without having to worry about actually making a game. It's less great for the chances of my game ever being finished, but we'll worry about that another time.

Right now I'm working on implementing a quest and dialog system in the game, which is proving to be an interesting topic of its own. Lisp gives me a lot of nifty tools here for the end-user, since I can wrap a lot of baggage up in macros that present a very clean, domain-geared interface. This very often alleviates the need to write scripting languages and parsers - very often, but not always. For the dialog, the expected amount of content is so vast that I fear I can't get away with using macros, and need to implement a parser for a very tailored markup language. I've been trying to get that going, but unfortunately, for reasons beyond me, my motivation has been severely lacking.
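
To illustrate the kind of macro-wrapped interface meant here, a toy sketch (not Trial's or the game's actual API):

(defvar *dialogs* (make-hash-table)
  "Maps dialog names to lists of (speaker . line) pairs.")

(defmacro define-dialog (name &body lines)
  ;; Turn a clean, domain-geared surface syntax into plain data,
  ;; without writing a scripting language or a parser.
  `(setf (gethash ',name *dialogs*)
         (list ,@(loop for (speaker text) in lines
                       collect `(cons ',speaker ,text)))))

(define-dialog intro
  (guide "Welcome to the island!")
  (player "How do I get off it again?"))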

Other than that, now that all the base systems for maps, saves, chunks, tiles, and player mechanics are in place the only remaining part is UI stuff, and we already discussed the issue with that. This also means that I really need to start thinking about making a game again because I've nearly run out of engine stuff to do (for now). We'll see whether I can somehow learn to shift gears and make an actual game. I really, really hope that I can. I want this to work.

I've talked a lot about my own background and the kinds of problems I'm facing at the moment, and very little about the process of making these games. Well, the process is rather simple:

  1. Decide on a core idea of the game.
  2. Figure out what the player should be able to do and the kinds of requirements this has on the engine.
  3. Implement these requirements in the engine.
  4. Use the features of the engine to build the game content. This requires the most work.
  5. As you develop content and the vision of the game becomes clearer, new ideas and requirements will crystallise. Go back to 3.
  6. Your game is now done.

Again, the bulk of the work lies in making content, which is rather orthogonal to the choice of your language, as long as the tools are mature enough to make you productive. I believe Lisp allows me to be quicker about developing these tools than other languages, but making an actual game would be even quicker if I didn't have to make most of these tools in the first place.

So if there's anything at all that I want for developing games in Lisp, it wouldn't be some magical engine on par with Unreal or whatever, it wouldn't even be more libraries and things. I'm content enough to build those myself. What I'd really like is to find the right mindset for making game content. Maybe, hopefully, I will at some point and I'll actually be able to publish a game worth a damn. If it happens to have been developed with Lisp tools, that's just a bonus.

If you've made it this far: thank you very much for reading my incoherent ramblings. If you're interested in my game project and would like to follow it, or even help working on it, hang out in the #shirakumo channel on Freenode.

Daniel VedderAn Impression of Common Lisp

· 49 days ago

Common Lisp is a lovely language to work with. Although it has faded into an unfortunate semi-obscurity on the modern computing landscape, it is still a powerful and elegant language that is a joy to use. I've had a lot of fun playing around with it and discovering more about it in the past two months, and wanted to record and share some of that in this post.

xkcd: lisp cycles

© Randall Munroe, xkcd.com


A very brief history of Lisp

Lisp is the second-oldest programming language still in use (developed in 1958 - only Fortran is older, by a few years). It had its heyday as the language of Artificial Intelligence research in the 60s to 80s.

Over the years, a large spectrum of dialects developed. Today, the three most prominent are Common Lisp, Scheme, and Clojure. The latter is quite new, built on top of the JVM, and has seen some recent popularity. Scheme is notable for being small and elegant, and is predominantly used as a teaching language (most famously at MIT, where it was used in conjunction with SICP).

Common Lisp itself was created in the 80s and 90s, with an ANSI standard published in 1994. Its purpose was to combine the features of the various dialects then in existence, bringing a modicum of order into what had become a very fragmented landscape. Several implementations of the standard (both proprietary and open-source) are available and still actively maintained. Though not particularly widespread in the world of software engineering, Common Lisp has carved out a niche for itself and retains a loyal following.

My history with Lisp

I first learnt Lisp in 2014. At the time, I had only been programming for two or three years (in Java and Python), and the very different approach used by Lisp really threw me. But after a couple of weeks of trawling through a good introductory book, I slowly got the hang of things.

My first bigger project was a framework for text-based adventure games (Project Atlantis), including an interpreter for a simple Domain Specific Language to describe game worlds. This was pretty much written in pure Common Lisp, with no additional libraries used.

Over the next few years I learnt regrettably little more of the language, being occupied with other languages that I needed for university. Just recently, however, I had the chance to jump back into things - leading to the discovery that I had only ever scratched the surface of what Common Lisp can do. I learnt a lot more about the intricacies of the language itself, as well as how to work with the third-party package ecosystem. And that gave rise to this post…

Writing Lisp is writing poetry

One thing I had noticed right from the beginning is how much fun it is to write Lisp. I once told a friend: “Writing C++ is like writing a technical manual, Java like a school essay, Python like an email to a friend - but Lisp is like writing poetry.” I found it hard to explain at the time, but I think I may have an explanation now.

S-expressions

A lot of it has to do with Lisp's use of S-expressions. Basically, all Lisp code consists of nested expressions of the form (<function/keyword> <args>). For example, the following function checks whether a given list is a set (i.e. no element occurs more than once):

(defun set-p (lst)
  (cond ((null lst) T)
        ((member (car lst) (cdr lst)) NIL)
        (T (set-p (cdr lst)))))

Coming from a C-based world, that looks rather unusual. (What exactly each function call above does is irrelevant at the moment.) The important thing to realise is that every single S-expression has a value - and can thus be directly used as an argument to another expression. Unlike in imperative languages like C, one doesn't have to use as many one-time variables. You can stick if-clauses and even loops straight into a function call. Instead of writing (in Java):

int a = arg;
if (a > cutoff) a = cutoff;
System.out.println(a);

you can simply write:

(format t "~&~S" (if (> arg cutoff) cutoff arg))

This style of writing code takes some getting used to, but in the end it makes for shorter and more elegant code.

Lastly, note that Lisp does not have any operators: +, -, <, >, etc. are all functions. Everything is an S-expression, giving the language a very fluent feel.

Lists

LISP stands for “List processing”, and handling lists is what Lisp does best. Turns out, lists are incredibly versatile data structures. Though they may not always give the best performance, they are fantastically simple to use. Also, building other data structures on top of them (such as trees or graphs) is trivial.

Common Lisp provides plenty of functions to work with lists, and defining your own is easy. (Should you need the extra performance, Lisp does of course also offer arrays and other data structures.)
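
For example, a binary tree can be built as nothing more than nested lists of the form (value left right), with the basic operations taking only a few lines:

(defun node (value left right)
  (list value left right))

(defun tree-depth (tree)
  ;; An empty subtree is NIL; otherwise recurse into both children.
  (if (null tree)
      0
      (1+ (max (tree-depth (second tree))
               (tree-depth (third tree))))))

(tree-depth (node 1 (node 2 (node 3 nil nil) nil) nil)) ; => 3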

Macros

Macros are often touted as the single greatest advantage of Lisp. I have to say I still have not understood them completely (it's a complex topic), but they do allow you to do some pretty cool things.

Unlike C preprocessor macros, Lisp macros are an integral part of the language, with the whole expressive power of the language available to them. Basically, they can be used to extend the very syntax of the language.

There are many macros built into Common Lisp. For example, there are at least five looping constructs available in standard CL. This may sound like overkill, but it actually means that you can use precisely the right form for the job at hand - minimising lines of code and increasing readability.

Simple macros are often used to abstract away boilerplate code, for example when declaring numerous subclasses. More complicated macros (such as the built-in loop macro) can do wondrous things with the language … but I am not sufficiently well-versed in “macrology” to talk authoritatively on the subject.
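
As a small, made-up example of the boilerplate-reducing kind: a macro that declares a whole family of trivial subclasses in one line.

(defclass shape () ())

(defmacro define-shapes (&rest names)
  ;; Expands into one DEFCLASS form per name.
  `(progn
     ,@(loop for name in names
             collect `(defclass ,name (shape) ()))))

(define-shapes circle square triangle)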

Functional programming and object orientation

Historically, Lisp has been seen as a functional programming language. This is a style of programming that emphasises the role of, well, functions. Ideally, these functions should not have any side effects, like modifying global state (i.e. a function should always return the same output for the same input). This differentiates functions from procedures - the latter is a set of instructions, the former a discrete mathematical construct. It's too big a topic to expound here, but writing side-effect-free functions like this has some very tangible rewards, especially when it comes to debugging and testing a program. It is also a much more natural way to approach certain problems than the currently ubiquitous object-oriented paradigm.

Related to functional programming is applicative programming, which involves passing functions around as arguments. Lisp has (indeed, it invented) functions as first-class objects. This means that functions can be returned by, created by, and passed to other functions. Again, this lets you do some really neat things.
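
A two-line illustration: adder creates and returns a fresh closure, which mapcar then applies to each element of a list.

(defun adder (n)
  (lambda (x) (+ x n)))

(mapcar (adder 10) '(1 2 3)) ; => (11 12 13)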

But Common Lisp is not only a functional programming language. It also offers a powerful facility for object-oriented programming, known as CLOS (the Common Lisp Object System). This is actually a lot more flexible than the OO approach used by Java and similar languages, though it does take some getting used to.
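
One example of that flexibility is multiple dispatch - methods can specialize on more than one argument, something Java-style single dispatch cannot express directly:

(defclass rock () ())
(defclass paper () ())

(defgeneric beats-p (a b)
  ;; The applicable method depends on the classes of both arguments.
  (:method ((a paper) (b rock)) t)
  (:method ((a rock) (b paper)) nil))

(beats-p (make-instance 'paper) (make-instance 'rock)) ; => T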

In short, whatever style of programming you prefer, Common Lisp has you covered.

From prose to poetry

So why is Lisp such a joy to write? I think the short answer is that it is a language that gets out of your way. It lets you say exactly what you want to say. By default it already offers you a lot of different constructs to choose from - and if you don't like any of them, you just whip out a macro and create your own. You can always choose exactly the right tool for the job. It also encourages abstraction: programs that are built up in layers, with each successive layer not having to worry about the nitty-gritty of whatever is going on below it.

And because of these two things, you end up writing code that is terse and succinct, that gets a lot done in just a few lines. This is not only faster to write, but also easier to read - and therefore easier to conceptualise, extend and debug. Aside from the beauty of the balanced parens, I believe that this is the source of the aesthetic appeal of Lisp.

The State of the Union

Thus far on the language itself. But what about the state of its ecosystem? Programmers rarely create any sizeable program purely with the core language, but usually rely on third-party libraries. The following is a very brief overview of the Common Lisp world in 2018.

Implementations

As mentioned above, there are various implementations of the Common Lisp standard available. Each has its own strengths and uses. SBCL, the most popular open-source implementation, is known for its high-quality compiler. (Well-written code compiled with SBCL can reach near-C speeds.) GNU CLISP is not as fast, but more user-friendly. ABCL compiles to Java bytecode, ECL can be embedded in C applications. And the list goes on…

This range of options can be a little intimidating to a newcomer, but in most cases, the differences shouldn't matter (as each implementation conforms to the Common Lisp standard). If in doubt, use SBCL.

ASDF and Quicklisp

The most fundamental of all CL libraries is asdf - indeed, it comes pre-shipped with most modern implementations. ASDF (Another System Definition Facility) is a build system for Common Lisp projects. Comparable to make, it is used to define dependencies between different project components and on third-party libraries. It also stores metadata about the project. Though not part of the Common Lisp standard, it has become the de facto standard for distributing Lisp software.

Built on top of ASDF is Quicklisp, a package manager and repository for Common Lisp libraries. Quicklisp provides a simple user interface to load and install external libraries. Pretty much all important projects are available for download via Quicklisp - the repository currently holds over 1,500 libraries (and is updated monthly).
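
In practice that interface is just a couple of functions; for example:

(ql:quickload "alexandria")   ; install (if needed) and load a library
(ql:system-apropos "curses")  ; search the repository by name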

Available libraries

A comparatively small community means that Common Lisp does not have as rich a package ecosystem as, say, Python. Its standard library is also pretty small. (In Python-speak, it does not come batteries included.)

Nonetheless, though highly specialised packages are not to be expected, the Quicklisp repository does offer a good, broad selection - ranging from the likes of croatoan (an ncurses wrapper) to coleslaw (the static site generator this blog is built with).

I don't have a huge amount of experience yet, but from what I've seen, the average quality seems to be fairly good. There's a couple of “big-name” libraries around: alexandria for general utility functions, bordeaux-threads for multi-threading support, hunchentoot as a web server, etc. On the whole, documentation tends to be decent - ranging from “minimal but existent” to “very good”. Don't expect a whole lot of online tutorials, though. Despite this, I have had remarkably few problems working with the libraries I have used so far - due in large part to the fact that their source code is eminently readable. Most of them don't have a very large code base, but even so: a quick browse and some grepping is usually all that's needed to discover the specifics of a certain feature. I have found this a very pleasant experience; and vastly different to, say, trying to understand the internals of a PHP Nextcloud plugin.

The community

Common Lisp has a small but capable and active community. There are not a lot of people who write Lisp, but those who do tend to know what they are doing. There are various IRC channels and mailing lists for those who are interested, as well as a handful of blogs. I've gone to the #lisp IRC channel for help before and found the people there friendly and helpful.

Personal conclusion

I thought I knew Common Lisp after my first project, three years ago - but now, even after another two months of intense learning, there's still a whole lot left to learn. Although the basics of the language are not hard to master, Common Lisp is a very large and complex system on the whole (unlike its much smaller sibling, Scheme). That makes it clunky and sometimes even ugly in places, but all the more exciting to explore :-)

On the whole, though, it is a beautiful language that is truly fun to work with. I wish I had the chance to work with it at university. (At least we are using Julia in my research group - a language that has taken on almost every Lisp feature, short of S-expressions.) We'll see, I may have a chance yet. And until then, there'll always be side projects to work on…

Further reading

Addendum

Somebody posted this article on Reddit, whereupon I got an email comment pointing out two further resources that merit inclusion in the above list:

Also, in the list of implementations above, I should have mentioned Clozure (aka CCL), as it is another mature and popular implementation that also works on Mac and Windows.

Vsevolod DyomkinStructs vs Parametric Polymorphism

· 55 days ago

Recently, Tamas Papp wrote about one problem he had with Lisp in the context of scientific computing: that it's impossible to specialize methods on parametric types. While you can tell a function that operates on arrays that these arrays have element type double-float, you cannot dispatch on this, as Common Lisp does not have parametric types.

I encountered the same issue while developing the CL-NLP Lisp toolkit for natural language processing. For instance, I needed to specialize methods on sentences, which may come in different flavors: as lists of tokens, vectors of tokens, lists of strings, or some more elaborate data structure with attached metadata. Here's some example code. There's a generic function to perform various tagging jobs (POS, NER, SRL, etc.). It takes two arguments: the first - as with all CL-NLP generic functions - is the tagger object, used for algorithm selection and configuration, as well as for storing intermediate state when necessary. The second is the sentence being tagged. Here are two of its possible methods:

(defmethod tag ((tagger ap-dict-postagger) (sent string)) ...)
(defmethod tag ((tagger ap-dict-postagger) (sent list)) ...)
The first method processes a raw string: it invokes some pre-processing machinery that tokenizes the input and then, basically, calls the second method, which performs the actual tagging of the resulting tokens. So list here means a list of tokens. But what if we already have the tokenization, but haven't created the token objects - i.e. a list of strings is supplied as the input to the tag method? The CLOS machinery has no way to distinguish the two cases, so we'd have to resort to using typecase inside the method - which is exactly what defmethod is meant to replace as a transparent and extensible alternative.

In most other languages, we would have to stop here and accept that nothing can be done. After all, it's a local nuisance and not a game changer for our code (although Tamas refers to it as a game-changer for his). In Lisp, we can do better. Thinking about this problem, I see at least three solutions with varying levels of elegance and portability. Surely, they may seem slightly inferior to having the capability built directly into the language, but demanding to have everything built in is unrealistic, to say the least. Instead, having a way to build something in ourselves is the only future-proof and robust alternative. And that is what Lisp is known for.

The first approach was mentioned by Tamas himself:
You can of course branch on the array element types and maybe even paper over the whole mess with sufficient macrology (which is what LLA ended up doing), but this approach is not very extensible, as, eventually, you end up hardcoding a few special types for which your functions will be "fast", otherwise they have to fall back to a generic, boxed type. With multiple arguments, the number of combinations explodes very quickly.
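
For concreteness, the plainest version of this first approach, applied to our sentence example, could look something like the following (a sketch of mine, not LLA's actual macrology; tag-strings and tag-tokens are hypothetical helpers, and tok is assumed to be the token struct):

(defmethod tag ((tagger ap-dict-postagger) (sent list))
  (etypecase (first sent)
    (string (tag-strings tagger sent))   ; list of raw strings
    (tok (tag-tokens tagger sent))))     ; list of token objects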
Essentially: rely on typecase-ing, but use macros to blend it into the code in the most non-intrusive way, minimizing boilerplate. This is a straightforward path in Lisp, and it has its drawbacks for long-running projects that need to evolve over time. But it remains a no-brainer for custom one-offs. That's why, usually, few venture further to explore other alternatives.

The other solution was mentioned in the Reddit discussion of the post:
Generics dispatching on class rather than type is an interesting topic. I've definitely sometimes wanted the latter so far in doing CL for non-scientific things. It is certainly doable to make another group of generics that do this using the MOP.
I.e. use the MOP to introduce type-based generic dispatch. I won't discuss it here, but will say that similar things were tried in the past quite successfully: ContextL and layered functions are some of the examples. Yet the MOP path is rather heavy and has portability issues (as the MOP is not in the standard, although the closer-to-mop project unifies most of the implementations). From my point of view, its best use is for serious and fundamental extension of the CL object system, not for solving a local problem that occurs in some contexts but is not pervasive. I'd also say that the Lisp approach of (almost) not mixing objects and types is, conceptually, the right one, as these two facilities solve different sets of problems.

There's a third solution - much simpler, clearer, and more portable - that requires minimal boilerplate and, in my view, is best suited for this level of problem: use structs. Structs are somewhat underappreciated in the Lisp world; not a lot of books and study materials give them enough attention. That is understandable, as there's not a lot to explain. But structs are handy for many problems, being a hassle-free and efficient facility that provides some fundamental capabilities.

In its basic form, the solution is obvious, although a bit heavy: we have to define a wrapper struct for each parametric type we'd like to dispatch upon - for example, list-of-strings and list-of-tokens. This looks a little stupid, and it is, because what's the semantic value of a list of strings? That's why I'd go for sentence/string and sentence/token, which is a clearer naming scheme. (Or, if we want to mimic Julia, sentence<string>.)

(defstruct sent/str
  toks)
Now, from the method's signature, we will already see that we're dealing with sentences in the tagging process, and we will be able to spot when some other tagging algorithm operates on paragraphs instead of words - say, tagging parts of an email with labels such as greeting, signature, and content. Yes, this can also be conveyed via the name of the tagger, but it's still helpful. And it's also one of the hypothetical fail cases for a parametric-type-based dispatch system: if we have two different kinds of lists of strings that need to be processed differently, we'd have to resort to similar workarounds there as well.

However, if we'd like to distinguish between lists of strings, vectors of strings, and more generic sequences of strings, we'll have to resort to more elaborate names, like sent-vec/str, as a variant. It's worth noting, though, that for the sake of producing efficient compiled code, only vectors of different types of numbers really make a difference. A list of strings and a list of tokens, in Lisp, use the same accessors, so optimization here is useless, and the type information may be used only for dispatch and, possibly, type checking. Actually, Lisp doesn't support type-checking of homogeneous lists: you can't say :type (list string), only :type list. (Well, strictly speaking you can, by defining a named predicate and writing (and list (satisfies list-of-strings-p)) - the satisfies specifier requires a function name rather than a lambda form - but what's the gain?)

Yet using structs adds more semantic dimensions to the code than just naming. They may store additional metadata and support simple inheritance, which will come in handy when we'd like to track sentence positions in the text, and so on.

(defstruct sent-vec/tok
  (toks nil :type (vector tok)))

(defstruct (corpus-sent-vec/tok (:include sent-vec/tok))
  file beg end)
And structs are efficient in terms of both space consumption and speed of slot access.
So, now we can do the following:

(defmethod tag ((tagger ap-dict-postagger) (sent sent/str)) ...)
(defmethod tag ((tagger ap-dict-postagger) (sent sent/tok)) ...)
(defmethod tag ((tagger ap-dict-postagger) (sent sent-vec/tok)) ...)
We'll also have to defstruct each parametric type we'd like to use. As a result, with this approach, we can have the following clean and efficient dispatch:

(defgeneric tag (tagger sent)
  (:method (tagger (sent string))
    (tag tagger (tokenize *word-splitter* sent)))
  (:method (tagger (sent sent/str))
    ;; OFF tracks the character offset of each token; its binding was
    ;; elided in the original snippet, so a LET is assumed here.
    (let ((off 0))
      (tag tagger
           (make-sent/tok
            :toks (map* ^(prog1 (make-tok :word %
                                          :beg off
                                          :end (+ off (length %)))
                           (:+ off (1+ (length %))))
                        @sent.toks)))))
  (:method ((tagger pos-tagger) (sent sent/tok))
    (copy sent :toks (map* ^(copy % :pos (classify tagger
                                                   (extract-features tagger %)))
                           @sent.toks))))

CL-USER> (tag *pos-tagger* "This is a test.")
#S(SENT/TOK :TOKS (<This/DT 0..4> <is/VBZ 5..7> <a/DT 8..9>
<test/NN 10..14> <./. 14..15>))
Some of the functions and syntax used here - map*, copy, and the @ and ^ reader macros - come from my RUTILS, which fills in the missing pieces of the CL standard library. Another advantage of structs is that they define a lot of things in the background: type-checking for slots, a readable print function, a constructor, a built-in copy-structure, and more.

In my view, this solution isn't any less easy to use than the statically-typed one (Julia's). There's a little additional boilerplate (the defstructs), which may even be considered to have a positive impact on the code's overall clarity. And yes, you sometimes have to write boilerplate in Lisp, although not much of it. Here's a fun quote on the topic I saw on Twitter some days ago:
Lisp is an embarrassingly concise language. If you're writing a bunch of boilerplate in it, you need to read SICP & "Lisp: A Language for Stratified Design".
P.S. There's one more thing I wanted to address from Tamas's post:
Now I think that one of the main reasons for this is that while you can write scientific code in CL that will be (1) fast, (2) portable, and (3) convenient, you cannot do all of these at the same time.
I'd say that this choice (or rather the need to prioritize one over the others) exists in every ecosystem. At least, looking at his Julia example, there's no word on portability (citing Tamas's own words about the language: "At this stage, code that was written half a year ago is very likely to be broken with the most recent release."), while convenience may well hold for his current use case - but what if we need to implement, in the same system, features that deal with areas outside of numeric computing? I'm not so convinced.

Or, speaking of Python, the go-to language for scientific computing: in terms of performance, the only viable solution is to implement the critical parts in C (or Cython). Portable? No. Convenient? Likewise. Well, as a user you get convenience, and speed, and portability (although pretty limited ones). But at what cost? I'd argue that developing the Common Lisp scientific computing ecosystem to a similar quality would have required only 10% of the effort that went into building numpy and scipy...

Wimpie NortjeHow to write test fixtures for FiveAM.

· 57 days ago

When you write a comprehensive test suite, you will most likely need to repeat the same set-up and tear-down process multiple times, because many of the tests exercise the same basic scenario in slightly different ways.

Testing frameworks address this code repetition problem with "fixtures". FiveAM also has this concept, although in a slightly limited form1.

FiveAM implements fixtures as a wrapper around defmacro. The documentation states:

NB: A FiveAM fixture is nothing more than a macro. Since the term 'fixture' is so common in testing frameworks we've provided a wrapper around defmacro for this purpose.

There are no examples in the documentation of what such a fixture-macro should look like. Do you need the usual macro symbology like backticks and splicing or not? If so, how? This can be difficult to decipher if you are not fluent in reading macros. The single example in the source code makes things worse, because it does include backticks and splicing.

FiveAM defines a macro def-fixture which allows you to write your fixtures just like normal functions, with the one exception that there is an implicit (&body) form to represent your test code. No fiddling with complex macros!

This is a simple example:

(def-fixture in-test-environment ()
  "Set up and tear down the test environment."
  (setup-code)
  (&body)
  (teardown-code))

(def-test a-test ()
  "Test in clean environment."
  (with-fixture in-test-environment ()
    (is-true (some-function))))

The fixture implementation provides an easy-to-use definition syntax without any additional processing. If you need more complex macros than what def-fixture can handle you can write normal Lisp macros as usual without interfering with FiveAM's operation.
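
A fixture can also take an argument list, which is bound around the test body when the fixture is used. The following is a hedged sketch of that feature (connect, disconnect, and run-some-query are hypothetical helpers, not part of FiveAM):

(def-fixture with-database (db-name)
  (let ((db (connect db-name)))   ; DB is visible to the test body below
    (unwind-protect
         (&body)                  ; the test code runs here
      (disconnect db))))          ; tear-down runs even if a check fails

(def-test query-test ()
  (with-fixture with-database (:test-db)
    (is-true (run-some-query db))))

Because the (&body) call is spliced inside the fixture's let, the test body can refer to bindings such as db directly.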

  1. Some frameworks can apply fixtures to the test suite (as opposed to a test) so that it executes only once before any test in the suite is run and once after all tests have completed, regardless of how many tests in the suite are actually executed. FiveAM does not have this capability.

Daniel VedderMy Journey to the Ants

· 57 days ago

Ants have always been some of my favourite animals. As a child, I spent hours watching their work and their wars - watching and wondering. In fact, I know few biologists who are not fascinated by ants. I simply had the good fortune of growing up in a place where I had plenty of ant life to see. Having been back in Zambia for a few months, I was able to continue my observations. Here is a “best-of” reel: a small glimpse into the world of African ants.

Driver ants

If you're thinking of the most feared animals of Africa, you're probably not thinking about ants. These ants, however, belong in that category. Next to snakes, scorpions, and crocodiles, they are probably the animals people give the widest berth.

Driver ants (genus Dorylus) occur throughout the African tropics. They are big, red, and phenomenally aggressive. And they come in huge swarms.

In Zambia, they are usually seen during the rainy season. As nomads, they have no permanent nest, but the whole colony shifts about from one temporary nest to the next. When that happens, they move out in dense trails, guarded by their huge soldiers. Often there will be several trails running more or less in parallel, and sometimes these trails are actually roofed over by interlocking ants. (Driver ants are known to build bridges and even floating rafts using this technique of interlocking workers.)

These trails are not something to disturb - because when you do, the ants get angry and start to run all over the place. So if they happen to be moving through your house, the accepted procedure is just to let them pass. And get out of the house yourself, if you have to…

Sometimes, they don't move in trails, but in swarms. This is the foraging technique that gave them their name, because as they advance - thousands of ants swarming in one direction over a breadth of five meters or more - they drive away anything in their path. Whatever can't flee is eaten: grasshoppers, earthworms, even small mammals. (Pet rabbits often fall victim to driver ants if they are kept in cages or enclosures. The ants get into the fur, and then all bite at once. The shock is enough to kill the rabbit.)

Driver ants

A trail of driver ants

Driver ant soldier

A driver ant soldier standing guard beside the trail.

Matabele ants

Matabele ants (Megaponera analis) are my personal favourite. Named after a warlike 19th-century African tribe, they specialise in raiding termites. Three times a day, a Matabele nest will send out a detachment of 200-300 workers and soldiers. Running quickly, the large black ants head straight towards a nearby termite nest. (Their raiding radius is approximately 15m; the location of the target is scouted out before the raid.) Once there, they dash inside, grab a termite each and return home before the termites have time to mount a proper defense.

Recently, researchers from my university discovered that Matabele ants actually employ “battlefield medics”. When an ant is injured during a raid (for example by having a leg bitten off), other ants will come to its rescue, carrying it back to base to recover. In this way, a nest significantly reduces its losses and is able to keep up a larger fighting force. (Frank et al. 2017)

Matabele ants

Matabele ants returning from a termite raid.

(Matabele ants are not the only ants to prey on termites. Once I found a small termite mound that had been kicked in by a human. Over two days, the damaged nest was besieged by ants of two or three different species. The termite soldiers fought a gallant rear-guard action: blocking the tunnel exits with their large heads, they held off the assailants as the smaller termite workers walled up the tunnel behind them.)

Flying ants

Every year, the start of the rainy season marks the start of a new generation of ants. As the first rains fall in October/November, the ground opens up and thousands of flying ants take to the air.

As eusocial insects, most ant individuals never reproduce, having been born as sterile workers and living only to serve the colony. They gather food, protect the nest, and tend to the larvae. These grow from eggs laid by a queen ant - in the strict order of the nest, she is the one responsible for reproduction. (In some species, a nest will have more than one queen.) But though most larvae grow up to become more female workers, some become queens, or males. And once a year, it is time to mate.

These nuptial flights are impressive events. Over the course of one afternoon and night, all the nests of one species in an area pour forth their offspring. The males and females fly up and away to mate, hundreds of thousands of them. When they land, they shed or bite off their wings and search for a place for a new colony. In the morning, the ground will be covered with discarded wings.

Of course, a mass event like this will not go unnoticed by predators. Flocks of hundreds of martins and swifts circle over the area, feasting on a Christmas come early. And humans profit from it too: fried, the large drones of the driver ants (known locally as “inswa”) apparently make quite a tasty snack…

Flying ants and birds

Two flying ants flying up towards a flock of swifts

Fighting ants

As has been seen above, ants can be ferocious fighters. They are quite human in that sense: they will not only attack other insects, but also each other. Recently I witnessed a pitched battle between two ant colonies of different species that showcased this quite strikingly.

Unfortunately, I have been unable to identify either of the combatant species due to a lack of appropriate literature and a good camera. However, I know them quite well by sight. (EDIT: The larger species is probably Myrmicaria natalensis.)

The first is a medium large species, with only one caste of workers that are approximately 10mm long. They are brown-black and do not run in trails, but forage singly or in loose groups around a large, ground-level nest entrance. They are not very aggressive.

The second is a lot smaller - about 2mm. They have two worker castes, a major (also called a soldier) and a minor. They are so small you rarely see them, unless you happen to be looking very closely. Usually they restrict themselves to the close proximity of their nests. These, however, are fiercely defended against all comers, no matter what their size.

So walking along a path one day, I noticed three tightly-packed agglomerations of the first species in as many meters. This struck me as strange, because it does not fit their normal behaviour as described above. Looking closer, I realized the large species was not alone. Beside each of their agglomerations was a patch of ground controlled by the second, small species. In fact, it looked a lot like a frontline - one side held by the large species, the other by the smaller one:

Frontline between two ant armies

One on one, the small ants were of course no match for their counterparts. But that didn't stop them attacking. They would single out one opponent and attack it in a group, until their target backed off or was helped by a nestmate. The strategy seemed to work: I could see the corpses of several large ants lying about, ants that had foolishly strayed too deep into enemy territory. The rest decided to sit tight and hold the line. For about half an hour, the small ants kept up their attacks, or at least kept up their threats against the others. But eventually the skirmishes grew fewer and farther between, and a stalemate developed. At this point I left the scene. When I came back the next morning, there were no ants to be seen.

What caused this battle? I honestly don't know. My best guess is that while foraging, the large ants trespassed on the territory of the small ones. These attacked, and as the beleaguered foragers called for help from their nestmates, the agglomerations and the “frontline” formed. Whatever the cause, the fighting got quite severe in places. Here is the picture of a large ant, the severed head of a small warrior still bitten fast to its leg:

Ant casualty

That's war for you. Did I mention ants could be a lot like humans?

- - -

P.S. If you found this interesting, I can only recommend the book Journey to the Ants by Bert Hölldobler and E.O. Wilson. (That's where the title of this post was taken from.) It is a fascinating and personal insight into the biology of the ants, written by the two foremost experts in the field.

Daniel VedderReview: Mein Leben für die Natur

· 60 days ago

Josef H. Reichholf's “Mein Leben für die Natur - auf den Spuren von Ökologie und Evolution” (“My Life for Nature - on the Trail of Ecology and Evolution”) is as fascinating as it is stimulating and contentious a read. Published in 2015, the book offers excerpts from half a century of nature observation, presented like a documentary and skilfully interwoven with ecological reflections and social commentary.

A book about nature…

The work is divided into five large sections, each centred on a biogeographical region. It begins in his native Bavaria, with stories from his childhood and student days. He then moves to South America, where he spent a year after completing his doctorate. From there it is across the Atlantic to East Africa, which he also travelled (though not as extensively). The fourth section is devoted to islands, before he returns to Germany for the conclusion.

Reichholf's nature writing is absolutely first-rate. He writes so vividly that one almost feels one is watching a film documentary. His stories can be fascinating, amusing, and touching - for instance when he describes how thousands of butterflies migrate from south of the Sahara all the way to Germany, or when he recounts a close encounter with a whale cow.

Moreover, his wide-ranging biological expertise allows him not merely to relate observations, but also to draw out their significance. For example, he shows that although the water birds of the Galapagos Islands differ greatly in behaviour and physiology, each is adapted in its own way to the special conditions of this archipelago.

Another thing I greatly appreciate about this book is that Reichholf thinks on a grand scale. Not only does he always see individual animals in the context of their ecosystem, he also places these ecosystems in their global context. Thus he spends a longer section pondering the striking differences between the South American and African megafauna. Although both continents offer very similar habitats, the species poverty of the large South American mammals stands in no proportion to the abundance and diversity of their African relatives. The explanations the author proposes are compelling and testify to a great thinker and ecologist. (Among other things, he points out that the soil of the Amazon basin - unlike that of its counterpart, the Congo basin - is extremely nutrient-poor. The tremendous plant richness of the rainforest there is only possible thanks to an extremely efficient recycling system that lets hardly any energy trickle through to the higher trophic levels. Astonishingly, the inevitable loss of nutrients that does occur is made up from outside - by Sahara sand blown across the Atlantic!)

…and how we treat it

But Reichholf does not stop at distant examples. In the final section of the book things get personal: now he considers our local nature and how the social developments of recent decades have affected it. He talks about changes in agriculture, advances in sewage treatment, and the spread of non-native species. Unsparingly and opinionatedly, he relates what he has experienced and discovered over years of research and conservation work.

Here he does not mince his words. He openly criticises groups and associations that he regards as unhelpful or even harmful to conservation. Farmers, anglers, and hunters in particular feel his wrath: they are exempt from many conservation regulations, he argues, even though they cause some of the worst damage. True friends of nature, on the other hand, must submit to senseless rules (such as the ban on catching butterflies) that hamper research as much as the simple enjoyment and exploration of nature.

Indeed, he takes the German conservation movement as a whole to task. As a conservationist of the first hour, he welcomes society's embrace of the conservation idea, but criticises some of its offshoots. He complains that political conservation is often barely, if at all, guided by actual scientific knowledge, and that even conservation organisations all too often no longer represent real ecology, but an ecologism degenerated into religion. Furthermore, he argues, the apocalyptic warnings of unstoppable global climate change distract from concrete, effective local conservation measures.

He also has a scientific bone to pick. In his opinion, ecology is treated as far too static - nature is always in flux, and therefore by no means every change (especially the spread of species) should be judged negatively. Nor does he think much of model calculations: natural history, he says, is above all history, and as such fundamentally not predictable over the longer term.

He only turns positive again when he talks about why he actually practises conservation: because nature is beautiful. Wonder at and joy in nature is the common thread running through the entire book. In the final chapter he speaks openly once more about this love of nature, and pleads passionately that the beauty of nature not be neglected as the deeply personal raison d'etre of conservation.

My reaction

I am deeply impressed by this book. Reichholf is a biologist of the old school, a naturalist entirely in the spirit of Alexander von Humboldt and E.O. Wilson (my two greatest scientific role models). The ecology I have encountered at university so far is strongly shaped by the methods of physics: starting from an ecological theory, one formulates a hypothesis, devises a suitable (field) experiment, and finally publishes the most conclusive statistics possible. A naturalist like Reichholf works the other way around. Instead of beginning with the theory and ending with the observation, he simply goes out into nature and looks at it. This observing alone leads to questions, which may - or may not - be answered by connecting them with further observations. In university research this approach often counts as outdated, since it supposedly represents a purely “observational science” (as opposed to the aspired-to “predictive science”). But why be ashamed of that, as long as there is still so much new to discover? Reichholf himself puts it very nicely:

The main purpose is the personal, the experience of nature. For me this was and is the most important thing. Whoever has immersed himself in the diversity of nature becomes a discoverer who can no longer stop. There is constantly something new, exciting, surprising. The more you know, the more you want to know and get to know...

What is certain is that in reading this book I learned almost as much about nature as in three years of undergraduate study. Of course a biology degree cannot consist solely of tropical field trips - but did it have to be quite so much chemistry and molecular biology? And I know I am not the only student of my year who knows barely more than twenty native bird species. (At least I still have the excuse that I did not grow up in Germany…) I hope that in my Master's I will have the opportunity to improve my species knowledge considerably. But if even our biology students hardly know our own wildlife, how can we expect it of the rest of society? And the old rule still holds: what you do not know, you cannot appreciate. And who wants to protect what they do not appreciate?

Reichholf is right: conservation must become more emotional. I am glad of every scientist who is willing to “get their hands dirty” in public discourse outside the ivory tower. I am even more glad of every scientist who also has the gift of communicating their knowledge intelligibly and passing on some of their innate curiosity and fascination for nature. Mein Leben für die Natur has achieved just that.

Conclusion

Reichholf's “opus magnum” (to quote the dust jacket) is a very, very worthwhile book. For my fellow students specialising in ecology in particular, it should almost be required reading. The ecological hypotheses he develops have, I hope, taught me a little better what it means to think ecologically.

It is not an easy book to read. Well written as it is, it is long (over 600 pages) and every section needs digesting - which is why it took me three months to finish. But it was worth it.

It should also be read critically. His explanations sound plausible to me, but I often lack the expertise to judge his more speculative arguments. Especially on the subject of conservation he often becomes very cynical - I sincerely hope that his pessimism in this area is not always justified. And finally, Reichholf is well known for his controversial theses, for instance on climate protection. Nevertheless, his criticism should not be brushed off lightly. He has seen too much, learned too much, and done too much over the past six decades for that.

I would like to close with the lyrical words of his summary:

Conveying joy in nature was and remains, I believe, my most important task. [...] The terrible thing is that nobody who has never experienced it misses the song of the larks early in the morning over the fields. Or the colourful flowers in the meadows and the butterflies above them. All too quickly and carelessly such memories are dismissed as romantic nature-rhapsodising. And yet what we need is for far more people to rhapsodise about nature. Then it would be better off.

Daniel VedderMooBreeder - a simple breeding game

· 61 days ago

Recently I have been doing some teaching at a secondary school. Accordingly, I have been on a constant look-out for ways to make my lessons more engaging. So when the topic of selective breeding came up in one of my Key Stage 3 biology classes, I thought I might be able to turn the whole thing into a game.

Unfortunately, from the moment I had the idea to create a simple computer game for the lesson, I only had four days of preparation left. But I enjoy a challenge, and I wasn't planning to do any fancy graphics - so off I went. I christened the project MooBreeder, and, within three days, I actually had it completed. Here is a screenshot:

MooBreeder

Like I said, it is very simple. But it does bring across the salient points of the lesson, which were:

  • Not all organisms of one species are the same.

  • If we keep breeding the best animals for a given task and discarding the rest, we will eventually get animals that are better and better suited for this task.

  • Breeding for one feature may lead to another feature becoming ‘worse’.

At the start of the game, students choose one purpose they want to breed for (“dairy”, “beef”, or “plowing”). Each cow or bull has three traits that correlate with its suitability for this purpose (milk, meat, strength). Each animal also has an aggression trait, which strongly decreases its suitability for any purpose. The players must repeatedly choose which bull to mate with which cow, and which of their offspring to keep. The aim is to breed an animal that is “excellent” for the chosen purpose in as few generations as possible (although there is a strong element of chance involved).

MooBreeder is distributed as source and as an executable JAR. So if you would like to use it for a lesson, or just play around with it, go ahead! And shoot me an email to let me know what you think of it.

Daniel VedderBlogging with Lisp

· 62 days ago

So, I finally have my blog up and running. After several years of thinking: “I really ought to write an article on that thought.” 1 And two afternoons of trouble-shooting Linux, Lisp, and LAMP problems.

Because of course, if I was going to write a blog, I would host it myself. And Wordpress is overkill for what I am planning; all I need is a simple static site generator working from markdown. How hard can it be?

Well, it started with the choice of generators. I'd read about two or three good options before - trouble was, I couldn't remember their names. And DuckDuckGo came up with a list of 458 of them. Eventually I settled on coleslaw, mainly because it is written in Common Lisp and looks fairly well maintained. I've been doing a lot with Lisp again lately and have been trying to get deeper into the Common Lisp ecosystem - so this seemed like a good place to go.

So I headed over to my server, brought up a REPL and installed the Quicklisp package. So far so good. But then the real difficulties started. I also host a Gitbucket instance (see the project homepage) for my private source code. According to the coleslaw website, I should have been able to stick a small shell script into my server repo as a git hook, so my blog is automatically updated whenever I do a git push. It didn't work. After trying all sorts of fixes, I eventually discovered an obscure thread that told me that Gitbucket doesn't support git hooks at all. Well, that sucks. Especially since the first afternoon (and part of the evening) was now gone.

Well, if I can't automate deployment on the server, perhaps I could automate it locally. With this intent I started on the second afternoon's work. And after various pitfalls, I finally got it working. (Pro tip: Keeping your work files on an NTFS partition is great if you occasionally need to access them from Windows. But its lack of user permissions plays merry hell when you have to interface with ext4, or programs that expect to be running on ext4.2) Anyway, I now have a setup where I have all my blog pages and posts stored as markdown documents in a git repo (created locally and hosted on my server), alongside the theming templates used by coleslaw. Plus I have a local post-commit hook that will publish my changes as soon as I commit them3. So in the end, that's pretty good going. Just took a long, winding road to get here…

Summary

coleslaw is, on the whole, a really neat little program. If you're used to working with Common Lisp and Quicklisp, I can recommend it. The documentation is a bit on the minimalist side though, and there aren't a lot of tutorials available. So prepare for a bit of trial and error, and don't forget to Read The Source. But it serves my purposes well and I'm quite pleased with the result.

- - -

1) I miss the days of CATO

2) coleslaw uses rsync -a to move files around. This attempts to preserve file permissions - which obviously causes problems with NTFS. To work around this, your theme templates, your static directory, and your staging directory need to be on ext4. I do this by copying the first two to the staging directory via a shell script (see below). The Github ticket requesting a fix is here.

3) This is my .git/hooks/post-commit file:

echo "Copying theme..."
cp -a ~/TerraNostra/my-readable-theme/* \
    ~/.coleslaw/themes/my-readable
chmod -R 755 ~/.coleslaw/themes/my-readable

echo "Copying static content..."
cp -a ~/TerraNostra/static-content/* ~/.coleslaw/static/
chmod -R 755 ~/.coleslaw/static

echo "Building HTML..."
sbcl --noinform \
    --eval "(ql:quickload :coleslaw)" \
    --eval "(coleslaw:main \".\")" \
    --eval "(quit)"

echo "Done."

Eugene ZaikonnikovSome documents on AM and EURISKO

· 80 days ago

Sharing here a small collection of documents by Douglas B. Lenat related to the design of AM and EURISKO that I have assembled over the years. These are among the most famous programs of the symbolic AI era. They represent so-called 'discovery systems': unlike expert systems, they run loosely-constrained heuristic search in a complex problem domain.

AM was Lenat's doctoral thesis and the first attempt of this kind. Unfortunately, it's all described in rather informal pseudocode, a decision that led to a number of misunderstandings in follow-up criticism. Lenat responded to that in one of the better-known publications, Why AM and EURISKO appear to work.

AM was built around a concept-formation process utilizing a set of pre-defined heuristics. EURISKO takes it a step further, adding a mechanism for running the discovery search on its own heuristics. Both are specimens of what we could call 'Lisp-complete' programs: designs that require Lisp, or a hypothetical, similarly metacircular equivalent, to function. Their style was idiomatic to the INTERLISP of the 1970s, making heavy use of FEXPRs and self-modification of code.

There's quite a lot of thorough analysis available in the three-part The Nature of Heuristics: part one, part two. The third part contains the most insights into the workings of EURISKO. A remarkable quote from when EURISKO discovered Lisp atoms - reflecting that it was written before the two-decade pause in the threat of nuclear annihilation:

Next, EURISKO analyzed the differences between EQ and EQUAL. Specifically, it defined the set of structures which can be EQUAL but not EQ, and then defined the complement of that set. This turned out to be the concept we refer to as LISP atoms. In analogy to humankind, once EURISKO discovered atoms it was able to destroy its environment (by clobbering CDR of atoms), and once that capability existed it was hard to prevent it from happening.
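
(For readers who don't have the distinction at their fingertips: EQUAL compares structure, EQ compares identity, and symbols - Lisp atoms - are interned, so for them the two coincide. A quick illustration at the REPL:)

CL-USER> (equal (list 1 2) (list 1 2))  ; same structure
T
CL-USER> (eq (list 1 2) (list 1 2))     ; distinct objects
NIL
CL-USER> (eq 'atom 'atom)               ; symbols are interned
T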

Lenat's eventual conclusion from all this was that "common sense" is necessary to drive autonomous heuristic search, and that a critical mass of knowledge is necessary. That's where his current CYC project started off in the early 1990s.

Bonus material: The Elements of Artificial Intelligence Using Common Lisp by Steven L. Tanimoto describes a basic AM clone, Pythagoras.

CL Test Gridquicklisp 2018-10-18

· 86 days ago

Quicklisp newsOctober 2018 Quicklisp dist update now available

· 94 days ago
New projects:
  • arrows — Implements -> and ->> from Clojure, as well as several expansions on the idea. — CC0
  • authenticated-encryption — Authenticated-Encryption functions — MIT
  • base64 — Base64 encoding and decoding for Common Lisp. — Apache 2.0
  • black-tie — Noise library for Common Lisp. — BSD
  • cl-clblas — clBLAS binding — Apache License, Version 2.0
  • cl-dotenv — Utility library for loading .env files — MIT
  • cl-fuzz — A Fuzz Testing Framework — BSD-2
  • cl-las — Library to manipulate LAS files — ISC
  • cl-proj — CL-PROJ provides Proj.4 library bindings — BSD
  • cl-prolog2 — Common Interface to the ISO prolog implementations from Common Lisp — MIT
  • cover — Code coverage utility for Common Lisp — MIT
  • destructuring-bind-star — DESTRUCTURING-BIND with proper error signaling — MIT
  • everblocking-stream — A stream that always blocks and never has data available. — Public domain
  • heap — Binary Heap for Common Lisp. — Apache 2.0
  • huffman — Huffman encoding and decoding for Common Lisp. — Apache 2.0
  • lazy — Lazy forms for Common Lisp. — Apache 2.0
  • lorem-ipsum — Lorem ipsum generator in portable Common Lisp — MIT
  • parse — Parsing package for Common Lisp. — Apache 2.0
  • print-html — Simple html generator. — MIT License
  • protest — Common Lisp PROtocol and TESTcase Manager — LLGPL
  • re — Lua-style string pattern matching. — Apache 2.0
  • regular-type-expression — This project contains several Common Lisp packages — MIT
  • safe-read — A variant of READ secure against internbombing, excessive input and macro characters. — BSD 2-clause
  • safety-params — Filter parameters — BSD 2-Clause
  • sc-extensions — additional library collection for cl-collider — Public Domain / 0-clause MIT
  • sha1 — SHA1 Digest and HMAC for LispWorks. — Apache 2.0
  • sycamore — A fast, purely functional data structure library — BSD-3
  • targa — Targa Image Loading for Common Lisp. — Apache 2.0
  • trivial-cltl2 — Compatibility package exporting CLtL2 functionality — LLGPL
Updated projects: array-utils, asdf-viz, assoc-utils, binary-io, bit-smasher, cari3s, cepl, cl+ssl, cl-ana, cl-cffi-gtk, cl-collider, cl-colors2, cl-i18n, cl-kanren, cl-ledger, cl-liballegro, cl-mecab, cl-mixed, cl-neovim, cl-notebook, cl-patterns, cl-plumbing, cl-portmanteau, cl-postgres-plus-uuid, cl-progress-bar, cl-pslib, cl-pslib-barcode, cl-python, cl-rabbit, cl-sdl2, cl-sdl2-image, cl-sdl2-mixer, cl-sdl2-ttf, clack, closer-mop, closure-common, clunit2, clx, codex, colleen, commonqt, croatoan, cxml, cxml-stp, datafly, definitions, dexador, djula, dml, do-urlencode, dufy, dynamic-mixins, easy-audio, eclector, femlisp, function-cache, fxml, gamebox-math, geowkt, golden-utils, harmony, iclendar, inquisitor, integral, ironclad, lack, lass, lichat-tcp-server, log4cl, maiden, mcclim, mito, mito-attachment, myway, nineveh, ningle, overlord, pango-markup, parachute, parser.ini, perlre, petalisp, place-utils, plexippus-xpath, plump-sexp, postmodern, prepl, print-licenses, qlot, qtools, quri, read-csv, rove, s-dot2, scalpl, sel, serapeum, shadow, shuffletron, sly, split-sequence, st-json, staple, stmx, stumpwm, sxql, time-interval, tooter, trace-db, track-best, trivia, trivial-benchmark, trivial-garbage, trivial-gray-streams, trivial-indent, trivial-utilities, ubiquitous, utm, varjo, vernacular, woo, wookie.

Removed projects: clot, clpmr, cobstor, html-sugar, ie3fp, manardb, metafs, mime4cl, net4cl, npg, ods4cl, plain-odbc, quid-pro-quo, sanitized-params, sclf, smtp4cl, tiff4cl.

The removed projects no longer work on SBCL.

To get this update, use (ql:update-dist "quicklisp"). Enjoy!

Vsevolod DyomkinANN: flight-recorder - a robust REPL logging facility

· 113 days ago

Interactivity is a principal requirement for a usable programming environment. Interactivity means that there should be a shell/console/REPL or other similar text-based command environment. And a principal requirement for such an environment is keeping history. And not just keeping it, but doing it robustly:

  • recording history from concurrently running sessions
  • keeping unlimited history
  • identifying the time of the record and its context

This allows you to experiment freely and to reproduce the results of successful experiments, to go back to an arbitrary point in time and take your work in a different direction, and to stay DRY while performing common repetitive tasks in the REPL (e.g. initializing an environment or context).

flight-recorder or frlog (when you need a distinct name) is a small tool that intends to support all these requirements. It grew out of a frustration with how history is kept in SLIME, so it was primarily built to support this environment, but it can be easily utilized for other shells that don't have a good enough history facility. This is possible thanks to its reliance on the most common and accessible data-interchange facility: text-based HTTP.

frlog is a backend service that supports any client that is able to send an HTTP request.

The backend is a Common Lisp script that can be run in the following manner (probably, the best way to do it is inside screen):


sbcl --noprint --load hunch.lisp -- -port 7654 -script flight-recorder.lisp

It will print a bunch of messages that should end with the following line (modulo timestamp):


[2018-09-29 16:00:53 [INFO]] Started hunch acceptor at port: 7654.

The service appends each incoming request to the text file in markdown format: ~/.frlog.md.

The API is just a single endpoint, /frlog, which accepts GET and POST requests. The parameters are:

  • text is the content (url-encoded, of course) of the record; it can, alternatively, be sent in the POST request's body (which is more robust)

Optional query parameters are:

  • title - used to specify that this is a new record: for console-based interactions there's usually a command and zero or more results - the command starts the record (and thus should be accompanied by the title; for SLIME interactions it's the current Lisp package and a serial number). A titled record is added in the following manner:

    ### cl-user (10) 2018-09-29_15:49:17

    (uiop:pathname-directory-pathname )
    If there's no title, the text is added like this:

    ;;; 2018-09-29_15:49:29

    #<program-error @ #x100074bfd72>
  • tag - if provided, it signals that the record should be written not to the standard .frlog.md file, but to .frlog-<tag>.md. This makes it easy to log a specific group of interactions separately.

If the response code is 200, everything's fine.
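
For illustration, a record could be posted from the REPL with any HTTP client. Here is a minimal sketch using drakma (my choice for the example - the post doesn't prescribe a client; the port matches the start-up command above, and the title and text values are made up):

(ql:quickload :drakma)

;; Append one titled record to ~/.frlog.md via the /frlog endpoint.
(drakma:http-request "http://localhost:7654/frlog"
                     :method :post
                     :parameters '(("title" . "cl-user (1)")
                                   ("text" . "(+ 1 2)")))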

Currently, two clients are available:

  • a SLIME client flight-recorder.el that monkey-patches a couple of basic SLIME functions (just load it from Emacs if you have SLIME initialized)
  • and a tiny Lisp client frlog.lisp

P.S. To sum up: more and more, I've grown to appreciate simple (sometimes even primitive - the more primitive the better :) tools. flight-recorder seems to me to be just like that: it was very easy to hack together, but it solves an important problem for me and, I guess, for many others. And it's the modern "Unix way": small independent daemons, text-based formats, and HTTP instead of pipes...

P.P.S. frlog uses another tiny tool of mine - hunch that I've already utilized in a number of projects but haven't described yet - it's a topic for another post. In short, it is a script to streamline running hunchentoot that does all the basic setup and reduces the developer's work to just defining the HTTP endpoints.

P.P.P.S. I know, the name is, probably, taken and it's a rather obvious one. But I think it just doesn't matter in this case... :)

Quicklisp newsAugust 2018 Quicklisp dist update now available

· 143 days ago
New projects:
  • cari3s — A generator for the i3 status bar. — Artistic
  • cl-all — A script to evaluate expressions in multiple lisp implementations. — Artistic
  • cl-collider — A SuperCollider client for CommonLisp — Public Domain
  • cl-environments — Implements the CLTL2 environment access functionality for implementations which do not provide the functionality to the programmer. — MIT
  • cl-ledger — Double-entry accounting system. — BSD-3
  • cl-mecab — Interface of MeCab that is a morpheme analyzer — LLGPL
  • cl-swagger-codegen — lisp code generator for swagger — 2-clause BSD
  • cmake-parser — A cmake script parser. — MIT
  • dartscluuid — Provides support for UUIDs as proper values — MIT
  • database-migrations — System to version the database in roughly the same way rails migrations work. Differences are that only one database is really supported (but hacking around that is trivial) and that migrations are not needed to be stored in separate files. — MIT
  • float-features — A portability library for IEEE float features not covered by the CL standard. — Artistic
  • iclendar — An iCalendar format library. — Artistic
  • illusion — Customize and manage Lisp parens reader — MIT
  • lisp-binary — Declare binary formats as structs and then read and write them. — GPLv3
  • mmap — Portable mmap (file memory mapping) utility library. — Artistic
  • pango-markup — A small library to generate pango-style text markup. — Artistic
  • petalisp — Elegant High Performance Computing — AGPLv3
  • pludeck — A friendly interface for creating a Plump DOM. — MIT
  • print-licenses — Print the licenses used by the given project and its dependencies. — MIT
  • sly — Sylvester the Cat's Common Lisp IDE — Public Domain
  • sprint-stars — Display the stars of a GitHub User — GPL 3
  • studio-client — A client library for the Studio image hosting service — Artistic
  • test-utils — Convenience functions and macros for testing Common Lisp applications via Prove and Quickcheck — MIT Expat
  • trivial-utilities — A collection of useful functions and macros. — MIT
  • vernacular — Module system for language embeddings. — MIT
Updated projects: 3d-matrices, 3d-vectors, a-cl-logger, acclimation, ahungry-fleece, alexa, algebraic-data-library, april, arc-compat, array-utils, bodge-sndfile, caveman, cepl, cepl.drm-gbm, chirp, cl+ssl, cl-ana, cl-bnf, cl-bootstrap, cl-colors2, cl-conllu, cl-csv, cl-dbi, cl-feedparser, cl-flac, cl-flow, cl-fond, cl-gamepad, cl-gendoc, cl-generator, cl-gpio, cl-grace, cl-i18n, cl-k8055, cl-kanren, cl-libuv, cl-mixed, cl-monitors, cl-mpg123, cl-oclapi, cl-opengl, cl-out123, cl-patterns, cl-ppcre, cl-progress-bar, cl-project, cl-pslib, cl-pslib-barcode, cl-sdl2-ttf, cl-soil, cl-soloud, cl-spidev, cl-str, cl-virtualbox, cl-wayland, cl-yesql, clack, claw, clip, clml, closer-mop, clss, clunit2, colleen, configuration.options, conium, croatoan, crypto-shortcuts, curry-compose-reader-macros, dartsclhashtree, deeds, deferred, definitions, delta-debug, deploy, dexador, dissect, dml, documentation-utils, dufy, eazy-gnuplot, eazy-project, eclector, fare-scripts, fast-http, flare, flow, flute, for, form-fiddle, fxml, glsl-spec, glsl-toolkit, golden-utils, gsll, halftone, harmony, humbler, ironclad, jsonrpc, kenzo, lack, lambda-fiddle, lass, legit, lichat-protocol, lichat-serverlib, lichat-tcp-client, lichat-tcp-server, lichat-ws-server, lionchat, lisp-executable, lispbuilder, listopia, lquery, maiden, mcclim, mito, modularize, modularize-hooks, modularize-interfaces, more-conditions, multiposter, neo4cl, nibbles, nineveh, north, oclcl, opticl, overlord, oxenfurt, parachute, pathname-utils, perlre, piping, plump, plump-bundle, plump-sexp, plump-tex, postmodern, qlot, qt-libs, qtools, qtools-ui, racer, random-state, ratify, redirect-stream, rfc2388, rutils, sanitized-params, sel, serapeum, shadow, simple-inferiors, simple-tasks, slime, snooze, softdrink, south, spinneret, staple, stumpwm, sxql, terminfo, thnappy, tooter, trace-db, trivial-arguments, trivial-battery, trivial-benchmark, trivial-clipboard, trivial-gray-streams, trivial-indent, trivial-main-thread, trivial-mimes, trivial-thumbnail, umbra, usocket, uuid, varjo, verbose, vgplot, whofields, with-c-syntax, woo, wookie, xml.location.

Removed projects: cl-clblas, cl-proj.

There are no direct problems with cl-clblas and cl-proj. They are victims of a hard drive crash on my end, and an incomplete recovery. I have not been able to set up the foreign libraries required to build those projects in time for this month's release.

If you want to continue using cl-clblas and cl-proj, there are a few options:

  • Don't upgrade to the latest Quicklisp dist until they are back in
  • If you already upgraded, downgrade
  • Clone their repos into ~/quicklisp/local-projects

Sorry for any inconvenience this may cause. I hope to have it fully resolved in the September 2018 update.

To get this update, use (ql:update-dist "quicklisp").

Enjoy!

Zach BeaneA Road to Common Lisp - Steve Losh

· 147 days ago

Paul KhuongRestartable Sequences With the Polysemic Null Segment Selector

· 148 days ago

Implementing non-blocking algorithms is one of the few things that are easier to code in a kernel than in userspace (the only other one I can think of is physically wrecking a machine). In the kernel, we only have to worry about designing a protocol that achieves forward progress even if some threads stop participating. We must guarantee the absence of conceptual locks (we can’t have operations that must be paired, or barriers that everyone must reach), but are free to implement the protocol by naïvely spinlocking around individual steps: kernel code can temporarily disable preemption and interrupts to bound a critical section’s execution time.

Getting these non-blocking protocols right is still challenging, but the challenge is one fundamental for reliable systems. The same problems, solutions, and space/functionality trade-offs appear in all distributed systems. Some would even argue that the kind of interfaces that guarantee lock- or wait-freedom are closer to the object-oriented ideals.

Of course, there is still a place for clever instruction sequences that avoid internal locks, for code that may be paused anywhere without freezing the whole system: interrupts can’t always be disabled, read operations should avoid writing to shared memory if they can, and a single atomic read-modify-write operation may be faster than locking. The key point for me is that this complexity is opt-in: we can choose to tackle it incrementally, as a performance problem rather than as a prerequisite for correctness.

We don’t have the same luxury in userspace. We can’t start by focusing on the fundamentals of a non-blocking algorithm, and only implement interruptable sequences where it makes sense. Userspace can’t disable preemption, so we must think about the minutiae of interruptable code sequences from the start; non-blocking algorithms in userspace are always in hard mode, where every step of the protocol might be paused at any instruction.

Specifically, the problem with non-blocking code in user space isn’t that threads or processes can be preempted at any point, but rather that the preemption can be observed. It’s a PCLSRing issue! Even Unices guarantee programmers won’t observe a thread in the middle of a syscall: when a thread (process) must be interrupted, any pending syscall either runs to completion, or returns with an error1. What we need is a similar guarantee for steps of our own non-blocking protocols2.

Hardware transactional memory kind of solves the problem (preemption aborts any pending transaction) but is a bit slow3, and needs a fallback mechanism. Other emulation schemes for PCLSRing userspace code divide the problem in two:

  1. Determining that another thread was preempted in the middle of a critical section.
  2. Guaranteeing that that other thread will not blindly resume executing its critical section (i.e., knowing that the thread knows that we know it’s been interrupted4).

The first part is relatively easy. For per-CPU data, it suffices to observe that we are running on a given CPU (e.g., core #4), and that another thread claims to own the same CPU’s (core #4’s) data. For global locks, we can instead spin for a while before entering a slow path that determines whether the holder has been preempted, by reading scheduling information in /proc.

The second part is harder. I have played with schemes that relied on signals, but was never satisfied: I found Linux perf will rarely, but not never, drop interrupts when I used it to “profile” context switches, and signaling when we determine that the holder has been pre-empted has memory visibility issues for per-CPU data5.

Until earlier this month, the best-known solution on mainline Linux involved cross-modifying code! When a CPU executes a memory write instruction, that write is affected by the registers, virtual memory mappings, and the instruction's bytes. Contemporary operating systems rarely let us halt and tweak another thread's general purpose registers (Linux won't let us self-ptrace, nor pause an individual thread). Virtual memory mappings are per-process, and can't be modified from the outside. The only remaining angle is modifying the preemptee's machine code.

That’s what Facebook’s experimental library Rseq (restartable sequences) actually does.

I’m not happy with that solution either: while it “works,” it requires per-thread clones of each critical section, and makes us deal with cross-modifying code. I’m not comfortable with leaving code pages writable, and we also have to guarantee the preemptee’s writes are visible. For me, the only defensible implementation is to modify the code by mmap-ing pages in place, which incurs an IPI per modification. The total system overhead thus scales superlinearly with the number of CPUs.

With Mathieu Desnoyers’s, Paul Turner’s, and Andrew Hunter’s patch to add an rseq syscall to Linux 4.18, we finally have a decent answer. Rather than triggering special code when a thread detects that another thread has been preempted in the middle of a critical section, userspace can associate recovery code with the address range for each restartable critical section’s instructions. Whenever the kernel preempts a thread, it detects whether the interruptee is in such a restartable sequence, and, if so, redirects the instruction pointer to the associated recovery code. This essentially means that critical sections must be read-only except for the last instruction in the section, but that’s not too hard to satisfy. It also means that we incur recovery even when no one would have noticed, but the overhead should be marginal (there’s at most one recovery per timeslice), and we get a simpler programming model in return.
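A hedged sketch of the registration half of that interface (the critical sections themselves need a dab of assembly so the kernel can match the interrupted instruction pointer against the descriptor); RSEQ_SIG is a userspace-chosen signature, and the value below is arbitrary.

#define _GNU_SOURCE
#include <linux/rseq.h>
#include <sys/syscall.h>
#include <unistd.h>

#define RSEQ_SIG 0x53053053  /* arbitrary; must also appear before the abort handler */

static __thread volatile struct rseq rseq_area __attribute__((aligned(32)));

static int
register_thread_rseq(void)
{
        /*
         * One registration per thread.  The kernel keeps rseq_area.cpu_id
         * up to date across context switches, and consults the descriptor
         * pointed to by rseq_area.rseq_cs ({ start_ip, post_commit_offset,
         * abort_ip }) to decide whether to redirect a preempted thread to
         * its recovery code.
         */
        return syscall(__NR_rseq, &rseq_area, sizeof(rseq_area), 0, RSEQ_SIG);
}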

Earlier this year, I found another way to prevent critical sections from resuming normal execution after being preempted. It’s a total hack that exercises a state saving defect in Linux/x86-64, but I’m comfortable sharing it now that Rseq is in mainline: if anyone needs the functionality, they can update to 4.18, or backport the feature.

Here’s a riddle!

riddle.c
static const char values[] = { 'X', 'Y' };

static char
read_value(void)
{
        /*
         * New feature in GCC 6; inline asm would also work.
         * https://gcc.gnu.org/onlinedocs/gcc-6.1.0/gcc/Named-Address-Spaces.html#Named-Address-Spaces
         */
        return *(const __seg_gs char *)(uintptr_t)values;
}

int
main(int argc, char **argv)
{
        /* ... */
        char before_switch = read_value();  /* Returns 'X'. */
        usleep(1000 * 1000);  /* Or any other wait for preemption. */
        char after_switch = read_value();  /* Returns 'Y'. */
        /* ... */
}

With an appropriate setup, the read_value function above will return a different value once the executing thread is switched out. No, the kernel isn’t overwriting read-only data while we’re switched out. When I listed the set of inputs that affect a memory store or load instruction (general-purpose registers, virtual memory mappings, and the instruction bytes), I left out one last x86 thing: segment registers.

Effective addresses on x86oids are about as feature-rich as it gets: they sum a base address, a shifted index, a constant offset, and, optionally, a segment base. Today, we simply use segment bases to implement thread-local storage (each thread’s FS or GS offset points to its thread-local block), but that usage repurposes memory segmentation, an old 8086 feature... and x86-64 still maintains some backward compatibility with its 16-bit ancestor. There’s a lot of unused complexity there, so it’s plausible that we’ll find information leaks or otherwise flawed architectural state switching by poking around segment registers.
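To make the addressing concrete, here is the same load with and without a GS override, as GCC inline asm (hypothetical helpers, x86-64 only):

#include <stdint.h>

static uint64_t
load_plain(const uint64_t *base, uint64_t index)
{
        uint64_t v;

        /* effective address = base + index * 8 + 16 */
        asm("movq 16(%1,%2,8), %0"
            : "=r"(v) : "r"(base), "r"(index) : "memory");
        return v;
}

static uint64_t
load_with_gs(const uint64_t *base, uint64_t index)
{
        uint64_t v;

        /* effective address = GS.base + base + index * 8 + 16 */
        asm("movq %%gs:16(%1,%2,8), %0"
            : "=r"(v) : "r"(base), "r"(index) : "memory");
        return v;
}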

How to set that up

After learning about this trick to observe interrupts from userland, I decided to do a close reading of Linux’s task switching code on x86-64 and eventually found this interesting comment[6].

Observing a value of 0 in the FS or GS registers can mean two things:

  1. Userspace explicitly wrote the null segment selector in there, and reset the segment base to 0.
  2. The kernel wrote a 0 in there before setting up the segment base directly, with WR{FS,GS}BASE or by writing to a model-specific register (MSR).

Hardware has to keep track of which of the two is in effect, and do so efficiently. If userspace wrote a 0 in FS or GS, prefixing an instruction with that segment has no impact; if the MSR write is still active (and is non-zero), using that segment must impact effective address computation.

There’s no easy way to do the same in software. Even in ring 0, the only sure-fire way to distinguish between the two cases is to actually read the current segment base value, and that’s slow. Linux instead fast-paths the common case, where the segment register is 0 because the kernel is handling segment bases. It prioritises that use case so much that the code knowingly sacrifices correctness when userspace writes 0 in a segment register after asking the kernel to setup its segment base directly.

This incorrectness is acceptable because it only affects the thread that overwrites its segment register, and no one should go through that sequence of operations. Legacy code can still manipulate segment descriptor tables and address them in segment registers. However, being legacy code, it won’t use the modern syscall that directly manipulates the segment base. Modern code can let the kernel set the segment base without playing with descriptor tables, and has no reason to look at segment registers.
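In pseudocode, the prioritisation looks something like the following; this is my paraphrase with hypothetical names, not the kernel's actual code.

struct thread_state {
        unsigned short gs_selector;
        unsigned long gs_base;  /* last base programmed via arch_prctl/MSR */
};

static void
save_gs_on_switch_out(struct thread_state *t)
{
        t->gs_selector = read_gs_selector();  /* hypothetical accessor */
        if (t->gs_selector == 0) {
                /*
                 * Fast path: assume the kernel owns the base and keep the
                 * cached arch_prctl value, instead of issuing a slow base
                 * read.  Wrong in exactly one case: userspace wrote 0 to
                 * GS to reset the base to 0 after an arch_prctl.
                 */
                return;
        }
        t->gs_base = read_gs_base();  /* slow path, hypothetical */
}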

The only way to observe the buggy state saving is to go looking for it, with something like the code below (which uses GS because FS is already taken by glibc to implement thread-local storage).

h4x.c
#define RUN_ME /*
gcc-6 -std=gnu99 $0 -o h4x && ./h4x; exit $?;

Should output
Reads: XXYX
Re-reads: XYX
*/
#define _GNU_SOURCE
#include <asm/prctl.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static const char values[] = { 'X', 'Y' };

/* Directly set GS's base with a syscall. */
static void
set_gs_base(unsigned long base)
{
        int ret = syscall(__NR_arch_prctl, ARCH_SET_GS, base);
        assert(ret == 0);
}

/* Write a 0 in GS. */
static void
set_gs(unsigned short value)
{
        asm volatile("movw %0, %%gs" :: "r"(value) : "memory");
}

/* Read gs:values[0]. */
static char
read_value(void)
{
        /*
         * New feature in GCC 6; inline asm would also work.
         * https://gcc.gnu.org/onlinedocs/gcc-6.1.0/gcc/Named-Address-Spaces.html#Named-Address-Spaces
         */
        return *(const __seg_gs char *)(uintptr_t)values;
}

int
main(int argc, char **argv)
{
        char reads[4];
        char re_reads[3];

        reads[0] = read_value();
        reads[1] = (set_gs(0), read_value());
        reads[2] = (set_gs_base(1), read_value());
        reads[3] = (set_gs(0), read_value());

        printf("Reads: %.4s\n", reads);
        fflush(NULL);

        re_reads[0] = read_value();
        re_reads[1] = (usleep(1000 * 1000), read_value());
        re_reads[2] = (set_gs(0), read_value());

        printf("Re-reads: %.3s\n", re_reads);
        return 0;
}

Running the above on my Linux 4.14/x86-64 machine yields

$ gcc-6 -std=gnu99 h4x.c && ./a.out
Reads: XXYX
Re-reads: XYX

The first set of reads shows that:

  1. our program starts with no offset in GS (reads[0] == values[0])
  2. explicitly setting GS to 0 does not change that (reads[1] == values[0])
  3. changing the GS base to 1 with arch_prctl does work (reads[2] == values[1])
  4. resetting the GS selector to 0 resets the base (reads[3] == values[0]).

The second set of reads shows that:

  1. the reset base survives short syscalls (re_reads[0] == values[0])
  2. an actual context switch reverts the GS base to the arch_prctl value (re_reads[1] == values[1])
  3. writing a 0 in GS resets the base again (re_reads[2] == values[0]).

Cute hack, why is it useful?

The property demonstrated by the hack above is that, after our call to arch_prctl, we can write a 0 in GS with a regular instruction to temporarily reset the GS base to 0, and know that the base will revert to the arch_prctl offset when the thread resumes execution after being suspended.

We now have to ensure our restartable sequences are no-ops when the GS base is reset to the arch_prctl offset, and that the no-op is detected as such. For example, we could set the arch_prctl offset to something small, like 4 or 8 bytes, and make sure that any address we wish to mutate in a critical section is followed by 4 or 8 bytes of padding whose contents are recognisable as padding. If a thread is switched out in the middle of a critical section, its GS base will be reset to 4 or 8 when the thread resumes execution; we must guarantee that this offset makes the critical section’s writes fail.

If a write is a compare-and-swap, we only have to make sure the padding’s value is unambiguously different from real data: reading the padding instead of the data will make the compare-and-swap fail, and the old value will tell us that it failed because we read padding, which can only happen after the section is preempted. We can play similar tricks with fetch-and-add (e.g., real data is always even, while the padding is odd), or atomic bitwise operations (steal the sign bit).
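Here's a sketch of that compare-and-swap variant, assuming an arch_prctl offset of 8, that the thread wrote a 0 to GS before entering the section (so the GS base is normally 0), and that PADDING can never be a legitimate data value; gs_cas is a hypothetical GS-prefixed helper, not an established interface.

#include <stdint.h>

#define PADDING ((uint64_t)0xDEADBEEFDEADBEEFULL)  /* never valid data */

struct slot {
        uint64_t data;
        uint64_t pad;  /* always PADDING */
};

static inline uint64_t
gs_cas(uint64_t *ptr, uint64_t expected, uint64_t desired)
{
        uint64_t old = expected;

        /*
         * With a GS base of 0, this addresses ptr->data; after a context
         * switch resets the base to 8, it addresses the padding instead,
         * so the compare must fail.
         */
        asm volatile("lock cmpxchgq %2, %%gs:(%1)"
                     : "+a"(old)
                     : "r"(ptr), "r"(desired)
                     : "memory", "cc");
        return old;
}

static int
try_update(struct slot *s, uint64_t expected, uint64_t desired)
{
        uint64_t old = gs_cas(&s->data, expected, desired);

        if (old == expected)
                return 0;   /* committed */
        if (old == PADDING)
                return -1;  /* preempted mid-section: restart the protocol */
        return 1;           /* ordinary CAS failure: contention */
}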

If we’re willing to eat a signal after a context switch, we can set the arch_prctl offset to something very large, and take a segmentation fault after being re-scheduled. Another option is to set the arch_prctl offset to 1, and use a double-wide compare-and-swap (CMPXCHG16B), or turn on the AC (alignment check) bit in EFLAGS. After a context switch, our destination address will be misaligned, which will trigger a SIGBUS that we can handle.
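A sketch of the alignment-check variant: with an arch_prctl offset of 1, every aligned GS-prefixed access becomes misaligned after a context switch, and the AC bit turns that into a signal we can catch. The handler below is a stub; real code would siglongjmp to a restart point rather than return (returning would re-execute the faulting instruction).

#include <signal.h>
#include <string.h>

static void
on_alignment_fault(int sig)
{
        (void)sig;
        /* The critical section was preempted; a real handler would
         * siglongjmp back to a restart point instead of returning. */
}

static void
arm_alignment_checks(void)
{
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = on_alignment_fault;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGBUS, &sa, NULL);

        /*
         * Set EFLAGS.AC (bit 18).  Linux runs userspace with CR0.AM set,
         * so misaligned accesses now raise #AC, delivered as SIGBUS.
         */
        asm volatile("pushfq\n\t"
                     "orq $0x40000, (%%rsp)\n\t"
                     "popfq" ::: "memory", "cc");
}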

The last two options aren’t great, but, if we make sure to regularly write a 0 in GS, signals should be triggered rarely, only when preemption happens between the last write to GS and a critical section. They also have the advantage of avoiding the need for padding, and of making it trivial to detect when a restartable section was interrupted. Detection is crucial because it often isn’t safe to assume an operation failed when it succeeded (e.g., unwittingly succeeding at popping from a memory allocator’s freelist would leak memory). When a GS-prefixed instruction fails, we must be able to tell from the instruction’s result, and nothing else: we can’t just check whether the segment base is still what we expect after the fact, since our thread could have been preempted right after the special GS-prefixed instruction, before our check.

Once we have restartable sections, we can use them to implement per-CPU data structures (instead of per-thread), or to let threads acquire locks and hold them until they are preempted: with restartable sections that only write if there was no preemption between the lock acquisition and the final store instruction, we can create a revocable lock abstraction and implement wait-free coöperation or flat-combining.

Unfortunately, our restartable sections will always be hard to debug: observing a thread’s state in a regular debugger like GDB will reset the GS base and abort the section. That’s not unique to the segment hack approach. Hardware transactional memory will abort critical sections when debugged, and there’s similar behaviour with the official rseq syscall. It’s hard enough to PCLSR userspace code; it would be even harder to PCLSR-except-when-the-interruption-is-for-debugging.

Who’s to blame?

The null GS hack sounds like it only works because of a pile of questionable design decisions. However, if we look at the historical context, I’d say everything made sense.

Intel came up with segmentation back when 16-bit pointers were big, but 64 KB of RAM was not quite capacious enough. They didn’t have 32-bit (never mind 64-bit) addresses in mind, nor threads; they only wanted to address 1 MB of RAM with their puny registers. When thread libraries abused segments to implement thread-local storage, the only other options were to over-align the stack and hide information there, or to steal a register. Neither sounds great, especially with x86’s six-and-a-half general-purpose registers. Finally, when AMD decided to rip out segmentation but keep FS and GS, they needed to make porting x86 code as easy as possible, since that was the whole value proposition for AMD64 over Itanium.

I guess that’s what systems programming is about. We take our tools, get comfortable with their imperfections, and use that knowledge to build new tools by breaking the ones we already have (#Mickens).

Thank you to Andrew for a fun conversation that showed the segment hack might be of interest to someone else, and to Gabe for snarkily reminding us that Rseq is another Linux/Silicon Valley re-invention.


  1. That’s not as nice as rewinding the PC to just before the syscall, with a fixed-up state that will resume the operation, but it is simpler to implement, and usually good enough. Classic worse is better (Unix semantics are also safer with concurrency, but that could have been opt-in...).

  2. That’s not a new observation, and Sun heads like to point to prior art like Dice’s and Garthwaite’s Mostly Lock-Free Malloc, Garthwaite’s, Dice’s, and White’s work on Preemption notification for per-CPU buffers, or Harris’s and Fraser’s Revocable locks. Linux sometimes has to reinvent everything with its special flavour.

  3. For instance, SuperMalloc optimistically uses TSX to access per-CPU caches, but TSX is slow enough that SuperMalloc first tries to use a per-thread cache. Dice and Harris explored the use of hardware transactional lock elision solely to abort on context switches; they maintained high system throughput under contention by trying the transaction once before falling back to a regular lock.

  4. I did not expect systems programming to get near multi-agent epistemic logic ;)

  5. Which is fixable with LOCKed instructions, but that defeats the purpose of per-CPU data.

  6. I actually found the logic bug before the Spectre/Meltdown fire drill and was worried the hole would be plugged. This one survived the purge. Fingers crossed!

