I actually know something about safety. I work in a safety-critical industry (automated transport) and deal with it every day. I deal with it as a matter of process and know ways in which safety can be astronomically improved when looking at a system that has not been designed for safety. Now, in our industry, safety means “no one gets injured or dead unless the best possible outcome requires it, and in that case it is limited at the expense of all other factors”. So, for example, sometimes the only thing you can do is stop as fast as possible, and that may injure someone. So you only do that when the alternative is worse. As you might have guessed, there is some probability math in there.
One joy of studying safety (and I’ll stress that it’s not my expertise — we have a department that only analyzes for safety, but we all have to know something about it or we’d never release anything) is that you can apply it to science-fiction stuff and get cool results, like when I did a (flawed, it turns out) simple safety case for Traveller-style anti-gravity systems on starships. This made some unexpected subsystems necessary and several of them would make cool hooks for a game.
This is not about that. This got stuck in my head while walking to work. I have gone on (and on) about it in person to some of you, so please forgive me. Here I go again.
Adversarial activity benefits from unpredictability. When we behave unpredictably, it is more difficult for an adversary to find us, to reach us, and to harm us. Unless one or the other is vastly superior in some essential category, behaving unpredictably (within the margins that protect your strengths, so not just random but random and still taking advantage of being really fast and wanting to get further away) is good.
Cooperative activity benefits from predictability. There are edge-cases, like brainstorming, where you want some creative randomness in order to open new avenues for investigation, but generally when acting cooperatively things are best served when predictability is increased. In creative endeavours this is a weak statement, partially because there are adversarial elements to the process but also because it’s exploratory. In safety-related contexts, though, it is an absolutely hard rule. Safety requires conservatism and part of that is a demand for the best predictability you can get.
Traffic is a cooperative, safety-critical system.
There are funny-but-true ways to say it’s adversarial. These are bullshit. Take it from me, a pedestrian. Adversarial driving is bullshit. It will kill me or some other pedestrian. This is not on.
So, obviously, as traffic (and in traffic I include everyone in the system: pedestrians, workers, emergency crews, commuters, cyclists, transit, whatever) is necessarily cooperative, it benefits from predictability. So how do we get predictability? Easy, with a process that everyone follows!
Yes, traffic law. Here’s where I wanted to go: no matter how stupid or inconvenient a traffic law seems to you, obeying it increases predictability and therefore safety. Disobeying it — again, no matter what it is! — decreases predictability and therefore safety. There is zero mileage in saying you disobey a traffic law because it’s stupid. It might be. It might cost you minutes a day. I do not fucking care, because when you behave unpredictably by disobeying traffic law, the odds are much higher that you will get a pedestrian killed than pretty much anyone else.
And this means you, too, cyclists. And you also, pedestrians. Buzzing stop signs or walking on a red-hand signal creates unpredictability and increases the likelihood that someone will get hurt. And the worst offenders are the highest on the list of likely victims: pedestrians, cyclists, and then motorists.
So while it may feel uncool to obey the law, and it may save you seconds or even minutes, and it may seem like a dumb law, please embrace it. Decide to be proud of following this one set of procedures, no matter how iconoclastic you want to be. In fact, obeying traffic laws is kind of against-the-grain now anyway, a kind of punk straight-edge fuck you to the slackers. Make it yours, be proud of it, and then do it. It’s important.
–BMurray
I once did a safety analysis of artificial gravity systems in Traveller spacecraft.
I was tempted to stop there, actually. That’s kind of an article in itself — it’s turgid with meaning and ramifications and questions without even elaborating. Because what I did there (though in the context of play and therefore not nearly as rigorously or detailed as I would at work) was my job, but with a particular kind of science-fiction technology in a particular game setting instead of with my more usual target technology.
This was a great exercise for me. It never actually saw play, but it added a lot of quiet verisimilitude to a game or two — it gave me acronyms to throw around for NPC dialogue that were grounded in a context. It gave me a host of scenarios to explore as play (and really, studying failure modes of technology is practically the definition of plotting a good science-fiction story) and it was fun to do. I guess it helps that I like my job.
It also implied things about technology that I love. For example, there’s a credible argument that in a thousand years we will still use big relays that go THUNK for some things. Here at work we’ve been trying to get rid of them for years, but they remain an incredibly cheap and incredibly reliable way to handle safety-critical switching. There might be something new on the horizon, but beating that much cheap and that much functional is pretty hard.
Anyway, the exercise delivered on three axes: it was fun in itself, it informed play in a way I found fun at the table, and it was useful in the workplace as a way to abstract a problem out of its context and think about it from a new angle. So I try to do it when I can.
Another place I get to do work-hobby is in typesetting. I write a lot at work — probably two- to five-thousand words a day. I also build a lot of diagrams, sometimes having to invent new symbology. And so I am often faced with new problems in typesetting to deliver complex material in a useful fashion and that lets me build game-publisher constructions in the context of learning more about my own work. Recent efforts in finding an electronic format that cross-correlates well with print have been fruitful, for example. I have several electronic layouts now that explore the issue from different angles using my work criteria as requirements but my game context as text. Am I playing at work or working at play? It’s a good life, at any rate.
I used to do this in my ungaming period (we call it the Dark Ages around home) as well — I was doing a lot of coding at work and would experiment with new languages and ideas by building gaming tools or IRC robots or something. A lot of code got built and a lot got learned and again I was working-at-play and playing-at-work.
A lot of people don’t do this because their work is not playful. By playful I don’t intend to imply frivolous (see my safety analysis above — the work is as far from frivolous as is possible; lives literally depend on it being right) but rather diverting. Enjoyable. Entertaining. And here’s where I want to link to our Trouble with Lulu recently — it seems likely to me that this lack of play is part of what alienates people from their work to the degree that they choose to become cogs rather than humans in the machine that hires them. But if solving that problem was play, it would have been done better and faster.
There are some highly professional cogs too: not cogs in the sense that they are automatable but cogs in the sense that they elect to be automata at work. I’ve met a lot of dentists like this and, increasingly, computer programmers. They don’t love their work and they don’t engage it playfully and eagerly. They may do it well (though my experience is that they don’t) but mostly they do it adequately. They selected the career fundamentally because it seemed likely to deliver a job with good pay. They get no joy from being at work and they cannot imagine getting joy from work. And consequently they generally look to maximise what does motivate them at work: pay. These people sometimes do a lot of overtime, paradoxically, trading the leisure they do love for even more pay.
Worst of all are people who must be cogs because a human cog is cheaper to employ than a real cog. These people are de-humanised. That makes them easy to dismiss, but it is them I want to address.
This is what automation (in a broad sense) is all about: de-cogging humans. Because a person that is not a cog is free to be at play, and it is at play that our best thinking happens. So in our office, for example, we have a simple rule for everyone from receptionist to, well, the top: if you do the same thing over and over again, find a way to automate it. Use your skills or call someone who has them, but turn that repetition into a program that does it the same way every time. Play goes up and error rates go down. The guy who loves hacking little scripts does so, and the guy who hates converting Primavera to Excel the way his boss likes it can now click GO and get it done.
And this is where our future must aim: re-humanising everyone. It’s not something we can plan for completely — it’s not a blueprint for yet another Utopia — but it is a goal worth pursuing at every turn. There is good solid work for humans all over our artificial strata of status, but there is also awful, stupid, automatable work that makes some people have to see themselves as un-human, at least for the work day. We should make a place where everyone gets to be human all the time.
I keep smelling whiffs of Marx and Engels. Hrm, mostly Marcuse, now that I think about it. Recall that criticisms of capitalism are separate from the failed blueprints to fix it. Also recall why human rights are important. It’s that first adjective.
–BMurray
I haven’t talked a lot about the recent downtime that VSCA has had with Lulu — currently Diaspora is not available because of a fault in their system. It’s been down for six full days and there is no sign that it will be corrected soon. There’s no sign it won’t either. Basically there are no signs.
So this is an interesting automation failure — a highly automated system stops functioning and offers zero information. That’s bad. But it’s also familiar to me and in the context of familiarity it’s good. Sort of. Half of it is good and half is bad. All of it is bad for Lulu because of a purpose mismatch.
I work in a safety-critical software development environment. I, personally, don’t write safety-critical software but I do review it and analyze it and research ways to make it more functional and more safe. So I know something about it.
I know, for example, that an essential feature of any safety-critical design (hardware or software or both) is “fail-safety”: the idea that if a component fails, it does so in such a way that the result is safe. This is usually accomplished by having the equipment constantly assert the tricky state, so that when it stops asserting, the equipment falls back to a known safe state. An example of this is the “track circuit” system in fixed-block rail (an antiquated but functional and very cheap system). Basically a current is run through a rail and a relay is connected. As long as current is detected, the relay is held closed (and it’s a gravity-open or spring-open relay, constantly asserting “occupied” unless powered closed). When a train comes by, current travels between the two rails through the train, short-circuiting the system, opening the relay, and flagging the block as “occupied”. So if power fails, the fail-open relay asserts “occupied” whether or not a train is there, because it’s safer to assume one is there than not. If the magnet on the relay fails, the relay fails open (thanks, gravity and/or spring), flagging “occupied”. If a metal bar falls across the rails, it shorts them and the block is flagged “occupied”. Basically we do work to hold the unsafe state (the block reading as unoccupied), and anything that interrupts that work (a failure) flags the region as closed, which is safe.
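To make the track circuit logic concrete, here is a toy sketch in Python. It is not how any real signalling system is implemented; the class, field names, and the "clear" label for the unoccupied state are all made up for illustration. The only point is that "clear" has to be actively earned and every failure lands on "occupied".

```python
from dataclasses import dataclass


@dataclass
class TrackCircuit:
    """Toy model of one fixed-block track circuit (all names hypothetical)."""

    power_on: bool = True        # supply feeding current into the rail
    coil_ok: bool = True         # the relay magnet is working
    rails_shunted: bool = False  # a train axle or stray metal bar shorts the rails

    def relay_closed(self) -> bool:
        # The relay is held closed only while current actually reaches a
        # working coil. Gravity or a spring opens it in every other case.
        current_reaches_relay = self.power_on and not self.rails_shunted
        return current_reaches_relay and self.coil_ok

    def block_state(self) -> str:
        # "clear" must be actively asserted; "occupied" is the default.
        return "clear" if self.relay_closed() else "occupied"


if __name__ == "__main__":
    print(TrackCircuit().block_state())
    print(TrackCircuit(power_on=False).block_state())
    print(TrackCircuit(coil_ok=False).block_state())
    print(TrackCircuit(rails_shunted=True).block_state())
```

Note that the model cannot tell a train from a power failure from a dropped metal bar. That is the design: it does not need to, because all of them resolve to the same safe answer.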
So that’s pretty cool: it’s a remarkably simple principle that can make very complex systems certainly safe. It has a side-effect, though: it’s very brittle. Because so many failures are treated safely, and because the safe state is almost always a shutdown in operation, transient failures cause the system to halt, which is very inconvenient. Worse, marginal failures that are not unsafe often must be treated the same way (or are so coarsely detected that they fall under the same category as any other failure) and so you can have perfectly safe situations causing an outage.
This, I think, is what happened to us at Lulu. I’m not saying that they are safety-critical, but they have used a safety-critical design pattern inappropriately or perhaps without attending to the rest of the design pattern: the bit that says “this will have these effects on service and you need to ensure this other thing or you’re screwed”. They appear to have a mechanism that I will call “hold and latch” on error.
This means that when they detect a certain category of error (in this case a printer’s failure to print — and this is not stupid because they contract printers so they can’t just solve it instantly themselves) they hold the process (de-list the item so it cannot be further ordered when there’s a known error in the production pipeline, thus avoiding pile-ups) and latch it (disallow automatic recovery so that it must be verified solved by a human before it can proceed). In rail, this process is used when the guideway detects an intrusion near a platform, because this usually means a human has fallen into the track area. When this happens, a hold occurs (any trains nearby apply emergency brakes and no train motion is allowed in the region) and it is latched (it can only be cleared when authorized personnel have visually inspected the region and reported it clear).
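Here is a minimal sketch of what I mean by "hold and latch", again in Python and again with invented names. It is my guess at the shape of the pattern, not anything I know about Lulu's code or about a real rail controller. The hold stops service when a fault is reported; the latch means the fault going away is not enough, and only an explicit human release restores service.

```python
class HoldAndLatch:
    """Toy hold-and-latch controller (hypothetical names throughout)."""

    def __init__(self) -> None:
        self.fault_present = False
        self.held = False      # service is stopped
        self.latched = False   # automatic recovery is disallowed

    def report_fault(self) -> None:
        # Detecting the error both stops service and sets the latch.
        self.fault_present = True
        self.held = True
        self.latched = True

    def fault_cleared(self) -> None:
        # The underlying condition has gone away, but because the latch is
        # set this does NOT resume service by itself.
        self.fault_present = False

    def human_release(self, inspected_clear: bool) -> bool:
        # Only a person who has verified the condition can clear the latch.
        if self.latched and inspected_clear and not self.fault_present:
            self.latched = False
            self.held = False
            return True
        return False

    def service_available(self) -> bool:
        return not self.held


if __name__ == "__main__":
    h = HoldAndLatch()
    h.report_fault()
    h.fault_cleared()
    print(h.service_available())   # still held: nobody has released the latch
    h.human_release(inspected_clear=True)
    print(h.service_available())
```

The whole pattern depends on someone actually calling `human_release`. If nobody does, the system sits in the held state forever and tells you nothing.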
The problem with Lulu is obvious (to me anyway): they have latched it but have no effective way to determine whether or not the hold condition has been cleared. Their printer is not talking back adequately (and if they are supposed to unlatch it, automatically or otherwise, they are not) or is not being timely in clearing the latch or has communicated but Lulu proper is not clearing the latch and restarting service. This is a problem with Lulu because it can afford transient errors: lives are not at stake here. A substantial queue of work can be managed, and relatively cheaply. Less expensive, certainly, than failing to sell product which is, presumably, how they make their money. So treating the queue as a safety case is not helpful here but it’s a seductive methodology if you are very very tightly focused on a single cost. Especially if your focus is so tight that you have not attended to the caveat: you need to clear your state rapidly and correctly or everything stops working.
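For contrast, here is a sketch of the cheaper alternative I am gesturing at: keep accepting orders and queue the work while the printer is down. This is purely hypothetical, with names I invented, and says nothing about how Lulu's pipeline actually works. It only shows that a transient failure can delay work instead of stopping sales.

```python
from collections import deque


class OrderQueue:
    """Hypothetical alternative to de-listing: keep selling, queue the work."""

    def __init__(self) -> None:
        self.pending = deque()
        self.printed = []
        self.printer_up = True

    def place_order(self, order_id: str) -> None:
        # Orders are always accepted; a printer outage only delays them.
        self.pending.append(order_id)
        self.drain()

    def set_printer_up(self, up: bool) -> None:
        self.printer_up = up
        self.drain()

    def drain(self) -> None:
        # Send queued orders to the printer, oldest first, while it is up.
        while self.printer_up and self.pending:
            self.printed.append(self.pending.popleft())


if __name__ == "__main__":
    q = OrderQueue()
    q.set_printer_up(False)
    q.place_order("order-1")
    q.place_order("order-2")
    print(list(q.pending), q.printed)
    q.set_printer_up(True)
    print(list(q.pending), q.printed)
```

Recovery here is automatic the moment the printer reports in again, which is exactly what you can afford when no lives are at stake.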
–BMurray