Mar 16 2011

Safety is suddenly a lot more interesting

Due to events in Japan, which I don’t need to remind you about, safety as an engineering exercise is a lot more interesting than it was a couple of weeks ago.

Safety calculations have many different forms and many proponents for each, but by far the most common is a simple “expected return” calculation. That is, you estimate the chance of each hazard, multiply it by the cost of that hazard happening, and using basic probability come up with an expected cost per year in dollars. Well, no one actually uses dollars except the insurance companies, but it’s worth pointing out that the insurance companies are the best at this. No one else is really willing to take the PR hit of saying a life is worth X dollars, though.
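For the concrete-minded, the whole method fits in a few lines. A minimal sketch in Python, with every hazard, probability, and cost invented purely for illustration:

```python
# Expected-return (ER) safety calculation, sketched.
# Every hazard, probability, and cost here is invented for illustration.

hazards = [
    # (name, chance per year, cost in dollars if it happens)
    ("coolant pump failure", 1e-2, 5e6),
    ("operator error",       1e-3, 2e7),
    ("black swan",           1e-6, 4.8e10),
]

expected_cost_per_year = sum(p * cost for _, p, cost in hazards)
print(f"expected cost: ${expected_cost_per_year:,.0f} per year")
```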

Anyway, there are others who say that this is inadequate and that remote but catastrophic events need a different weighting than what I’ll call “operational events” because their ramifications are deeper. What we are seeing in Japan, and in particular at the Fukushima plant, is that they are totally right. And this affects reactor design choices deeply.

The Boiling Water Reactor (BWR) that Fukushima uses is designed to operate at fairly low temperatures (steam is generated at only around 250 degrees Celsius) and therefore at relatively low pressures. This makes the operational risk fairly low because only low pressures are involved. And that means you can make the pressure containers weaker because, well, there are not very high pressures involved. And that’s very inexpensive. And that’s very attractive. And safe!

In a “black swan” event, however, like a 9.0 Richter earthquake and accompanying tsunami, operational pressures are irrelevant. Things get shaken and smashed, and the internal temperatures (and consequently pressures) are no longer related to the intended operational conditions. They are now only related to the possible configurations allowed by the laws of physics. Now, these are happily constrained by other BWR design elements like the kind of fuel and so on, but nonetheless they are rather more extreme than what the operational safety mitigations protect against. And so now that weaker pressure vessel is looking pretty crap. But, hey, black swan events are in the one-in-a-million range of probabilities! So the ER math works out. It’s worth the risk.

Well, maybe not. Now, I am not going to suggest that a Pressurized Water Reactor is necessarily a better bet — it increases operational risks and they are your day-to-day worry. But maybe a BWR with containment designed for parameters closer to the physical maximum would be a better bet? Well, hindsight is crystalline, of course.

Here’s a better calculation, though, than the chance and cost of a reactor failing versus the cost of making it and the profit generated by it: the cost of a brand or a company failing while an already desperate population gets an extra dose of desperation. I know that sounds mercenary, but I want to find a way to make a fiscal argument so that it’s heard, because industry soon forgets ethical arguments. But cash flow they do not forget.

So, ignoring the safety concerns for a single plant, let’s look at what General Electric actually puts at risk by adopting too simple a calculation (and I am not saying that’s what they did — they may well have done one more like what I propose and it still turned out to be worth their money to make things the way they did). GE does not run a single-plant risk at all, you see.

Rather, their risk includes the risk that any plant with their name on it fails in a public and terrible manner at any time. Okay, so right away we can see that operational safety is super important, because now the time frame is reactor-years and not just years. A black swan chance of 1 in 1,000,000 per year is really 1 in 1,000,000 per year per reactor. The corporate black-eye potential of a thousand reactors is then about 1 in 1,000 per year. Yikes! Now that is a gamble that sucks!
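A quick sketch of that arithmetic, with the one-in-a-million figure as the only input:

```python
# Chance that at least one of n independent reactors has its
# one-in-a-million year, in any given year of fleet operation.
p_per_reactor_year = 1e-6
n_reactors = 1000

p_fleet_per_year = 1 - (1 - p_per_reactor_year) ** n_reactors
print(f"{p_fleet_per_year:.6f}")  # ~0.001, i.e. about 1 in 1,000 per year
```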

Maybe. If we’re talking about GE’s perspective, then we can only really count the cost to them. What’s the cost of a GE reactor failing? And what’s the cost of protecting against a black swan event? Worst case for GE is it goes out of business, totally, dissolving all assets to pay fines and suits. Wikipedia says that’s about $48 billion. So your worst case is 1 in a million every year per reactor to lose $48 billion. Absolute worst case (and we note that in black swan space we are waving our hands pretty hard by definition). So how much safety is it worth building in over and above the basics and without cost to the customer (because you can’t sell him YOUR safety, trust me) per reactor?

It depends on how many reactors you make — the chance of getting blindsided by the black swan increases every time you sell a reactor. It might actually make sense to stop making them at some point, because with so many risks on the table at once you have elevated the impossible into distinctly possible — even likely — territory, and your corporate risk beyond acceptable levels. If a way-overspecced pressure vessel costs GE an extra ten million dollars, that’s ten billion dollars on a thousand reactors! And that ten billion is a certain cost, equivalent to the ER of a ten-trillion-dollar loss on a one-in-a-thousand event. Against a $48 million per-year ER for not doing it. So even taking the whole corporate net assets as risked, and accounting for a thousand reactors at once, it’s hard to see it making dollars sense using an ER.

Of course, over 100 years, that’s 10 billion versus 4.8 billion. Still not worth it! If it was only a million bucks? Maybe worth it. Maybe. In dollars.
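Here is that comparison sketched out, using only the figures above:

```python
# Overspeccing the vessel versus eating the black swan, on pure ER terms.
# Dollar figures are the ones from this post; the rest is back-of-envelope.
extra_cost_per_reactor = 10e6   # $10M of extra containment per reactor
n_reactors = 1000
p_black_swan_fleet = 1e-3       # per year, fleet-wide, from above
worst_case_loss = 48e9          # GE's entire net assets

mitigation_cost = extra_cost_per_reactor * n_reactors  # $10B, paid for certain
er_per_year = p_black_swan_fleet * worst_case_loss     # $48M per year

years = 100
print(f"${mitigation_cost:,.0f} vs ${er_per_year * years:,.0f}")
# $10,000,000,000 vs $4,800,000,000 -- the ER says don't bother
```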

What are the follow-on costs though?

In a black swan event it’s safe to say that the cost of the failure will not just be the immediate cost of the disaster. It’s not just a hundred lives lost directly attributable to the accident itself. It’s also a few hundred thousand people without power at a time when they could really use some power — because they are affected by the same event! It’s the psychological effect of having to worry about radiation poisoning at the same time as your town has been reduced to disorganized lumber. These cannot be dollar values and so they are almost certainly not the manufacturer’s concern, but they must be the customer’s concern, surely. And I think this is at the heart of this calculation — when considering the most remote kinds of event, we are typically considering natural catastrophes that affect the system under analysis. And that means that the failure, in addition to having direct safety implications, also compounds the damage that is going on around the failure. It is a force multiplier on how much everything sucks, and it does not occur in isolation, practically by definition.

An ER does not capture this. At all. It is completely unequipped for it. So I think in future we are going to see a lot more attention to more complex modeling methods that answer questions like:

How much does this make the causal disaster worse?

How do we handle impacts that are flat-out untenable (infinite dollar value)?

How do we determine when an impact is completely untenable?

At what point in the probability of an event do we have to assume a surrounding disaster that we might be making worse? Is p=0.0001 implying it? p=0.00001? Something else? (I think this is right.)1

These things are not easy and not inexpensive, and generally companies are not motivated to solve them unless they can make money on it. That is, after all, the only real metric that we use to judge companies. So that means customers must demand that these cases be addressed, and shoulder part of the cost, even though there is a reasonable expectation that it will never happen to them. But I think we will see that, and see creative solutions — there’s plenty of room to explore impact mitigation as well as likelihood mitigation.2 For a couple of years there will even be a lot of motivation while the memory of these events is fresh.

The curse of the black swan, however, is that the intervals between are often longer than this memory. And so no matter what we learn today, the odds are good that we will have to learn it again.

–BMurray

  1. One of the things that you often see in a safety analysis is a hazard based on equipment failure, and that failure is mitigated by requiring multiple components to fail simultaneously in operation, which is a multiplied (independent) probability. A disaster, however, makes them dependent, and I think that’s not modeled for the most part. If you posit a natural disaster, you can practically assume multiple simultaneous component failures, and that means no matter how low you can make operational p, it is never lower than the disaster p. And that means you have to mitigate impact to get an adequately low value — p bottoms out at p(tsunami). There’s a sketch of this arithmetic after these notes.
  2. It’s worth pointing out that choosing a different power source can be seen as an impact mitigation (certainly if you install a million wind turbines, you have mitigated perfectly against core meltdowns). It’s also worth noting, however, that we heard almost no news about how many were killed when the natural gas processing plant exploded during the tsunami. By that I mean that the actual impact may not change in ways we hope it will if we do the calculation in earnest. But it might.
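The arithmetic behind note 1, sketched with invented component numbers; the only thing that matters is that the disaster term wins:

```python
# Two redundant components: in normal operation they fail independently,
# so the probabilities multiply. A disaster fails them together.
# Both input numbers are invented; the structure is the point.
p_component = 1e-4     # per-year failure chance of one component
p_tsunami   = 1e-6     # per-year chance of the disaster

p_operational = p_component ** 2                        # 1e-8: looks great
p_total = p_tsunami + (1 - p_tsunami) * p_operational   # the disaster term wins

print(p_operational, p_total)  # no amount of redundancy gets below p(tsunami)
```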

Mar 11 2011

Positive feedback loops often suck

I talked about positive feedback loops in game design once in the distant past. They are potential disasters because they are self-reinforcing and not in a stable way. More correctly, they are self-escalating: whatever they are doing, barring outside forces, it’s going to get more like that. You know when people say “too much of a good thing”?

Anyway, here’s your positive feedback loop of the day. After pondering it I have only one question and while I insist it’s not rhetorical, it should at least demand certain kinds of answers.

Here’s the question then: what budget line items in a government’s budget are more important than education? For extra credit, what items are bad governments predisposed to reduce, if we assume that bad government is profitable for members of that government?

–BMurray


Mar 7 2011

Sharpening your doors

I recently ran into a mass of traffic on the Traveller Mailing List that revolved around making airlock doors sharp so that you can cut things in half with them. This sort of thing is why it’s a good thing that there’s a mismatch between the mailing address I reply with and the one I subscribed with (and can’t recall): I can’t reply to these things.

In the past I’ve mentioned that some of my duties revolve around safety. Others have to do with security, which is related. The sharp-airlock problem is a happy coincidence of both. It also underscores the value of a chart I once found in a paper called “Probing the Improbable: Methodological Challenges for Risks with Low Probabilities and High Stakes” by Ord, Hillerbrand, and Sandberg. Here’s the chart:

Here are the basics of that paper. Often in safety we need to calculate very small numbers, because we want to make things that have a very small chance of being unsafe. That’s P(X) — the chance that something bad will happen. So we write complex arguments (A) that detail exactly why a given system has such a very tiny chance (P(X)) of going wrong and such a huge chance (P(~X)) of being just fine. What Ord &c. point out, however, is that when P(X) is really small, it bears recalling that the calculation is really the chance of X given that the argument is right — we use the notation P(X|A) to indicate “the chance of X given A is true”. And given that on average 1 paper in 1000 is retracted from prominent medical journals for being wrong, P(~A) — the chance that the argument is wrong — is actually a very big deal. We really can’t say anything interesting about P(X) if the argument is wrong. Saying there is one chance in a trillion of disaster is fine, but if there’s one chance in a thousand that you’re wrong, then it’s not very compelling. That big grey rectangle stands for “not very compelling”.
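Their point reduces to a few lines of arithmetic. A sketch, where the only number taken from the paper is the rough retraction rate; the other two are invented for illustration:

```python
# The paper's point as arithmetic. A safety argument A claims P(X|A) is
# tiny, but the argument itself can be wrong with probability P(~A).
p_x_given_a     = 1e-12  # disaster chance *if the argument holds* (invented)
p_a_wrong       = 1e-3   # roughly the journal retraction rate from the paper
p_x_given_wrong = 1e-6   # guess at the disaster chance if it doesn't hold

p_x = (1 - p_a_wrong) * p_x_given_a + p_a_wrong * p_x_given_wrong
print(f"{p_x:.2e}")  # ~1e-9: dominated entirely by the chance you're wrong
```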

So anyway, sharp airlock doors, right. So the argument goes something like this: there’s a chance that a bear, a marine, or a giant squid (all examples from the mailing list, as I recall it, so don’t yell at me) will try to get into your space ship. It would be handy, given this chance, to make the airlock doors sharp so you can slam them shut on your pursuers. The math is presumably that P(squid) is quite high and so the expense is warranted.

Let’s think just a little harder, however. First, let’s agree that if everything in your spaceship goes to hell — the power fails, the hydraulics rupture, and generally everything goes south — you want everything that relies on these systems to remain safe. Now, this is a space ship, so most of the time what is outside it is, well, space. Consequently, the failure mode for the airlock doors has to be “closed”. That means that they should be constructed in such a way that if there is some kind of failure, it causes the doors to close. That probably means that the airlock doors are under some kind of constant tension (a big spring maybe) and that power and other services are used to open them rather than close them.
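Fail-closed is easy to state in code. A toy sketch (the service names are invented):

```python
# Fail-closed: the door's resting state is "closed" (the spring), and every
# service must be actively working to hold it open.
def door_state(power_ok: bool, hydraulics_ok: bool, open_requested: bool) -> str:
    """The spring wins unless everything needed to fight it is present."""
    return "open" if (power_ok and hydraulics_ok and open_requested) else "closed"

assert door_state(False, True, True) == "closed"  # power fault: door closes
assert door_state(True, True, False) == "closed"  # nobody asked: door closes
assert door_state(True, True, True) == "open"
```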

Okay so being as we’re talking about safety in dire circumstances, we can also see that we probably don’t want this door to close real hard or real fast — you’re just not very safe if the airlock doors carefully close to preserve the air but a) are closed so tight you can never get out or b) close on you as you try to get in to safety, cutting you in half. So, okay, safe failure means a closing door that won’t kill you.

Already we can see that sharpening the doors might be a bad idea, but let’s continue anyway.

So the argument then is that P(squid) is more likely than P(electrical fault). That is, you are more likely to get attacked by a giant squid than to have a problem that cuts power (or other service) to the door. And keep in mind that both cases have the caveat “while you and/or the squid are trying to enter or exit” since if that’s not the case then just shut the door already and ignore the squid. More correctly, P(attempted entrance by some enemy that we are happy to kill) > P(fault in any service relating to the airlock door).

Okay, that might even be the case. You might run in very dangerous places. And you might also argue that, being as airlock doors are doubled, a pressure failure only happens if both doors fail. If we are pre-supposing no single point of failure elsewhere in the vessel (not a bad assumption) then the chance of both doors failing is the chance of one door failing squared, which is much smaller even than before. So P(squid) might reasonably be pretty high indeed.

Let’s also keep in mind that P(door) is tested every time you walk through the door, though. Now we’re not talking about the chance of pressure loss but rather the chance of you getting cut in half by the door, so we don’t get to square that (you don’t need to get cut in half by both doors). So P(door) is now the chance that you will be killed or maimed every time you use the door. P(squid) is starting to look like a safer bet as you eye that guillotine edge every time you pass by on your way to the service bay for coffee or back into the ship from an EVA. Couple that with the fact that you don’t want that guillotine to operate only when there’s a system failure. You want a big red button at the captain’s console that closes those on any marauding squids. So now one of your failure cases is “captain closes the lock on your sorry ass” as well as simple door failure.

And you don’t want to override that, or even safeguard it much, because of the apparently very high P(squid).
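To put rough numbers on the trade, here is a sketch. Every figure is invented; the shape of the result is the point: the guillotine risk compounds with every transit, while the pressure risk gets squared by redundancy.

```python
# Rough numbers on the trade. Everything here is invented.
p_guillotine_per_use = 1e-6   # chance the sharp door fires as you transit
uses_per_year = 2000          # coffee runs, EVAs, captains with red buttons

p_bisected_per_year = 1 - (1 - p_guillotine_per_use) ** uses_per_year
print(f"{p_bisected_per_year:.4f}")  # ~0.002 per crew-year

# Pressure loss is different: redundancy means both doors must fail,
# so that probability gets squared. The guillotine's doesn't.
p_one_door_fails = 1e-4
print(p_one_door_fails ** 2)  # 1e-8
```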

— BMurray


Jan 27 2011

Fishy marketing

Selling Diaspora showed me some interesting statistical facts that ultimately led to a kind of marketing/business strategy. I’m not actually all that interested in marketing or business, though it’s been a fun game so far, but I am interested in applied statistics — numbers that are powerful and help make intelligent decisions. So keeping track of sales data was inevitable (I like numbers) and analyzing them was inevitable (I like statistics) and using them for something was a treat (I love application — I work in an engineering field). But I don’t want to give the impression that the entire journey thus far was calculated.

Initially we told a bunch of people about Diaspora and decided to sell it to them because we liked it and thought other people would too, and making a few dollars on it would be nifty experience and maybe (no one ever admits to this) let us join the club. The one with the other guys we admire who publish games. I still don’t feel like I’m part of the club, but I’m beginning to suspect that there isn’t one — it’s more like an aggregation of high-school cliques maybe.

Anyway, when we started selling we noticed sales start low, peak fast, and then taper off. I pretty much immediately saw a Poisson distribution in the making but didn’t see it as something you can do anything with. I wasn’t thinking straight.

For half a year we listened to the fan base and the would-be fan base for the game. We talked, they talked, we all reacted, and I took notes. During this time we were debating internally a PDF release and what that would mean, at least in part because it was an interesting academic exercise (I’m thinking specifically about my musings on the problem of correlation between physical and digital media, which is already a known problem between translations and multiple non-digital media). Ultimately we did release it and I watched those numbers closely.

And they did it too. Another Poisson distribution. And that’s when I realized why New Coke existed and why logos change and why NEW AND IMPROVED is on things.

The Poisson curve (with low lambda) has a long and very shallow tail. If sales follow this curve, and they certainly seem to (with lambda around 3 or 4 usually), then as a business-person and as a marketer, you have a couple of ways to make this work for you. You want to amplify that peak, for starters. That’s obvious, though, and the most naive seller does that just by telling people things are for sale. But there’s a richer vein in the tail — if you could fatten that tail then you would be making more sales over a longer period of time. It’s nice to get a big wad of sales, but a continuous stream is the way to stay healthy (at least in part because beyond some critical number of sales it starts to be self-reinforcing).

But there’s no parameter for the curve that fattens the tail. That’s not just an artifact of the math, but rather it seems to speak to facts about selling. What we saw with the PDF release, though, is that if you put a bunch of Poissons together over time, the sum of them (a multi-modal Poisson curve, where each mode is a single curve) is kind of like a regular Poisson curve with a fat tail.
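Here is a sketch of that stacking effect. The lambda of 3.5 matches what I was seeing; the release months are invented:

```python
import math

def poisson_pmf(k, lam):
    """P(K = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# One release: a Poisson-shaped curve with a thin tail.
single = [poisson_pmf(m, 3.5) for m in range(24)]

# Three releases staggered across the period (months 0, 6, and 12):
# the sum of the shifted curves is one curve with a much fatter tail.
releases = [0, 6, 12]
stacked = [sum(poisson_pmf(m - r, 3.5) for r in releases if m >= r)
           for m in range(24)]

for m in range(24):
    print(f"month {m:2d}  single {single[m]:.3f}  stacked {stacked[m]:.3f}")
```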

And so a strategy is born. I wanted to fatten the tail of the total sales curve (in red there). I had two modes already: the POD release (the blue Lulu hardcover line is representative, but the mode is really the sum of all hardcover sales, including the purple vendor line) and the PDF release. I needed a third later in the year.

Around the time of our PDF release, we had a few people talking with us about different methods of distribution. At the time we weren’t too keen, but I put them in my back pocket. Then in the summer we won the gold ENnie for Best Rules. The timing was right for something now and I kind of hoped the ENnie would do it alone. It didn’t — the award does not generate sales in any interesting way (or at least it didn’t for us) and I think that’s because it’s mostly watched by the industry. Yes, fans vote on it, but fans already bought the game. It’s the industry that’s watching that and thinking, “Wow, I never heard of them before I better check it out and see if there’s a way to make us both a buck. Well mostly me, but you know.”

So, yeah, right after the ENnie we got a few nibbles regarding better distribution. And I wanted another Poisson curve to add to the graph and fatten that long tail right about then. That’s when I re-opened discussion with the gang at Evil Hat.

I’ll be honest — I didn’t actually think very hard about the other offers. Fred Hicks at Evil Hat had already pitched his idea and we liked it already. Even better, he had just launched The Dresden Files RPG and it was selling like crazy thanks to stellar work, great production values, a popular system, and a solid license (with art!). Now with this success came a lot of distribution deals — I’d been watching Fred blog about his experience with Alliance, Diamond, and others. These are all names I hear when I try to pitch the game at stores. That was where the third peak would be.

And so it went — that third peak, in pink, is the sum of the Evil Hat print contribution, our third mode. And I note that our sum curve, the big red one, is only sort of declining. That slope could obviously turn down very hard indeed, but I am optimistic about its shape.

I want to stress, in closing, that the lesson I take away from this is general: when managing the marketing and sales of a product over the long term, you want to be looking ahead to ways to create new peaks. The specific is not a strategy, it’s just what happened. I would not, for example, artificially delay a PDF release in future, for reasons I’ve already discussed (and which Fred made very clear to me) which relate more to community than to sales. But the gimmick, the trick, the talent seems to be to find that next hill to keep the tail fat.

I don’t know if I have that talent — the chart up there is mostly about a confluence of lucky instances — but at least I think I know how it’s done.

–BMurray


Aug 26 2010

Maps, graphs, and other visualizations

So last night I grabbed a mind mapping app for my iPad because I don’t like mind maps.

A mind map is basically just a hierarchical outline that has been painted graphically, so all your leaves are pretty bubbles and the hierarchy is described by arcs connecting these nodes. It’s pretty. But it’s fundamentally flawed because it’s not a way to map your data. It’s a way to organize data in a very specific way (hierarchical) and this very specific way is not always all that useful. Forcing it into that map can be destructive, even. The only way, for example, to imply a connection between two nodes that are not strict parent and child is with an artificial “link” that exists outside the core model of the data.

Why does this bug me? It bugs me because the hierarchy should be an emergent property of the data and not a starting constraint. We should start mapping the data and find out that it’s hierarchical rather than force it into this structure. That is, the mind map severely limits your ability to explore your data set. Instead it becomes just a way to write it down which is, frankly, not interesting.
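A toy sketch of the problem, borrowing node names from later in this post:

```python
# A mind map is a tree, so the only native relationship is parent/child.
# Any other connection has to live beside the model as a bolted-on link.
tree = {
    "Soft Horizon": {
        "Ragged Mere": {"Peaceful": {}, "Gunpowder": {}},
        "Some NPC": {},
    }
}

# "Some NPC is attached to Ragged Mere" doesn't fit the hierarchy, so it
# becomes an artificial cross-link outside the core model:
cross_links = [("Some NPC", "Ragged Mere")]
```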

So anyway I grabbed this app and started playing with it. It’s pretty nifty. It’s very pretty. After a couple of hours enthralled by it I had a huge beautiful map of what this evening’s Soft Horizon game will contain and how it all relates. Hierarchically, to be sure, but related nonetheless. Wow, it is useful. I just had it upside down.

What the mind map does is not organize your data. It discovers your data. What you are exploring is not the data but your brain. You are being invited to invent, decompose, and otherwise investigate the raw stuff of creativity and consequently create something that has structure.

The hierarchical form invites elaboration, for example. I have a node called “Ragged Mere”. It’s a place. I want to know more about it so I start adding nodes (hey, are these Aspects?!) like “Peaceful” and “Full of sorcerers” and “Gunpowder”. Cool. I add a couple of NPC nodes — just names, mind you — for people that are somehow attached to these places. Hmm, each also seems to demand elaboration. They get some attached sub-nodes, which also smell suspiciously like Aspects. Pretty soon I have this huge tree of hierarchical data that went all over places I had no idea I was going to investigate. Amazing!

So, okay, I get it. I mean, it’s still a crappy way to represent pre-existing data for all the reasons I ever thought of. But as a creative tool for trying to figure out how to turn a nebulous concept into a structure you can actually use for something, it does indeed work. Because of the way my mind is wired, I have to wonder how much of its power derives from simply being fun and pretty, of course, and that will shake out over time. If it’s useful, I’ll keep using it. If it’s merely nifty it will gather dust and eventually wind up on my “dead app page”. That’s one step before the trash on my iPad.

Because its structure is trivially represented by an outline (and indeed, for many of these apps that is the actual storage format), it’s easy to see how to move from this to a nice linear document, if that’s a path you intend to tread. That’s looking pretty handy too, now.
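A sketch of that flattening, using the Ragged Mere fragment from above (storage formats vary by app; this is just the idea):

```python
# From mind map to linear document: the tree flattens directly
# into an indented outline. Names are the examples from this post.
mind_map = {
    "Ragged Mere": {
        "Peaceful": {},
        "Full of sorcerers": {},
        "Gunpowder": {},
    }
}

def to_outline(node, depth=0):
    """Walk the tree depth-first, emitting one indented line per node."""
    for name, children in node.items():
        print("  " * depth + "- " + name)
        to_outline(children, depth + 1)

to_outline(mind_map)
# - Ragged Mere
#   - Peaceful
#   - Full of sorcerers
#   - Gunpowder
```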

Damn, I love being wrong almost as much as being right.

–BMurray


May 25 2010

Stuck in the mechanism

One of the problems that a lot of us have in game design is that it’s all too easy to get your brain stuck in the mechanism. I don’t mean in the mechanism of the game (the way that, mechanically, the game is going to deliver the experience you want) but rather way deeper in the nuts and bolts. Specifically, it’s easy to get really into a cool way to roll and interpret dice.

This is a potential disaster. Cool ways to roll dice are just not all that interesting. Unless, that is, they deliver something particular, nifty, and intentional. Okay, actually intentional is not necessary but it sure puffs you up as a designer.

The crux of any dice system is the probabilities it delivers. Typically what you’re choosing between is a small number of interesting distribution curves — linear ones, like a simple d20 roll; nice peaked curves like multiple dice summed; and stark triangles with a zero peak, like d6 – d6.
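If you want to see the three shapes side by side, a quick brute-force enumeration (Python here, purely for illustration):

```python
from collections import Counter
from itertools import product

# The three families: flat (d20), peaked (3d6), triangular (d6 - d6).
d20     = Counter(range(1, 21))
sum_3d6 = Counter(a + b + c for a, b, c in product(range(1, 7), repeat=3))
d6_diff = Counter(a - b for a, b in product(range(1, 7), repeat=2))

print(set(d20.values()))        # {1}: every result equally likely
print(sum_3d6[10], sum_3d6[3])  # 27 vs 1 (of 216): a strong central peak
print(d6_diff[0], d6_diff[5])   # 6 vs 1 (of 36): triangle peaked at zero
```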

While working on the games we are currently planning, we have to confront the Fudge dice. There’s a lot of FATE inspiration in our work, and some of it even looks very much like FATE, but there’s also the fact that we are pulling some of the novel concepts from Diaspora into the games, and Diaspora uses Fudge dice. So when, for example, we re-purpose cluster generation, we have to decide whether the whole game can use Fudge dice or whether the cluster system can be effectively rebuilt with something different to match the new game. Or, least appealingly, we might use Fudge dice in one place and something else somewhere else (and it’s interesting that this bugs me when AD&D used five different kinds of dice back before your creepy d10, and that didn’t and doesn’t bug me).

So while working on Chimaera, which relies heavily on a cluster generation variant but uses d6 for the most part, we danced around the dice a lot. The resolution system is cool and pretty novel and we didn’t really want to mess with it, but the d6 – d6 triangular distribution was not working either — the zero peak was too low and the extremes too high. Then we (I say we because I can’t remember who) stumbled on this curve: |d6 – d6|.

You get this by rolling two d6 and subtracting the smaller from the larger. It’s basically the d6 – d6 curve folded around the zero and it’s a cool curve. It turns out to be ideal for Chimaera‘s community statistics (which don’t really need a negative — they are very well defined and terminal at zero).

You get a lot of zeros — 1 time in 6. But the peak is now at 1 instead of zero, so fully half of your rolls will (in the long run) fall on 1 or 2, which is desperate but functional territory in Chimaera. Score! 3 is as rare as zero, and five is remote at 1 in 18. You won’t usually see any 5s at the table during community generation, but it will happen and much more often than the Diaspora extremes, making the communities vary more but not in undesirable ways.
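All of those numbers can be checked by brute force over the 36 possible rolls:

```python
from collections import Counter
from itertools import product

# Brute-force the |d6 - d6| distribution over all 36 rolls.
counts = Counter(abs(a - b) for a, b in product(range(1, 7), repeat=2))

for result in sorted(counts):
    print(f"{result}: {counts[result]:2d}/36 = {counts[result] / 36:.3f}")
# 0: 6/36   1: 10/36   2: 8/36   3: 6/36   4: 4/36   5: 2/36
# zeros 1 in 6; 1s and 2s together exactly half; 5s at 1 in 18
```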

The fact that this only needs two dice is also cool — no fistfuls, though the rest of the game satisfies the desire to wield many dice. It strikes me that it could be a pretty nifty resolution tool as well, using, say, Skill + |d6 – d6|, because if zero is failure then you have a relatively low chance of failing on average (low whiff factor), a good chance of doing adequately or even well, and smaller chances of extreme success.

Is there a downside to having no negative result? I’m not sure it matters mathematically, because labeling the results exactly as their sum is actually kind of arbitrary — you could translate the result anywhere up or down the number line by adding or subtracting from it and get a different curve. Subtract 1 if you want your peak at zero, say. There is, however, something visceral about rolling four minus results on Fudge dice — just seeing all those little failures is an emotional letdown. I think that generally you want that — you want a roll that sucks to look like it sucks.

So I don’t know if there’s a resolution system to be built over this curve but I’d like to see it. None of our games at the moment demand it so I’m not really experimenting with it in any detail. But that shouldn’t stop you. If you make something nifty with this (or know of someone else who already has — I haven’t really researched it or anything), shout!

–BMurray

Addendum: I just realized that because all of the bars are even counts (6, 10, 8, 6, 4, and 2 out of 36), halving them maps this perfectly onto a range from 1-18, which is the D&D stat range. And it implies that there is specialness in the ranges 1-3 (0), 4-8 (1), 9-12 (2), 13-15 (3), 16-17 (4), and 18 (5). And so I’m now wondering if there’s not a very cool way to map this dice system onto D&D stats and get rid of both the d20 and the funky stat bonus calculation by applying stats directly somehow.
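A sketch of that mapping, just to check that the band widths line up (speculation, not a rule):

```python
# Halving the 36 outcomes gives 18, so a 1-18 stat can be read as the
# matching |d6 - d6| band directly.
def band(stat):
    """Map a 1-18 stat to the 0-5 value whose band width matches |d6 - d6|."""
    for top, value in [(3, 0), (8, 1), (12, 2), (15, 3), (17, 4), (18, 5)]:
        if stat <= top:
            return value
    raise ValueError("stat must be 1-18")

print([band(s) for s in range(1, 19)])
# [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5]
```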


Nov 10 2009

Mathematics and beauty

I’ve always loved mathematics. I’m not very good at it, but I’ve always loved it. From nearly the first moment I owned a computer I ravaged A.K. Dewdney’s articles in Scientific American for ways to explore and visualize mathematics and was regularly surprised and thrilled. Every image out of the Mandelbrot set, now commonplace, was (and remains) a jumping-off point for imaginary journeys.

Now, finally, someone has found a rich three-dimensional variant of the set. It generates images like this.

[Image: Ice Cream From Uranus]

Click through for more images and some wonderful discussion on the discovery. I don’t really have a lot to add to that — I assume practically everyone is familiar with the self-similarity of fractal objects and with the pseudo-regularity that dynamic formulae can produce. It’s amazing and it has deep ramifications for practically everything (ecology in particular is a massively multi-variate dynamic function, which means you basically can’t predict the outcome of anything ever even though it’s technically calculable). It’s also just mind-bogglingly beautiful. That math up there is packed with alien worlds.
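For anyone who has never played with it, the dynamic formula behind the classic 2-D set is only a couple of lines. This is the standard escape-time test, not the higher-power 3-D iteration from the linked post:

```python
# The dynamic formula behind the classic 2-D set: iterate z -> z^2 + c.
def escape_time(c, max_iter=100):
    """Iterations until |z| exceeds 2, or None if c looks like a member."""
    z = 0
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return i
    return None

print(escape_time(complex(1, 1)))  # 1: escapes almost immediately
print(escape_time(-1))             # None: settles into the 0, -1, 0, ... cycle
```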

–BMurray