To score…or not to score…? Suicide risk in the ED


On a dark and brooding autumn night, you once again find yourself healing the injured, curing the sick, and reassuring the worried well aboard the good ship TheBigHospital ED. Pale, second-hand moonlight spills over the land, which you only notice because you look up on your way past the annex beds when a small whirlwind sneaks in via the ambulance entrance, redistributing a handful of fallen leaves in your direction as an intrepid ambulance crew seek to deposit yet another patient in the department. You quietly wonder about the current odds they have running on how quickly they can push us to the official overcrowding threshold tonight. As you contemplate the attractive ochre hue of the leaves, you overhear the ambos giving handover to the harried-looking triage nurse. The patient is apparently a 34-year-old woman who has been brought in at the behest of concerned family, who found that she had taken a bunch of various tablets and penned a suicide note. She has two young children, and as you wander past, surreptitiously opening the lemonade Icy Pole you just stole from the freezer behind triage, you notice numerous old transverse linear scars on her forearms. You’re pretty sure from the quick list of tablets she’s taken, recited by the ambos to the triage nurse, that she’s in no real medical danger.

  • What are the chances that she might do this again?
  • What are the chances that she’ll actually kill herself, on a subsequent occasion?
  • How might you decide if she’ll be safe to send home, or if she needs admission to hospital?

As is often our wont in emergency medicine, what we’d like to do here is perform some effective risk stratification, to decide on the best course of action, both with regard to what we do now in ED, but predominantly what we need to do when it comes to disposition and follow-up. Ideally, this should take into account the needs and welfare of the patient, as well as taking a wider view encompassing resource utilisation and the subsequent opportunity cost to others (both within ED and at a hospital-wide level), along with staff safety and so on.

In the best of all possible worlds, our method of risk assessment would be evidence-based, applicable to the patient population we are actually dealing with, robust, and predictive of what is actually likely to happen to the patient in front of us. Do we live in such a world, you might ask? Does such a tool exist? As is often the case in our practice, your weapon of choice will boil down to a selection of gestalt, or a handful of varyingly useful scoring systems or clinical decision rules. Your armoury in this case consists more or less of:

  1. Clinician gestalt / using The Force
  2. SADPERSONS or Modified SADPERSONS score
  3. Manchester Self-Harm Rule
  4. ReACT Risk Assessment Tool
  5. One of a number of bespoke single-centre locally derived tools that are not widely used but have a certain hipster appeal*
(*the use of these is restricted to those who live in Braddon, ride a fixie and have ironically styled facial hair)


All of the above methodologies rest on the assumption that there are a bunch of risk factors for repeated self-harm and completed suicide. More specifically, that there are readily and reproducibly identifiable aspects of the patient’s history and presentation today that yield a consistent likelihood ratio (LR) that, along with a knowledge of the pre-test probability of those outcomes, helps us calculate a post-test or posterior probability of badness that is hopefully more accurately predictive than simply consulting the nearest magic 8-ball.
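For the arithmetically inclined, the pre-test to post-test step described above can be sketched in a few lines. This is only the standard odds form of Bayes' theorem; the function name and the example numbers are made up for illustration and do not come from any of the tools discussed here.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pre-test probability (odds form of Bayes)."""
    if not 0.0 <= pre_test_prob < 1.0:
        raise ValueError("pre_test_prob must be in [0, 1)")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_test_odds = pre_test_prob / (1.0 - pre_test_prob)   # probability -> odds
    post_test_odds = pre_test_odds * likelihood_ratio        # Bayes, odds form
    return post_test_odds / (1.0 + post_test_odds)           # odds -> probability


# Hypothetical numbers, purely for illustration: a 10% pre-test probability
# and a positive finding that carries an LR of 3.
print(f"{post_test_probability(0.10, 3.0):.3f}")  # prints 0.250
```

An LR of 1 leaves the probability exactly where it started, which is worth keeping in mind when reading the numbers further down.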

Lists of risk factors deemed important vary from country to country, textbook to textbook, and can often depend on who you ask, the prevailing winds, and the current phase of the moon. In order of increasing reliability, these risk factors are derived by:

  • Revelation from a deity
  • Vertical memetic transmission (med school, textbook, local psych folk consensus)
  • Logistic regression performed retrospectively on data from a very big sample representative of your ED patients
  • Prospective data from validation studies of a CDR (derived from the aforementioned logistic regression modelling) carried out on a very big sample representative of your ED patients

From amidst the nightmare that is the task of trying to identify the causative relationships amongst myriad interrelated and confounding variables that hint at, or veritably scream, correlation, some of the contenders for risk factors for repeated self-harm and subsequent completed suicide are…

From my medical school psychiatry textbook:

  • Being single / living alone
  • Being male
  • Depression
  • Insomnia (even in the absence of depression)
  • Substance abuse (EtOH and others)
  • Schizophrenia
  • Physical illness (especially if debilitating)
  • Family Hx of suicide
  • Previous attempts (50-80% of completed suicides have had a crack at it before)
  • Seriousness of previous attempt(s)
  • Recent bereavement
  • Unemployment or financial difficulties

From a current (Local Health Dept) SOP for Clinical Risk Assessment and Observation:

  • Suicidal now or Hx of previous suicide attempts or acts of self harm
  • Chronic suicidal thoughts, but with current intent
  • Aggression / violence
  • Delusions, particularly paranoid ones
  • Hallucinations, particularly voices telling them to harm
  • Hx of absconding
  • Poor adherence to medication programs
  • Substance abuse
  • Hx of inappropriate sexual behaviour
  • Cognitive impairment
  • Medical condition

Based on clinical assessment and using the above list as a guide, our mental health staff assign something called an At Risk Category (ARC) to the patient. They are numbered 1 to 4, but there are 5 of them because someone snuck a 2.5 in between 2 and 3 at some point. There are strong suspicions that attempts to renumber the lists 1-5 in all relevant documents using varyingly obsolete combinations of Microsoft Office and archaeologically significant versions of Internet Explorer most likely led to a nervous breakdown on the part of the administrative officer so tasked, and their subsequent admission to hospital themselves with an ARC of 2.5. The ARC level assigned to a patient is used to help determine who gets involved in the patient’s care, how urgently, and particularly the frequency and nature of observations required for that patient while in hospital. This is a little like the Australasian Triage Scale (ATS) categories as they relate to recommended waiting times, and is reproduced below:

ARC Level    Level of Risk    Description of Obs    Frequency of Obs
             Low              General               Once per shift
             Low – Medium     Intermittent          1 hour
             Medium           MHAU Obs              30 mins
             Medium – High    Close                 15 mins
             High             Special               Constantly stared at



The SADPERSONS score is the perennial favourite that most of us grew up with at medical school, or learned to know and love early in our careers as junior doctors. It comes in two flavours: Original and Hot & Spicy…um… “Modified”. The original score comprises 10 items worth 1 point each, and produces 3 tiers of risk: Low (0-4), Medium (5-6) and High (7-10). The modified version has some slightly different criteria, introduces weighting for some items (some being worth 1 point, and some 2 points), and also gives you a Low / Medium / High risk answer at the end of the day. You can check it out in all its publicly edited glory here:

A study by Bolton et al published in 2012 in the Journal of Clinical Psychiatry had a look at the clinical performance of this scoring system (both original and modified) for risk-stratifying people who present to ED with self-harm, to see whether it could usefully predict which of them were likely to do it again. Their sample was every patient presenting with self-harm to two tertiary EDs in Manitoba, Canada, from 2009-2010. They managed to collect 4,019 patients with self-harm, 566 of them deemed to have had a bona fide crack at actually killing themselves, then followed them up at 6 months to see who’d had a repeat episode. Sensitivity of non-low-risk categorisation for the original and modified scores was 19.6% and 40% respectively, with PPVs of 5.3% and 7.4%. The Receiver Operating Characteristic (ROC) curve was almost a 45-degree straight line, with an area-under-the-curve (ROC-AUC) or c-statistic of a whopping 0.572. Keeping in mind that a coin toss yields an ROC-AUC of 0.500, this is not exactly a vote of confidence for the SADPERSONS risk assessment tool.


The Manchester Self-Harm Rule (MSHR) is a risk assessment tool derived in… wait for it… the UK, with the hope of providing something more clinically useful than the SADPERSONS score. It is a simple 4-item list, and a positive response to any of the 4 items flags you as non-low-risk. The 4 criteria are:

  • History of self-harm
  • Previous psychiatric Rx
  • Current psychiatric Rx
  • Benzodiazepine use in this attempt

Cooper and company embarked on this quest to build a better mouse trap, publishing in 2006 in the Annals of Emergency Medicine. They used data from patients presenting to 3 large EDs as their derivation cohort, and applied that to a validation cohort from 2 other EDs, involving a total of 9,086 patients. For repetition of self-harm (including completed suicide), the sensitivity was 94% and specificity 25%.

Realising that prospective validation data is perhaps a little more solid, the same group published a 6-month follow-up in 2007 in the Emergency Medicine Journal, and this time also included a comparison between performance of the MSHR and clinical gestalt. Not much changed in terms of the rule’s functionality. Sensitivity, specificity, PPV and NPV were all about the same as they were in the original study. Interestingly, clinician gestalt had higher specificity at 38%, compared to ~26% for the MSHR.


Not satisfied with the craptacular performance of the then-available risk assessment tools, Steeg and his mates (including the Cooper of Manchester fame) set out to Find A Better Way™, and in 2012 published a huge cohort study in Psychological Medicine, resulting in the ReACT Self-Harm Rule (RSHR). They collected data for 18,680 patients presenting with self-harm to 5 large EDs in the UK. Paying ancestral homage to their minor author (Cooper was listed last in the 2012 paper), they used a 4-question system as well:

  • Self harm in the last year
  • Live alone or homeless
  • Cutting involved
  • Current psychiatric Rx

This rule had a sensitivity of 95%, and specificity of 21% for predicting repeat self-harm or completed suicide in the next 12 months.
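To see what those sensitivity and specificity figures mean as likelihood ratios, here is a minimal sketch using the standard definitions (LR+ = sensitivity / (1 - specificity), LR- = (1 - sensitivity) / specificity). The function name is my own; the inputs are the MSHR and ReACT figures quoted above, which relate to repeat self-harm including completed suicide.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a dichotomous test."""
    if not (0.0 < specificity < 1.0):
        raise ValueError("specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Sensitivity and specificity as quoted in the text for each rule.
rules = {
    "MSHR (Cooper 2006)": (0.94, 0.25),
    "ReACT (Steeg 2012)": (0.95, 0.21),
}

for name, (sens, spec) in rules.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{name}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

Both rules land at an LR+ of roughly 1.2 to 1.25 and an LR- of about 0.24: a positive result barely moves the needle, while a negative one cuts the odds to about a quarter.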


As well as the abovementioned scoring systems which have had their time on the red carpet of psychiatric academia, there are also a few other B-grade actors on the stage. Studies that some people in mental health might think look a bit familiar… they’ve seen them in something, yeah… but can’t quite remember their name, or what they’re doing now.

Bilen 2013 – Emergency Medicine Journal – This was a Swedish study involving 1,524 consecutive patients in a large tertiary ED. The Swedish chefs concocted their own special blend of logistically regressed risk factors and in a veritable My Kitchen Rules of acute psychiatry, pitted it head to head against the MSHR. Following up to see who had repeat self-harm (including completed suicide) at 6 months, they found that in their cohort of patients, the MSHR had sensitivity and specificity of 89% and 21% respectively, while the Swedish secret sauce produced not terribly dissimilar numbers at 90% and 18%.

Tran 2014 – BMC Psychiatry – These guys took a derivation cohort of 4,911 ED self-harm patients, measured the performance of clinician gestalt, cooked up a score based entirely on previous medical history (i.e. only data they could mine from a patient’s prior records; if they hadn’t presented to their hospital system before, their score could not be calculated) and then compared their performance on a validation cohort of 2,488 patients, looking for recurrence at 3 months follow-up. Clinical gestalt produced an underwhelming ROC-AUC of 0.58, and their medical records data mining algorithm managed a respectable c-statistic of 0.79.


Having toured the panoply of ostensibly potentially useful risk assessment tools at our disposal, I feel it’s time to add the eggs, or the Force, depending on your inclinations… that which binds it all together… and gives us some semblance of a pragmatic answer to the question: “So, what should I actually do with my patients in ED??”

In this instance, the special ingredient required to help pull something useful from the maelstrom is provided by the Australian Bureau of Statistics, in the form of data detailing the actual incidence of suicide in our population, enabling us to make reasonable estimations of pre-test probability, and thence gauge the utility of our risk assessment options.

In the most recent year for which complete statistics are available, 2012, the suicide rate in Australia, expressed in various ways (all per year) was:

Females     5.6 per 100,000    1 in 18,000    0.006 %
Males      16.8 per 100,000    1 in  6,000    0.017 %
Average    11.2 per 100,000    1 in  9,000    0.011 %
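If you want to check the conversions between the three ways of expressing these rates, a small sketch follows. The helper name is hypothetical; the rates are the 2012 figures given above, and the text rounds the results (for example 1 in 8,929 becomes 1 in 9,000).

```python
def describe_rate(per_100k: float) -> tuple[float, float]:
    """Convert a yearly rate per 100,000 into (N for '1 in N', percent)."""
    one_in_n = 100_000 / per_100k
    percent = per_100k / 1_000  # per 100,000 -> per 100
    return one_in_n, percent


rates_2012 = {"Females": 5.6, "Males": 16.8, "Average": 11.2}

for group, rate in rates_2012.items():
    one_in_n, percent = describe_rate(rate)
    print(f"{group}: 1 in {one_in_n:,.0f} ({percent:.4f} %)")
```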

How does that compare to the risk of repeat episodes in patients who present to ED with self-harm? A systematic review of 90 studies conducted by Owens et al in 2002 and published in the British Journal of Psychiatry found the rate of repeated self-harm in the next twelve months was about 16%, while the rate of completed suicide in the same period was 2%. The 9-year mortality was around 7 %. A more recent review by Carroll et al published in PLOS One in 2014 found similar figures with a one-year fatality risk of 1.6 % and a 5-year mortality of 3.9 %. Furthermore, they also found that these recurrence rates (both for self-harm in general, and completed suicide) were consistent across a broad time period. A total of 177 studies from 1976 to 2012 were included, with no real difference seen between the morbidity and mortality statistics from the 1980’s versus the 2000’s.

So we have the following pieces of information:

  • The background risk of suicide (per year) is 1 in 9,000
  • The risk of suicide (per year) in ED patients with self-harm is 1 in 50-60
  • This is effectively a LR+ = 180 simply by virtue of turning up in our ED
  • Those who score high-risk on a stratification tool have a LR+ = 1.2 – 1.3 (PPV 2 %)
  • Those who score low-risk on a stratification tool have a LR- = 0.25 (NPV 99.5 %)

In the context of a patient presenting to ED with self-harm:

  • Without scoring them, the 1-yr risk of suicide is 2 %
  • Scoring non-low, the 1-yr risk of suicide is 2 %
  • Scoring low risk, the 1-yr risk of suicide is 0.5 %
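The three figures above can be reproduced, give or take rounding, with the odds form of Bayes' theorem. This sketch assumes, as the reasoning here does, that the LRs reported for repeat self-harm carry over to completed suicide; the helper name is my own, and the 2 % and 1 in 9,000 inputs are the rounded figures from the text.

```python
def apply_lr(prob: float, lr: float) -> float:
    """Post-test probability from a pre-test probability and a likelihood ratio."""
    odds = prob / (1.0 - prob)
    new_odds = odds * lr
    return new_odds / (1.0 + new_odds)


background = 1 / 9_000   # population 1-year suicide risk (rounded)
ed_self_harm = 0.02      # 1-year risk after an ED self-harm presentation (rounded)

# LR implied by simply turning up to ED with self-harm: ratio of the two odds.
implied_lr = (ed_self_harm / (1 - ed_self_harm)) / (background / (1 - background))
print(f"Implied LR of presenting: {implied_lr:.0f}")

scenarios = [
    ("Scoring non-low (LR+ 1.2)", 1.2),
    ("Scoring non-low (LR+ 1.3)", 1.3),
    ("Scoring low (LR- 0.25)", 0.25),
]
for label, lr in scenarios:
    print(f"{label}: 1-yr risk {apply_lr(ed_self_harm, lr):.1%}")
```

The implied LR comes out in the same ballpark as the 180 quoted above, a non-low score nudges the risk from 2 % to somewhere around 2.4 to 2.6 %, and a low score drops it to about 0.5 %.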


Keep in mind that of patients presenting to ED with self-harm, the vast majority will not score low-risk on these tools. A generous estimate is that around 10% will do so. Let’s imagine 6 patients a day present to ED with self-harm:

  • 2,000 patients present per year to ED with self-harm
  • 40 of them will kill themselves in the next year
  • 200 of them will score as “low-risk”
  • 1 of those 200 will still die in the next year, compared to 1 in 9,000 in Australia

With such incredibly bad specificity, and consequently an extremely poor PPV (i.e. these scores are absolutely useless as a rule-in test), we’re really only looking at a tool’s utility in identifying those who are safe (or at least safer than the rest) to discharge; we want to use it as a rule-out test, a bit like an XDP. Let’s take a simplistic approach and imagine that we will admit everyone who scores high risk, and discharge those who score low risk, and that admitting a patient is 50% effective in preventing completed suicide for the coming year:

  • We see 2,000 patients
  • 200 of them score low risk and are sent home
  • 1,800 score high risk and are admitted
  • We use 1,800 hospital beds
  • 18 admitted patients go on to die
  • 1 discharged patient dies

Now imagine we just admit everyone:

  • We see 2,000 patients
  • We use 2,000 hospital beds
  • 20 patients die

So, implementing a risk stratification tool lets us save 200 hospital beds and perhaps* 1 life-year per 2,000 patients we see, but 5% of the deaths that do occur will be in the group we sent home.

Now imagine mental health are absolute legends, and 90% of those admitted who were going to kill themselves survive at least a year. Using a scoring system:

  • We see 2,000 patients
  • 200 of them score low risk and are sent home
  • 1,800 score high risk and are admitted
  • We use 1,800 hospital beds
  • 4 admitted patients go on to die
  • 1 discharged patient dies

Versus just admitting everyone:

  • We see 2,000 patients
  • We use 2,000 hospital beds
  • 4 patients die

We’re still saving 200 beds, but now 20% of the deaths that do occur (1 of 5) will be in the bunch we sent home. Paradoxically, by implementing a scoring system as a rule-out test in this way, we have also managed to kill an extra patient per year (5 deaths instead of 4)!! Yep, really. In the immortal words of some beloved 1980s cultural icons… “I love this plan. I’m excited to be a part of it. Let’s do it!!”  …or… maybe not.

“Bollocks!” I hear you cry. “There is”, you say, “an error in his maths!!”. Well, maybe, but if you don’t believe me, extend this little gedankenexperiment to examine the boundary condition of mental health inpatient admission being 100% effective for preventing completed suicide in the next year. If we admit everyone, no-one dies. If we employ a score and discharge the 200 low-risk punters, 1 of them still dies because there is a small but non-negligible risk of a false negative and we miscategorised the poor bugger. This is essentially the guy you discharged after a negative XDP who then died of the PE you missed.
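If you’d rather check the maths than take my word for it, the thought experiment can be sketched as a toy model. Everything here is an assumption lifted from the rounded numbers above: 2,000 patients a year, 10 % scoring low risk, a 0.5 % one-year risk in the discharged low-risk group, and a flat 2 % risk applied to anyone admitted. The function and parameter names are hypothetical.

```python
def expected_deaths(effectiveness: float,
                    n_patients: int = 2_000,
                    low_risk_fraction: float = 0.10,
                    risk_low: float = 0.005,
                    risk_admitted: float = 0.02) -> tuple[float, float]:
    """Return (deaths using a score, deaths admitting everyone) over one year.

    effectiveness is the fraction of would-be suicides prevented by admission.
    The flat 2 % risk for all admitted patients is the text's rounding,
    not a measured figure.
    """
    n_low = n_patients * low_risk_fraction
    n_high = n_patients - n_low
    discharged_deaths = n_low * risk_low                       # sent home, untreated
    admitted_deaths = n_high * risk_admitted * (1 - effectiveness)
    admit_all_deaths = n_patients * risk_admitted * (1 - effectiveness)
    return discharged_deaths + admitted_deaths, admit_all_deaths


for eff in (0.5, 0.9, 1.0):
    with_score, admit_all = expected_deaths(eff)
    print(f"Admission {eff:.0%} effective: "
          f"score {with_score:.1f} deaths, admit-all {admit_all:.1f} deaths")
```

With these rounded inputs the score comes out ahead at 50 % effectiveness (19 vs 20), behind at 90 % (4.6 vs 4.0) and behind at 100 % (1 vs 0), with the two strategies breaking even at 75 %. That break-even point is a product of the rounding assumptions, so treat it as a rough guide only.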

This illustrates why simple statistical characteristics of a test or a rule should never be considered in isolation. In the same way that sensitivity, specificity and LR inform our decision making, but are much, much more useful once we know the prevalence or pre-test probability (giving us the more clinically relevant PPV and NPV), it’s worth remembering that the ultimate utility of a test, or scoring system in this case, also depends on externalities that are often overlooked. In this particular case, while the basic characteristics of sensitivity and NPV look promising as a rule-out test, the fact is that the more effective the treatment for the pathology is, the worse the test performs at the end of the day. This would be of only academic interest, but in this instance, the tipping point of doing more harm than good (i.e. killing more patients than we would by simply declaring all-comers high-risk) falls well within the spectrum of reasonable real-world approximations for the externalities involved.


  • The population risk of suicide in the coming year is about 1 in 9,000
  • The risk in punters with self-harm seen in ED is 180 times that, around 1 in 50 (2 %)
  • Scoring high risk on a stratification tool has zero predictive value; still 1 in 50 (2 %)
  • Scoring low risk lowers that to 1 in 200 (0.5 %)
  • Very few of our patients score low-risk, which means:
    • We save very few, none, or indeed lose more lives by using a scoring tool depending on how effective inpatient mental health treatment is
    • We reduce hospital bed/admission utilisation by around 10% vs an “admit all” strategy



  • The biggest risk factor for completed suicide in the next year is the fact the patient is in your ED now with self-harm or ideation. LR = 180
  • No scoring system can predict who is at higher risk than this baseline 1 in 50 risk
  • Using a scoring system can reduce inpatient bed use by a small amount but has minimal impact, and possible deleterious effect, on morbidity and mortality in this population


(*Footnote: By using a scoring tool to exclude the lower risk patients from admission, the admitted group necessarily have a higher risk of completed suicide than the undifferentiated all-comers group, with concomitantly more deaths in the admitted cohort. I rounded numbers to keep it neat here, but the stated benefit of saving 1 life per year (or per 2,000 patients) is generous, as there will be more inpatient group deaths to offset that saving in the low-risk group. That is, even when the externalities (effectiveness of inpatient psychiatric intervention) are conducive to making the tool look good, implementing a scoring tool doesn’t actually save lives.)
