Could ChatGPT help Citizens Advice Scotland?
For the last 7 months, we’ve been building Cassie, a chatbot for Citizens Advice Scotland.
We thought we were doing pretty well… and then ChatGPT launched.
How does Cassie compare? Can we make use of the advances that are demonstrated by ChatGPT and similar models?
A handful of experiments
Fuelled by ChatGPT excitement, we created four prototypes using similar technologies. Two were powered by Elastic, one by OpenAI’s GPT-3 API, and for one we simply pasted queries into ChatGPT.
Then we compared their responses to the chatbot we’d built (which can be seen on a number of bureaux websites and on Money Talk Team).
We created 20 dummy questions, inspired by the questions that users had asked Cassie this month. We then fed them through the five chatbot versions. If we got a tremendous answer we rated it ‘10’. If we got a terrible answer, we rated it ‘0’.
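The bake-off itself needs nothing more than averaging ratings per chatbot. A minimal Python sketch of the scoring step (the ratings below are illustrative placeholders, not our actual scores):

```python
# Compare chatbot variants by averaging their 0-10 ratings across the
# dummy questions. The numbers here are made up for illustration.

def average_scores(ratings):
    """ratings: dict mapping chatbot name -> list of 0-10 scores."""
    return {bot: sum(scores) / len(scores) for bot, scores in ratings.items()}

def rank(ratings):
    """Return chatbot names ordered best-first by average rating."""
    avgs = average_scores(ratings)
    return sorted(avgs, key=avgs.get, reverse=True)

ratings = {
    "ChatGPT": [8, 9, 7, 10, 8],
    "Cassie": [7, 6, 8, 7, 6],
    "Elastic A": [4, 5, 3, 6, 4],
}

print(rank(ratings))  # → ['ChatGPT', 'Cassie', 'Elastic A']
```

With only 20 questions and subjective ratings, the averages are indicative rather than statistically robust, which is why we treat the ranking, not the exact scores, as the result.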
For example, here is an answer that we rated 0:
user: my landlord is kicking me out for no reason
chatbot: No, your landlord is not kicking you out for no reason. If you are using the property for business purposes and no longer living there, you may have lost any security of tenure you had and could be evicted by the landlord
The chatbot is giving the wrong advice and the tone is all wrong. This is bad.
By contrast, here’s an answer we rated 10:
user: The restaurant next to me keeps putting their wheelie bins infront of my window. I asked them not to with no response so I have moved it from the window myself a few times and they keep putting it back.
chatbot: If the restaurant is blocking your window with their wheelie bins, it is causing you a nuisance and affecting your right to enjoy your property.
Here are the steps you can take:
Speak to the restaurant: Try speaking to the restaurant management or owner and explain how their actions are affecting you. It is possible that they may not be aware of the problem and could rectify it once you bring it to their attention.
Mediation: If speaking to the restaurant does not resolve the issue, consider seeking mediation. This involves having a neutral third party assist in resolving the dispute. Citizens Advice Scotland may be able to help you find a local mediator.
Take legal action: If the above steps do not resolve the issue, you may need to take legal action. You could make a complaint to your local council’s environmental health department, who will investigate and take appropriate action.
It is important to remember that you should not take the law into your own hands and move the wheelie bins yourself. This could result in a confrontation and make the situation worse.
I hope this advice is helpful and please let me know if you have any further questions.
Results
ChatGPT rates best by far. Cassie, the CAS chatbot, came second, which we think is a pretty good achievement.
We only tested 20 questions, and evaluating answers is pretty subjective, but the differences are striking. I’m confident they would hold across a bigger sample.
What’s more, there’s room for further improvement with ChatGPT. After analysing 20 answers, we felt that ChatGPT could score better if it was more conversational and asked more questions.
For the initial test, we used this prompt:
Imagine you are an advisor at Citizens Advice Scotland. What would you say if someone asked you XXX
Changing the prompt to the following improved ChatGPT’s answers:
Imagine you are an advisor at Citizens Advice Scotland. What would you say if someone asked you: XXX
End your response with a helpful question designed to elicit more useful information or to make the client feel comfortable.
This made ChatGPT go too much in the direction of friendly questioning and not enough advice. With a bit of tweaking, I’m sure we could find a balance.
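At this stage, "tuning" is nothing more than editing string templates. A minimal Python sketch of the two prompt variants we tested (the `build_prompt` helper and constant names are ours, purely for illustration):

```python
# Build the two prompt variants. Swapping templates is the whole tuning
# mechanism at this point -- there is nothing to configure beyond the text.

BASE = ("Imagine you are an advisor at Citizens Advice Scotland. "
        "What would you say if someone asked you: {question}")

FOLLOW_UP = ("\nEnd your response with a helpful question designed to elicit "
             "more useful information or to make the client feel comfortable.")

def build_prompt(question, ask_follow_up=False):
    """Return the prompt text to paste (or eventually send) to the model."""
    prompt = BASE.format(question=question)
    if ask_follow_up:
        prompt += FOLLOW_UP
    return prompt

print(build_prompt("my landlord is kicking me out for no reason",
                   ask_follow_up=True))
```

Finding the balance between advice and friendly questioning then becomes a matter of iterating on the `FOLLOW_UP` wording and re-running the 20-question evaluation.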
Challenges
ChatGPT offers good advice. But our current thinking is that it is impossible to use at Citizens Advice Scotland. Why’s that?
1. Data privacy
The questions fed to ChatGPT get absorbed back into ChatGPT for further training. That’s a deal breaker for CAS, who require clarity about exactly what happens to their clients’ data.
2. No API
There is no API yet (but there is a waitlist, so there is something in the pipeline). Until then, this is pie in the sky as we can’t integrate ChatGPT into the CAS experience.
3. Answer validation
If 99 answers out of 100 are good, is that sufficient? If 999 answers out of 1000 are good, is that sufficient? CAS understandably wants confidence that every answer is based on fact. But I do wonder if that might change over time? The quality of ChatGPT’s answers is improving rapidly. At some point it may be reasonable to compare chatbots with human advisors, as they also are good but not infallible.
Also, there will soon be similar chatbots that aren’t based on generative models (the G in GPT stands for generative: these models generate text freely, which means they can make things up). For instance, on another project for CAS we’ve used technology from PolyAI, which is based on a retrieval model rather than a generative model. A chatbot that could be constrained to retrieve information from the CAS advice pages, whilst offering the conversational ability of ChatGPT, would be an interesting proposition.
It is worth noting that in the CAS context, advice has to be insurable. Currently, ChatGPT’s advice would be uninsurable, but at some point a similar system might present human-level (or better) risk for insurance and claims purposes, which could change attitudes.
4. Improving chatbot responses is trial and error, at best
With ChatGPT we can tweak the prompts, and once there is a ChatGPT API we may be able to fine-tune it on CAS advice pages, but there is no way to deterministically control the output. It’s more like training a pet than programming a computer.
5. Do CAS clients want advice or empathy? Can those needs be disentangled?
Mental health nonprofit Koko trialled a system that let GPT compose messages, which advisors could tweak and then send. They found:
“Messages composed by AI (and supervised by humans) were rated significantly higher than those written by humans on their own (p < .001). Response times went down 50%, to well under a minute … [but] once people learned the messages were co-created by a machine, it didn’t work. Simulated empathy feels weird, empty.”
Two things to ponder. Firstly, could empathy from a computer ever not feel weird? As chat technology gets better, will we get more used to asking computers for advice?
Secondly, this experiment was controversial (at least on Twitter) even though it seems pretty reasonable to me. There is a human in the loop at all times. We need to tread carefully as we explore this area.
Looking to the future
We are living in extraordinary times for AI.
Of the challenges listed in the previous section, 1 and 2 are likely to be solved soon, perhaps in the next 6 months. Maybe they won’t be solved by ChatGPT, but by an open-source competitor, such as Eleuther.AI.
Challenges 3 and 4 are trickier, but I’d be amazed if they aren’t solved in 2 or 3 years.
This leaves 5: is there a role for chat technology where quality advice is readily available but human support is in short supply?
One way to slice it might be to ask, is there a proportion of CAS clients who need advice, not empathy? If the chatbot could deal with ‘advice seekers’, that would free advisors up to spend more time with ‘empathy seekers’.
Alternatively, could the chatbot improve the process of providing empathetic experiences, by being a ‘copilot’ for advisors? GitHub Copilot is a big success for AI in the workplace. Advisors might benefit from something similar.
In conclusion, ChatGPT isn’t right for CAS, but the current wave of AI breakthroughs offers opportunities that could help clients and advisors find the right advice and that could facilitate the human connection that is key to the CAS service. We hope to keep experimenting, as there are some great opportunities rapidly becoming possible.
—
Update 2nd March 2023: Wow. OpenAI have already solved challenges 1 and 2 with the announcement of the ChatGPT API and new data policies.
—
Feedback is much appreciated. Please email andy@helpfirst.ai.
With thanks to CivTech, UK government, Scottish government and the Robertson Trust for funding this work.
The Priority Services Register (PSR) is a key tool that energy suppliers use to fulfil their responsibilities to vulnerable customers. But there isn't just one register. Every energy company has their own PSR and the application forms vary unpredictably from supplier to supplier.
As part of our CivTech Challenge, we’ve been researching best practice across the industry. We were left with lots of questions:
- Why is 'restricted hand movement' a vulnerability that almost all suppliers assess?
- Why are archaic phrases like ‘bedridden’ used?
- Why does only one supplier check if their vulnerable customers use ‘electric showering’?
Alas, we weren't able to fully answer these questions. But here’s a visual guide to various PSR forms, so you can get an overview of the landscape.
Overview
We accessed PSR application forms for Ovo Energy, British Gas, SSE, Octopus, EDF, Shell and Utilita. For many other suppliers, access is restricted.
The first observation is that the application forms are extremely varied:
- We’ve grouped questions into categories to make things a bit easier to read; however, the forms themselves come in very different structures. Some offer all their options in one long list, some separate them into smaller sections. Some only show certain sections once a customer has selected a particular option (e.g. selecting ‘sight loss’ gets you extra questions on the Shell application).
- Several vulnerabilities are only mentioned by one supplier. Only one asks about autism, and another asks about breathing difficulties. The following options only showed up once: ‘female presence preferred’, ‘longer time to answer the door’ and ‘bedridden’.
- For sensory needs: ‘blind’ and ‘partially sighted’ are separate options in all the forms. ‘Hearing impairment’ and ‘deaf’ are combined in half the forms, and hearing impairment is not asked about at all in one. This may be contentious, as hearing impairment and being deaf are very different conditions.
- Some suppliers include options for accessible information provision in the same form (i.e. braille, large print letters, etc.) Others link to an additional form, or do not reference it at all.
- When temporary conditions are mentioned, only some suppliers allow the customer to select a date when they believe the condition will no longer apply.
- Most of the forms are multiple choice, limiting answers to what the supplier chooses to ask about. Occasionally a supplier (e.g. EDF) gives the customer a larger space to talk about their conditions, equipment and needs in more detail.
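One way to make these comparisons systematic is to treat each form as a set of options and measure overlap. A rough Python sketch of that approach, using made-up option fragments rather than the full questionnaires:

```python
# Compare PSR application forms by treating each as a set of options.
# The option lists below are illustrative fragments, not the real forms.

forms = {
    "EDF": {"blind", "partially sighted", "restricted hand movement", "carer"},
    "SSE": {"blind", "partially sighted", "autism", "anxiety/depression"},
    "Shell": {"blind", "partially sighted", "restricted hand movement"},
}

def jaccard(a, b):
    """Overlap between two forms: |intersection| / |union|."""
    return len(a & b) / len(a | b)

# Options that appear on only one supplier's form.
all_options = set().union(*forms.values())
unique = {opt for opt in all_options
          if sum(opt in f for f in forms.values()) == 1}

print(sorted(unique))   # → ['anxiety/depression', 'autism', 'carer']
print(jaccard(forms["EDF"], forms["Shell"]))  # → 0.75
```

In practice the options would first need normalising (e.g. mapping ‘hard of hearing’ and ‘hearing impairment’ to one label), which is itself a big part of the inconsistency problem discussed below.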
Next, we dive deeper into the application forms.
Medical Conditions [2]
EDF’s application form has the highest number of options related to medical conditions (20 in total) with British Gas and Utilita featuring the lowest (13). EDF also features options which cover multiple medical conditions (e.g. 'breathing difficulties', 'disability benefits') more frequently than other suppliers. SSE has the highest number of options for learning and mental health related conditions (including 'dyslexia', 'autism', 'learning difficulties' and 'anxiety/depression').
There is some overlap within options, which could be confusing. For example, SSE lists both ‘developmental condition’ and ‘autism’ separately, even though the latter is a type of the former. Another example is the ‘mental ill health’ and ‘anxiety/depression’ options, again found in the SSE form. It is not clear if customers should tick both or only the more specific option.
All organisations feature options to indicate older age; however, they specify a variety of lower thresholds, including 60+, 65+, ‘pensionable age’ and ‘pensioner’. British Gas have two separate options relating to older age (‘pensionable age (65 and over)’ and ‘age 75 and over’).
There is some degree of consistency across organisations. This appears to be where specific conditions have been mentioned within the Ofgem guidance (for instance, 'restricted hand movement' appears in all but one form, in spite of the fact this is a very specific need).
Language Used
The language used across suppliers is very inconsistent. SSE uses ‘hard of hearing’ and ‘deaf’ to describe hearing loss-related needs, while other suppliers employ terms such as ‘hearing impairment’ or ‘hearing impaired’.
Some options have multiple potential meanings: ‘carer’ could refer to the respondent either needing a carer or being a carer for someone else.
All suppliers ask about speech and language difficulties and broader language barriers. However there is no shared way of asking whether a customer speaks English. Variations include: 'unable to communicate in English', 'language barrier' and 'foreign language speaker'.
‘Unable to communicate in English’ (used by Octopus and Ovo) is somewhat ambiguous. Customers might take it to mean having a different first language or having a speech condition. The requirements are quite different: with the former you could use an interpreter or multilingual support, with the latter you would need different support.
Medical Equipment [3]
Options Offered
British Gas do not offer any specific options for types of medical equipment: they solely offer the generic category ‘mains powered electric medical equipment’. All other organisations surveyed have more specific options. These are broadly consistent across suppliers, with a few rarer options (e.g. ‘wheelchair’, ‘MDE electric showering’) appearing only occasionally.
Most organisations (bar British Gas and EDF) also ask about reliance on water.
Language Used
It is unclear what is meant by the ‘life support’ option used by EDF. Often the phrase ‘life support machine’ refers to a ventilator, but EDF also have a separate option for ‘heart and lung ventilators’. It could mean life support as a condition or set of needs but that seems too broad for the PSR.
Temporary Changes
In a rare show of unanimity, all suppliers offer the same options for temporary changes, such as a householder aged under 18.
Other Questions
Passwords
All suppliers offered the option of setting up a password or PIN. This is usually so a technician can state this password as an additional security measure on home visits. Two suppliers required a 6-letter password, one an 8-letter password and one a 10-letter password. A final supplier did not specify length. An unfortunate side effect of this variation is that if an individual were to move supplier, they may need to change their password and remember a new one. (Note: not shown in an infographic.)
Life Scenarios
Varying from the multiple choice standard, Shell veer into first person narratives. In their ‘Nominee Scheme’ section of the form, they feature an additional tick box option: ‘I can be easily confused and worried by communications from my energy supplier’. When asking about meter support they offer: ‘I have a prepayment meter and no-one in my household is able to safely read it or top it up’.
Accessibility Information [4]
Organisations vary on whether they include accessibility questions on their PSR form. Ovo offers seven different accessibility options for receiving information, while Shell offers only a single broad checkbox.
Conclusions
Suppliers diverge considerably in what information they collect on their customers to register them for Priority Services support.
On our travels we encountered the aspiration to create a more standardised or universal PSR. Initiatives like the Vulnerability Registration Service and Experian’s Support Hub aim in this direction. In the future we are keen to explore the user experience of these services and how they aid vulnerable customers.
In the meantime, we hope this analysis will prove useful if you are looking to improve the experience of vulnerable energy customers. Any questions or comments, contact harriet@helpfirst.ai. We’d love to hear from you!
Footnotes
1. SSE was acquired by Ovo Energy in 2020. They hadn’t completed their move over when we started this research and were still registering people to their PSR. We’ve included them in this analysis as their approach was interesting, with many mental health and developmental condition-type questions.
2. Some questions have been condensed in the infographics. Numbers referenced in the discussion refer to the full options as available on the questionnaires; the infographics show a condensed version for brevity and ease of visualisation. Full original data is available on request.
3. ‘Heart/lung machine & ventilator’ is the most common formulation of question regarding this equipment. However, EDF separates these into ‘heart/lung machine’ and ‘ventilator’.
4. Octopus and British Gas do not ask about accessibility. Shell only offers a general ‘accessible information’ tick box if the customer has earlier selected that they have a visual impairment. This does not mean these suppliers do not record the information elsewhere; where the questions did not appear on the forms, we were not able to verify what (if anything) they ask about accessible information.