Podcast

August 05, 2025

Building AI agents in sensitive financial enterprises

ai in debt collection

conversational ai

financial technology

compliance and ai

Real-world AI agents in financial debt collection. Learn about compliance challenges, data quality, and ethical standards in sensitive financial operations.

Hosted by

Deejay

Featuring

Qamir Hussain

Guest Role & Company

Head of AI @ Aryza

Episode Transcript

Deejay (00:00) Qamir Hussain, thank you very much for joining me. You are Head of AI at Aryza. When we met, you were at Webio, which I gather has been acquired, so congratulations on that. Can you start off by explaining what Webio does, what you do, and what kind of AI-based solutions you've been building over the last few years?

Qamir (00:17) Cool, thanks for having me. Yeah, it's all changed now since we've been acquired, but let me give you a bit of background into what we've been doing at Webio. I joined Webio about five years ago, so it'll be five years in September, and I have a background in AI going back quite a bit. When I joined, Webio was already up and running as a business servicing the credit and collections industry. They wanted to add conversational AI to improve how credit collections agencies communicated with their customers at scale. When you have thousands of conversations going on every week, it can be very difficult for agents to keep up with that. So the idea was to add in AI to help agents with what's going on in a conversation. For example, tagging a message or a conversation as somebody who is potentially vulnerable or going through a difficult time, so that conversation can be handed off to a human agent who can deal with them and help them out. The whole purpose of it was to help people with really difficult conversations. Conversations around money are hard, and especially if you're in debt, they're really, really difficult. The idea with the Webio credit and collections platform is to help everyone out, but especially the person in debt who might be going through a very difficult time. I mean, the reason they're in debt is that something happened. They weren't able to pay something, and maybe they just need a little bit of help to redo their payment plan to pay off the debt.
And that's where the AI comes in to help. We believe in the whole human-in-the-loop approach to AI. The AI will make decisions, it will classify, it will summarize conversations, it'll do all of that stuff that's really helpful, but it's all controllable by our customers. They can set up automations, and they can put in fallbacks for when the AI doesn't quite get things right. We have two principles. One is that the AI does a very high quality job of the work it's doing, and the other is that we can maintain it, so we can improve it over time and add in extra functionality. In terms of where the AI really helps our customers, it's doing that hard work agents do of assessing where someone is in the debt cycle, helping them out, and dealing with really tricky conversations. A couple of things were important to me personally. We wanted to be able to control the AI ourselves, so we didn't want to hand it off to a third party. One, we have compliance issues, but also from a quality point of view. While it took us, let's say, a little bit longer to get the first version out, by doing it that way we really understood the needs of our customers and provided a solution that really helped and really solved the problems. And also, as you know yourself, AI has a habit of being great, and then a little thing like a full stop or a word is added into a data set, you train it, and all of a sudden it's not picking something up as well and the quality drops. Being able to recognize that quickly and fix those issues was really important to us, because at the end of the day the AI should be doing work at a very high level, at least for the stuff we're involved in; these are sensitive conversations.

Deejay (03:28) Got it.
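The human-in-the-loop setup Qamir describes, where the AI classifies but customers configure fallbacks for when it doesn't quite get things right, can be sketched as a simple routing rule. Everything here (the intent names, the confidence threshold, the escalation set) is a hypothetical illustration, not Webio's actual configuration:

```python
# Minimal human-in-the-loop routing sketch. The intent names, the
# confidence threshold, and the escalation rules are illustrative
# assumptions, not Webio's actual implementation.

CONFIDENCE_THRESHOLD = 0.85
# Intents that always go to a human, regardless of model confidence.
ALWAYS_ESCALATE = {"vulnerability"}

def route(intent: str, confidence: float) -> str:
    """Decide whether a classified message can be handled automatically
    or should be handed off to a human agent."""
    if intent in ALWAYS_ESCALATE:
        return "human"          # sensitive conversations get a person
    if confidence < CONFIDENCE_THRESHOLD:
        return "human"          # low confidence: fall back to an agent
    return "auto"               # safe to automate (e.g. send a payment link)

if __name__ == "__main__":
    print(route("payment", 0.95))        # auto
    print(route("payment", 0.60))        # human (model unsure)
    print(route("vulnerability", 0.99))  # human (always escalated)
```

The design point is that escalation is policy rather than model output: sensitive intents bypass the confidence check entirely.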
So Webio is offering a platform that allows debt collection agents to talk to debtors, people who have fallen on hard times and owe money. So Webio itself is not doing the debt collection; it's more of a platform that you offer to the people who need to do that collection. And what are the instigators of a conversation there? Is it that the person who owes money logs onto a website and starts a chat? Is it the agent reaching out? We've got to be careful about the different types of agents here, credit collection agents versus AI agents. Is it the credit collector reaching out to the customer, or is it both modes?

Qamir (04:06) So it really stems from the DCA, the debt collection agency. Our customers will sign up to the platform and set up a campaign. They'll import numbers, the amount of debt that's owed, the information they need, and then they'll set up a campaign to have this conversation and eventually collect as much of the debt as possible. So a campaign will be set up, messages will go out by SMS or WhatsApp or whatever, and the conversation starts from there.

Deejay (04:37) Got you, got you. And like you say, you've got people in quite vulnerable situations, and from the point of view of the people who are owed the money, they want the conversation to go well. They want people to be able to afford to pay them back. They don't want people to go bankrupt, because then they don't get any of the debt back. So it's a win-win if the conversation goes well for both sides. Now, the kind of technology that you're using: you mentioned having the AI in house. Does that mean that you're hosting your own models? Are you thinking more about the AI engineering and plumbing things together? What kind of systems do you have in place, and how do they all hang together?
Qamir (05:16) Just on how we manage it: we manage all the models ourselves. We started off by looking at several models to see how well they worked, and we fine-tuned them for the sort of work that we need. Typically we looked at customer service models, models that have been trained on customer-service-type scenarios, because there are no credit collections models out there. So we built all of that into it, and it took quite a while to get to a standard we were happy with, where it could handle real-life credit collections conversations. We manage the fine-tuning of the models, we manage all the data around the models, the pipelines, all of the deployment, the DevOps and MLOps around it. And it started, again, for two reasons. One was compliance: we can't have sensitive data going off to ChatGPT or any of these services. From a compliance point of view it's very important that the data stays on our servers on AWS, in the particular region in question. But then beyond that, the usual stacks: we work with Python, AWS, and GitLab for all of our workflow. Once a model is trained, it appears in an S3 bucket, then that model is sent over to an AWS process that builds the server, injects the model, brings it up, and we have an endpoint available on our backend. Some of our customers might have a customized version of a model; sometimes they just have the general credit and collections model. So yeah, we essentially control everything in the pipeline.

Deejay (07:05) Got you. And when you mentioned that you needed to fine-tune these to be more capable of having credit collections type conversations, how clear was it that you needed to fine-tune them?
Was there a point at which you tried using foundation models out of the box and realized they weren't up to snuff? Did you find that the foundation models just weren't up to the standard you needed, and then you had to fine-tune? How did that process work for you folks?

Qamir (07:31) Yeah, so I've done this before, taking a model and evaluating it to see how well it works, and it was always an open source model. Even if there had been a credit collections model, we still would have had to do the evaluation to see how well it picks things up. I'll give you an example of the essential work that we do: classifying a message as someone being vulnerable, or somebody wanting to make a payment, or somebody just wanting a call back to discuss their account. Just that simple task of picking those messages up, labelling them correctly, and picking out any entities. It'll work on ten messages, and then you run it on a bigger data set and it only picks half of them up. When we did that evaluation, I think we started off with something like 20 or 30% of real messages, real conversations, classified correctly out of the box. You just can't use that in production, so you have to fine-tune it and get it up to a high level. And we have a minimum bar, which is very high, before anything is allowed into production.

Deejay (08:39) And presumably you had a load of training data already in existence in the organization? Was Webio doing its thing before you injected AI into the mix?

Qamir (08:50) Exactly, yeah. We were lucky that we had lots of very high quality data, and as you know, that's the key to good AI and good AI training. So we had a lot of data. The process is basically: we take that data and it's all anonymized.
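The kind of baseline evaluation Qamir describes, running a set of labelled real messages through a model and measuring how many intents come back correct, can be sketched in miniature. The messages, labels, and the trivial keyword "model" below are invented stand-ins for a fine-tuned model and real anonymized data:

```python
# Sketch of a minimal intent-classification evaluation. The messages,
# labels, and keyword-based "model" are hypothetical examples standing
# in for a real fine-tuned model and real anonymized training data.
from collections import Counter

LABELLED_SET = [
    ("I lost my job last month", "vulnerability"),
    ("Can I pay 50 next Friday?", "payment"),
    ("Please call me back about my account", "callback"),
    ("I lost my partner recently", "vulnerability"),
]

def keyword_model(message: str) -> str:
    """Stand-in classifier: a real system would call a fine-tuned model."""
    text = message.lower()
    if "pay" in text:
        return "payment"
    if "call" in text:
        return "callback"
    return "vulnerability"

def evaluate(model, labelled):
    """Return overall accuracy plus per-intent (correct, total) counts."""
    correct, per_intent = 0, Counter()
    totals = Counter(label for _, label in labelled)
    for message, label in labelled:
        if model(message) == label:
            correct += 1
            per_intent[label] += 1
    accuracy = correct / len(labelled)
    return accuracy, {i: (per_intent[i], totals[i]) for i in totals}

if __name__ == "__main__":
    acc, breakdown = evaluate(keyword_model, LABELLED_SET)
    print(f"accuracy: {acc:.0%}")
    for intent, (right, total) in breakdown.items():
        print(f"  {intent}: {right}/{total}")
```

The per-intent breakdown matters as much as the headline number: a model can score well overall while missing a sensitive intent like vulnerability entirely.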
So there's never any PII that goes into any training. What we really care about is how somebody might express, say, vulnerability: I lost my job, my partner lost their job, we're going through a difficult situation. There are so many different ways to express different types of vulnerability, just as an example. The vulnerability side is our largest data set in terms of the intents we classify; some of the other intents are things like payments, or somebody just wanting a call back, and so on. So yeah, we have a huge amount of data, and it's a case of using that data correctly, training the model and getting it up to standard, just on the classification side. There are other AI elements in our product set, but for the classification we just need to get it as high as possible, and we built our own tools to manage all of that. Having that high quality data from real-life scenarios, rather than imagined, synthetic data, was key to getting it up to a very high standard.

Deejay (10:08) Yeah. And it's a situation where it's easy to imagine that new AI startups are on the back foot because they don't have that existing set of data, whereas incumbents who are already doing the thing have the data, have the real conversations, and can use that in training and fine-tuning.

Qamir (10:26) Yeah, exactly. If you're a startup and you're new to it, you have to get that data somehow. You might start off with synthetic data just to get something like a POC out the door, but at some point you're going to have to inject real data, because ultimately what gets classified is somebody saying "sorry, can't pay next month, lost my job", phrased in a very casual, flippant manner, and the AI may not pick it up.
And obviously if you're using the larger models, they're going to be more sophisticated and they might pick it up, but even with ChatGPT, say, if you're able to use it, you still have to build a system for it to recognize very niche conversations.

Deejay (11:12) Got you, got you. So evals: you mentioned that you've built some custom tooling there to do the evaluations. Is that because you had really particular requirements, or is it because you were quite early as an organization into the AI and LLM space, so you had to make something for yourselves?

Qamir (11:31) That, but also I looked at what was out there: open source tools for annotation, for managing your models, comparing data sets, all that kind of stuff. I evaluated quite a few tools, and there were some amazing, really powerful tools out there, but none of them really fitted what we needed. Again, I've worked on this stuff before and I've built eval tools before, so I had that experience. We're a small team, so we really didn't want to build tools unless we really had to. We went down that journey, evaluated a few tools, looked at some open source stuff, and I just found they didn't give us the customization we needed. So we built our own tools. They're all Python-based web applications, but they're internal tools only. The two tools that we have are Jupyter and Hopper. Hopper manages all of our data sets: custom ones for the particular needs of particular clients, and the general one. And Jupyter is all about QA. We'll have loads of examples, data the models have been trained on and data they haven't seen, all sorts of variations. And we built in a whole lot of functionality that helps us out a huge amount. So we can do things like, say we just want to test a hundred conversations and compare two different models.
So let's just say we've got a customized model and we've got the general credit collections model, both of which we've added, say, two or three intents to over the last month or so. And we want to do an apples-to-apples comparison on some data that neither of them has seen, to see which one performs better. We can do that in just a couple of clicks. It'll kick off the process, it might take a little while to run, but at the end of it we get some graphs and visualizations and some percentages to say, you know, model A is 5% better on these particular intents or these propensities. So that really helped us.

Deejay (13:18) I was just going to ask, does the eval process tend to work more in terms of continuous delivery, where every time something changes you run a set of evals? Or are you using it more in an exploratory sense, more a kind of data science, EDA-type way: we're exploring a different space with these different models?

Qamir (13:36) Yeah, a little bit of both, actually. So when a model gets deployed, say, we'll have a staging endpoint for model X. We do some training, we launch it, and then we want some automated tests just to go through it, to make sure it hasn't broken anything. If it has, it tells us what's broken: this particular intent, it's not picking it up as well. We can go in and see why that happened. But also in the exploratory sense, when we're adding a new intent, which happens every now and again. We have something like 28 different intents that we can classify, and we're always looking at adding more if the data says so. I'll give you an example of a situation we had with one customer. The models weren't classifying overall as well as we'd like; the classification just wasn't as good as we wanted. So we looked into the data and we noticed something.
We saw messages from people complaining about a particular situation. I won't go into the details, but they were complaining about why they had to pay a particular fine when they'd actually paid already and were still getting these notices. So there was that complaint, and there were a number of different types of complaints, and they were being classified as something else, I think as vulnerability. We looked at the data and one of the engineers said, actually, these are all complaints, not vulnerability; it's not somebody expressing vulnerability, but I guess there was an overlap in some of the phrasing that classified as vulnerability. When we did the manual check, just to make sure things were being classified correctly, we noticed we were getting very low recognition on these messages. So we created a new intent called complaint, added those data points, those messages, to it, retrained, and all of a sudden the recognition went right up. The vulnerabilities were being more correctly classified, and now we had this new complaint intent that we were able to roll out to our customer. And what our customer told us was they don't get a huge amount of complaints, but they take a huge amount of time to resolve. Let's just say they get ten complaints a week, which isn't a huge amount, but they take a huge amount of person time to resolve. So being able to classify them helped our customer reduce the time to resolve complaints, improve their CSAT scores, and just get the job done a lot quicker.
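That apples-to-apples comparison (two models, the same held-out messages, per-intent scores) might look like this in miniature. Both "models" are toy keyword stand-ins and the messages and intents are invented; the point is only the shape of the comparison, where a retrained model with a new complaint intent scores better on exactly those messages:

```python
# Sketch of the apples-to-apples model comparison described above:
# run two classifiers over the same held-out messages and report
# per-intent accuracy for each. Both "models" are toy stand-ins and
# the data and intent names are invented for illustration.
from collections import Counter

HELD_OUT = [
    ("why was I fined when I already paid?", "complaint"),
    ("this charge is wrong, I paid weeks ago", "complaint"),
    ("I lost my job and can't cope", "vulnerability"),
    ("can I set up a payment for Friday?", "payment"),
]

def model_a(msg):  # older model: no "complaint" intent yet
    return "payment" if "pay" in msg else "vulnerability"

def model_b(msg):  # retrained model with the new "complaint" intent
    if "fined" in msg or "charge" in msg:
        return "complaint"
    return "payment" if "pay" in msg else "vulnerability"

def per_intent_accuracy(model, data):
    """Per-intent accuracy: fraction of each intent's messages correct."""
    right, total = Counter(), Counter()
    for msg, label in data:
        total[label] += 1
        if model(msg) == label:
            right[label] += 1
    return {i: right[i] / total[i] for i in total}

if __name__ == "__main__":
    for name, model in (("A", model_a), ("B", model_b)):
        print(name, per_intent_accuracy(model, HELD_OUT))
```

Run on the same unseen data, the older model misses every complaint (it has no such intent), while the retrained one picks them all up, which is the kind of per-intent delta the eval tooling surfaces.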
Deejay (16:07) It sounds like continuously monitoring things and having an awareness of how well the product is performing is really key here, because of the variability in the input data, which is human conversations. Compare that to traditional software development, where it's like: we've delivered a load of features the product manager asked for, we've got acceptance tests that prove they work, it's done, it's bulletproof, it's going to work from now until forever. Organizations that maybe are not used to continuously checking things, asserting that things are working the right way, it sounds like they would struggle in this kind of situation. You need that more agile mindset of actually checking that things are working the way you want them to.

Qamir (16:49) Yeah, 100%, I think you hit the nail on the head there. I think we focus a little too much on models, on what the AI can do. We might look at a model and go, look, it generates these really cool emails: I'll just send an email to Bob about the thing we want to buy, and it generates it. But I think people who've been working in AI long enough realize that, yeah, it's great to have those models. Normally what we do is take the open source models and work with them, or you can tie into ChatGPT or Claude or any of these to get the AI done. But the key to it is really the maintenance of the AI, as in: what is the AI doing? Today it might classify really well, and then tomorrow you see different messages come in and it's just not able to pick them up, or it doesn't know what to do, or it hallucinates more. So this continuous testing, continuous evaluation that the models are working well, is really the key to producing high quality AI.
And again, that's probably our most important principle: whatever job the AI is doing, it does it to a high standard. So absolutely, you have to be very agile.

Deejay (18:05) Some of the work that you do is around models and training and evaluation of those. Then, in order to deliver a functional product, presumably there's a big agentic chat bot side of things. Do you get involved in that, or is that somebody else's responsibility?

Qamir (18:20) Would you mind just repeating that?

Deejay (18:22) So you do a lot of work with the models, fine-tuning them and evaluating them, in order to deliver a useful product to customers. It's not enough to just have a model; you need something a customer can chat to, or that reacts to SMSes and things like that. Is there an agentic bit of software engineering that goes alongside the model training?

Qamir (18:28) So, strictly speaking, we don't have agents, software agents or AI agents as such; we have workflows. These are workflows that our customers can set up to do specific tasks. One of the workflows might be identification and verification, ID&V. We have a bot screen where, using a GUI interface, a customer can set up an ID&V scenario. So when a customer messages in and says, for example, I'd like to discuss my account, they can set up a workflow to say: when this happens, we want to trigger this particular workflow. That would be identification and verification. So when a request comes in to discuss their account, or any kind of request, it might go: well, before we can discuss your account, can you just confirm your name and your phone number? And we can pick that up, and then it goes on to the next step.
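A workflow trigger of the kind Qamir describes, where a classified intent arrives and the customer-configured workflow for it runs, reduces to a lookup plus a fallback. The workflow names and steps here are hypothetical, not Webio's real configuration:

```python
# Sketch of intent-triggered workflows like the ID&V example: when a
# classified intent comes in, look up the steps a customer has
# configured for it. Names and steps are hypothetical illustrations.

# Customer-configured mapping: intent -> ordered workflow steps.
WORKFLOWS = {
    "discuss_account": [
        "ask_name",          # "Can you confirm your name?"
        "ask_phone_number",  # "And your phone number?"
        "verify_identity",   # check the answers before continuing
        "route_to_agent",    # hand the verified customer to an agent
    ],
}

def trigger(intent: str) -> list[str]:
    """Return the steps to run for an intent, or a safe fallback."""
    return WORKFLOWS.get(intent, ["route_to_agent"])

if __name__ == "__main__":
    print(trigger("discuss_account"))
    print(trigger("unknown_intent"))  # falls back to a human agent
```

The fallback entry is doing the same job as the human-in-the-loop principle earlier in the conversation: anything the configuration doesn't cover lands with a person, not with the bot.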
And we have a lot of these scenarios that can be configured and set up. So for example, if somebody says, "Sorry, I forgot to pay last week. Can I pay 50 quid next week when I get paid?", the AI can pick that up as a payment: they've mentioned an amount of money, they've mentioned a date. The customer will set up a confirmation step in the bot process to say: if it's a payment, confirm they've mentioned an amount of money and a date, and if they haven't, ask for it. Then just send a confirmation message: just to confirm, you'll be able to pay £50 next Thursday, or whatever the date happens to be. The customer just has to go, yep, and then on that date a link for payment can be sent through.

Deejay (20:24) Got you. You mentioned compliance earlier, and needing to run models yourselves to make sure that, from a regulatory point of view, you're all above board and can prove the audit trail of everything. What other kinds of compliance challenges have you had? I can imagine there are lots of folks out there with grand ideas about integrating LLMs into their products. But I'm just thinking about

Qamir (20:39) Mm-hmm.

Deejay (20:51) banking: the number of banking customers I've had over the years who insist, we can't do that here, we're banking, we're regulated. And then you talk to healthcare: we can't do that here because we're healthcare, we're regulated. Can't do that here because it's defense. Can't do that here because it's insurance. And it's like, you're not all that special; lots of people manage to get this stuff working. So with the compliance issues, what other challenges did you face? Were there any that stuck out in particular?

Qamir (21:15) I'm trying to think. I think we're a little bit lucky in that nothing is public facing, first of all. You can't just go onto Webio and start working with the AI.
So there's no public facing API. There's a layer that our customers sign on to; it's all mediated through their account, it's all hosted, it's all secure. All the AI sits behind the platform, so a customer never directly communicates with the AI; it's all done indirectly. And we have so many guardrails, so there are only certain things the AI can do. You can't just have an open conversation with the AI and say, how do I do this thing, which could be really illegal or really dangerous or whatever it happens to be. So we're sort of lucky in that regard. I mean, the few things that we're very paranoid about and very careful to manage are around personal information. When we're training our models, we actually don't need to put PII in. In the tools we have a three-step process, two parts manual, one part automated, so that when we're looking at a data set to use for training in the next batch, it'll look for things like names, phone numbers, any kind of PII, and remove it, mask it, or put in dummy values. Then we'll put it into the training set. And the models themselves, for that particular function, all they do is classify, so you can't ever extract information from the models. I don't know if that answers the compliance question, but I think having everything behind these guardrails, and limiting what the AI can do, is the main thing. We get these questions from customers, so we have a fairly detailed document on how to answer them and on exactly how our AI works. I think when people think of AI and compliance and security, they think of ChatGPT and open conversations. And we don't have open conversations; our customers, and our customers' customers, don't have open conversations with the AI.
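The automated part of a PII-scrubbing step like the one Qamir describes could look roughly like this. The two patterns are purely illustrative; a real pipeline, including the two manual passes he mentions, would be far more thorough:

```python
# Sketch of the automated part of a PII-scrubbing step: mask phone
# numbers and cued names before a message can enter a training set.
# These patterns are toy illustrations; a production pipeline (plus
# the manual review passes described above) would be far stricter.
import re

PHONE = re.compile(r"\b(?:\+?\d[\d\s-]{7,}\d)\b")
# Toy name matcher: capitalized word after an introduction cue.
NAME_CUE = re.compile(r"\b(?:I'm|I am|this is|my name is)\s+([A-Z]\w+)")

def scrub(message: str) -> str:
    """Replace phone numbers and cued names with placeholder tokens."""
    message = PHONE.sub("[PHONE]", message)
    message = NAME_CUE.sub(
        lambda m: m.group(0).replace(m.group(1), "[NAME]"), message
    )
    return message

if __name__ == "__main__":
    print(scrub("Hi, this is Sarah, call me back on 087 123 4567 please"))
```

The placeholder tokens keep the sentence shape intact, so the model still learns how people phrase things without ever seeing who said them.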
The AI helps to classify, helps to summarize, helps to pick out elements in a conversation that can be used to automate something, or to help a person out if they're going through difficulty. So from that point of view, the compliance is gated really well.

Deejay (23:42) So it sounds like it's more of an assistant, and dare I say it, a co-pilot. I don't know whether I have to put a little TM next to that for Microsoft's sake, but it's an assistant to the credit collection agent.

Qamir (23:49) Yeah, I think that's a much better way to phrase it. It's an assistant to our customers, to the agents. The whole point of it is to help the agents get their jobs done at a higher quality, or at scale. They can do more conversations during the day because things get flagged and added to queues for agents to deal with, and many tasks can be automated, which saves agents time. They can focus more on the tasks that require a human, like talking to a customer and helping them through a tricky situation.

Deejay (24:25) Not many people have been building these kinds of systems in production. There are lots of people that want to, lots of people thinking about doing it in the near future, and you've been working on this for a little while. Have you got any particular war stories that spring to mind? Horror stories, things that exploded, or things that were really difficult, that you encountered the hard way and that you can talk about?

Qamir (24:44) Yeah, let me see. That's a great question, actually. War stories.
I think something that comes to mind is demoing something that's early stage, and it happened at Webio actually, with the very first version. The first version I did for Webio was all synthetic data, before we had a process to anonymize and clean the data properly ahead of any training. So, just using synthetic data, I'd have, say, four or five intents, a very small number, and I just came up with some phrases: "I lost my job" to pick up vulnerability, or "yeah, I can pay next week" to pick up a payment and a date that can be used to automate something. And of course, when you're demoing, somebody in sales will give you a phrase that's a payment, you put it in, and it just doesn't pick it up. That just happens. It doesn't happen so much now, because the models are very mature and the fine-tuning process is very good. But early on, when you're building the first POCs and demoing them, it can be kind of embarrassing. That's why we really insist on really high quality before customers see it. In terms of absolute howlers...

Qamir (26:05) I'm trying to think. I'm sure something will come to me, but I can't think of it at the moment.

Deejay (26:10) No worries. Maybe your mind is suppressing it because it's such a painful experience. On the subject of you having done this for a while: I mean, you've been into AI since before it was cool. There must be so many folks who did a masters or something in AI or neural networks decades ago and are like, yes, now is my time. How did you get into AI, and how have you seen things change over the years?

Qamir (26:15) That's it. Yeah. Exactly. Yeah.
Yeah, so I got into AI when I was in college, in DCU here in Dublin. I was at one of the lectures. I'd heard the phrase artificial intelligence, AI, whatever, but I'd never bothered to figure out what exactly AI is. At that point I was coding, so I understood how to write a program, logic, these kinds of things. But I was like, how can that be intelligent the way we're intelligent, however you want to define it? You can have if conditions checking for this, but is that it? Is that what makes intelligence? Check this value, do this, if this happens: it feels like intelligence, and if you have enough of those you get a rule-based sort of AI, I guess. If you have enough of those you can build a really clever system, like expert systems and so on. Then a friend of mine mentioned neural networks. I went and picked up a couple of books, and there were some examples worked through on truth tables and XOR, exclusive OR, all these kinds of things, where you'd use backprop to train a neural network to recognize them. I just went through it by hand and saw the weights shifting. I worked it out by hand: you give it random weights, and then if a unit goes above a certain threshold it fires, or it doesn't fire. Working through that, I was like, okay, now I get it, now I understand. It was a very tiny example, but then I understood how that mimics how the brain works, and that's how you get artificial intelligence, I guess. I found that fascinating; I thought it was brilliant. And of course, you think about extrapolating: if you had enough neurons, enough data, enough compute resources, you could get some clever things done. But back then, this was the mid 90s, you were working on Pentiums, and there wasn't very much you could do.
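The exercise Qamir worked through by hand (truth tables, XOR, random starting weights, backprop) still fits in a few dozen lines of plain Python. It's a toy: with a net this small, training can occasionally get stuck in a local minimum, so treat the final outputs as illustrative rather than guaranteed.

```python
# Tiny 2-2-1 neural network trained with backpropagation on XOR: the
# same truth-table exercise described above, with random starting
# weights and sigmoid units. Pure Python toy, not production code.
import math
import random

random.seed(42)

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

# XOR truth table: (input pair) -> target output
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# Random starting weights (last entry in each row is the bias).
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_o = [random.uniform(-1, 1) for _ in range(3)]

def forward(x):
    h = [sig(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    o = sig(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])
    return h, o

def loss():
    """Sum of squared errors over the four XOR patterns."""
    return sum((forward(x)[1] - t) ** 2 for x, t in DATA)

def train(epochs=15000, lr=0.5):
    for _ in range(epochs):
        for x, t in DATA:
            h, o = forward(x)
            d_o = (o - t) * o * (1 - o)            # output-unit delta
            d_h = [d_o * w_o[j] * h[j] * (1 - h[j]) for j in range(2)]
            for j in range(2):                     # output weights
                w_o[j] -= lr * d_o * h[j]
            w_o[2] -= lr * d_o                     # output bias
            for j in range(2):                     # hidden weights + biases
                w_h[j][0] -= lr * d_h[j] * x[0]
                w_h[j][1] -= lr * d_h[j] * x[1]
                w_h[j][2] -= lr * d_h[j]

loss_before = loss()
train()
loss_after = loss()

if __name__ == "__main__":
    print(f"loss: {loss_before:.3f} -> {loss_after:.3f}")
    for x, _ in DATA:
        print(x, "->", round(forward(x)[1], 2))
```

Watching `loss_before` drop to `loss_after` is the modern equivalent of "seeing the weights shifting" on paper.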
And so I went and did a PhD, just to get into it deeper. What I was interested in, starting off, was how you could optimize or get more out of neural networks. I think the first or second generation of 3D graphics cards had just come out, and I said, look, I'll get OpenGL to do all of the matrix multiplications, all of the mathematics, and use that so things would be done quicker. But they were all toy examples, and that was always the problem. So I left the area for a while. I concluded the PhD, moved on, and was just a software engineer, worked at a couple of startups, all that kind of stuff. And then deep learning became a thing, and all of those promises I'd come across while doing the PhD, it was like, we can now do those. There was really good image recognition; you could train on language. I had done some NLP applications back during the PhD, but it was all rule-based. You could get it to do clever things, but you knew it wasn't maintainable at scale, it was going to be very brittle, all of that. So when deep learning became a thing, I came across the fact that you can actually train a neural network on language and get it to do really clever things; vector embeddings became a thing. So I got back into the area, and since then, for, say, ten years or so, I've been doing a whole bunch of NLP and AI applications. That's the experience I brought into Webio.

Deejay (30:01) How do you see the current state of AI, the maturity of LLMs and their usage? It must be interesting having gone from when this was all a pipe dream, to it becoming real, and then seeing the tooling and practices maturing around that.
Qamir (30:20) Yeah, so just on that, I think for me, every time I read an article that says, you know, the bubble is going to burst, AI is a bit crap or whatever people say, it's like, but are those people actually building any systems? Are they using it for something? And I'll be honest with you, I was cynical about ChatGPT and LLMs, and then I used it and was like, this is really impressive stuff. Even if the models didn't improve from today, if they don't improve at all, you can still use it for your day-to-day work, whether it's code generation, whether it's image generation. And it may not be perfect, but it'll get you some of the way, if not all of the way, in terms of getting a task done. And then with sort of agentic behavior, adding workflow, adding processes to it, you can really up your game in whatever sort of task you have at hand. And so when I look back on the voice apps that I developed, say, 20-odd years ago, I had a microphone hooked up to a laptop and you'd speak into it and it would turn off your light or turn the TV off or whatever. It was really impressive at the time. That's like your "hello world" situation today. And just the amount of sophistication on language itself is incredible. So yes, for me, it's night and day, you know. And I think the way we need to appreciate how good AI is, is to apply it to a task. Yeah. Deejay (31:52) You mentioned that if models don't get any better, they're still useful right now. And I think there's a truth to that: if all progress in AI stopped, it would still be probably five, ten years, probably longer than that, before we figured out all the ways that the current models that we have could be useful.
There's so many business use cases waiting to be discovered where, you know, this technology has not permeated its way through organizations. So with further advances, you know, maybe AGI is not around the corner, but there's certainly lots of potential still to be had. Qamir (32:26) Yeah. Yeah. I mean, just when you mentioned AGI, that's fine as a goal if, you know, people like OpenAI and Claude and whatever want to pursue that. Absolutely. I think, again, to the point of if there's no improvements: understanding, trying to get an understanding of all of the ways that these large language models, say just those in particular for now, can help us. We still haven't discovered everything that they can do or everything that they can help with. Clearly there are limitations. That's a given. But for a lot of work tasks or even creative tasks, a lot of the stuff is already sort of definable, in a sense. Like, just before the call today, I was looking at Claude's agents and sub-agents. And there's just some really, really clever stuff going on. So if you're using a code editor, for example, and you have one of the coding agents like Claude or any of these, and you get it to generate some code: look, give me a front end, give me a login page. And, you know, I want it to look a little bit like this, use this color scheme. It'll produce something that you can use right away. And if you don't like the results, you can have a little process in the background that can go and fix things, or it can go and find some documentation, give it some context. For example, say it builds a login page and you don't like the style, you can just say, well, give it to me in, what's Google's design language, what's that called? It's just gone out of my head.
Material Design. So it's built this login page, but you don't like the look. You just say, give it to me in Material Design, and it'll just give it to you in Material Design. So even if none of that improves, it will still save me a few days a week or whatever to generate that little bit of code. And also, it's not just "here's something that looks good and is formatted correctly", so you don't have to worry about how to do that in CSS if you're not a front-end developer. It'll build logic for you. It'll reason about, say, a design pattern that you could use in a particular scenario. So yeah, it's incredible what AI can do. Deejay (34:37) A lot of organizations will have had to go through a process of somebody being the AI champion and maybe convincing people that it was the right tool to use. Was that the case in Webio? Was it already a decision that was made before they brought you into the organization, or did you have to advocate for it? How did that work for you folks? Qamir (34:55) Yeah, so I was lucky that it was already a thing in Webio to use AI to improve the functionality that was already available within the platform. So there wasn't anything that I needed to do in terms of advocating for using AI to improve things; that was already embedded in the company. So not a problem I had to deal with. Deejay (35:16) Nice, nice. That makes for an easier life. How about customers? The customers who are the kind of intermediaries doing the debt collection, how did they feel about the AI features? Is it very obvious to them that there are AI features in the products, or is it more kind of behind the scenes? Qamir (35:33) So we make it very clear to our customers that these are the AI functions, this is what AI helps with. You can use the bot screen to create workflows for your business. We have a number of workflows out of the box that they can use, put in their own custom logic, whatever, or build their own workflows through that, and that can use AI.
The main thing that we had with our customers was the concerns they had about AI, you know, sort of what they were seeing in the press, what they're seeing online in terms of security, in terms of AI just doing crazy things or whatever. And so our job on that side of it was really educating our customers on what guardrails we've put in, how we limit the AI, and limit in a good sense, in the sense that we only allow our AI to do specific jobs. It's not an AGI-type AI where you can ask it any kind of question and all that kind of thing. And also it's behind a wall. It's behind the platform. So the end customers never talk to the AI directly. It's always mediated by a human, configured by a human, with fallbacks for when maybe the AI didn't pick something up. There might be a confirmation step to say, did you mean a payment? And if the AI got that wrong, the customer will just say, no, I didn't mean a payment, I meant somebody to call me back. So, you know, we've thought long and hard about how we use the AI in just very, very specific scenarios. And to me, that's sort of niche AI, I guess. And that to me is far more interesting, actually, in a sense, than AGI. AGI is a really big problem, I guess, to solve. Well, it is a big problem to solve. And having things very open means that, I think, you end up producing lower-quality AI rather than higher quality. It's like the old sort of thinking in software engineering: the more constrained the problem that you're trying to solve, the higher the quality gets. If your app does everything, it's going to do everything half as well. Whereas if it just does one or two things, you can really get specific from a UX point of view. You can get really detailed in terms of the work that it does. It doesn't do too many things. It does a few things, but it does them really well.
Deejay (37:30) Given that you've been building these things for a while, production systems that use LLMs, and given your educational experience and the fact that you've been into this for a long time, have you got any words of wisdom, bits of advice for people that are maybe new to this, maybe building something that leverages a large language model? Qamir (37:57) Yeah, I think the first thing is really just get your hands dirty on a project that you're personally interested in. Something, an area that you have some expertise in that you think AI could help out with; use it on that and look at the limitations of what the AI does. And that'll give you an idea, if you're going to, say, build an app using it, it'll give you an idea of how well it can perform whatever tasks you need the AI to do. Say, like, image generation for marketing. You have an idea for generating LinkedIn banners or sales banners or something like this. And you want to give somebody a prompt to say, give me a banner for a 10% discount on our makeup or whatever. And the AI generates it. And as you know, some of the models that do great jobs with generating images, the text isn't great, for example, and I know ChatGPT did sort of improve that. So just test the limitations of a model. Whatever you're trying to do, see if you can find out where it fails. And then that gives you at least some sort of boundaries on what the AI can do. And then you can make that decision: within those boundaries, is there enough there that you can do the work that you're looking for? Deejay (39:18) Yeah, do something useful within those boundaries. And presumably once people start to get further along with that journey, that's where that whole kind of continuous evaluation thing comes in: making sure that it's still doing what you wanted it to do, and it's not dropping off, and the kind of queries that you're getting are not changing.
Qamir (39:37) I mean, just to add to that then, I think it's really just focus on trying to keep the quality as high as possible. And one thing that you have to bear in mind is maintenance of your AI. Even if you're using, say, ChatGPT or Claude as a wrapper, maintaining the application that sits around that could be a very, very complex task, especially when you start to scale it up. For small POCs, you should be fine. But as you scale up, the maintenance, the managing of the data, how you handle compliance, the question you had earlier on. So yeah, quality, maintenance, but absolutely get your hands dirty, play around with it and look at limitations. Deejay (40:15) Is the maintenance of an app that leverages AI different to maintenance of traditional software? Are there kind of new skills or pitfalls that people will need to be aware of? Qamir (40:24) That's a great question. Yeah, for me, you know, when it comes to software as well, it's the maintenance that's the hard thing. We can all, say, build something, get it to version one or something, but it's maintaining that going forward. I think where the maintenance of AI differs is it's non-deterministic. The output is non-deterministic. You're not guaranteed. If you, for example, in ChatGPT, give it the same prompt, it will generate different output. So, you know, write me a poem in the style of whatever. It'll do that. You ask it again, you're going to get different output. And I think that's the sort of fundamental difference. So you've built an application, say like what we do, classification of messages, right? One of the fundamental things that we do at Webio. You can send a message in to an AI to say, classify this message. You could ask it again. And if your model hasn't been fine-tuned correctly, if it hasn't been built to a high enough standard, you might get a totally different classification.
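One common way to contain the classification non-determinism described here is to call the model several times and take a majority vote, flagging low agreement for human review. This is a hedged sketch: `classify_message` below is a deliberately flaky stand-in that simulates sampling variance, not Webio's actual model or API, and the labels and thresholds are invented for illustration.

```python
import random
from collections import Counter

def classify_message(text: str, rng: random.Random) -> str:
    # Stand-in for a real model call: returns the expected intent
    # ~80% of the time and a spurious one otherwise, to simulate
    # non-deterministic output for the same input.
    return "payment" if rng.random() < 0.8 else "callback_request"

def classify_with_vote(text: str, n: int = 5, seed: int = 42):
    rng = random.Random(seed)
    votes = Counter(classify_message(text, rng) for _ in range(n))
    label, count = votes.most_common(1)[0]
    # Surface low agreement so a human agent can review the message.
    confident = count / n >= 0.6
    return label, confident

label, confident = classify_with_vote("I can pay next Friday")
print(label, confident)
```

The design choice is the point rather than the numbers: repeated sampling turns "the model sometimes answers differently" into a measurable agreement score you can act on, for example by routing uncertain classifications to a human, as the platform's fallback steps do.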
And we see that all the time when we're doing our background work, when we're adding a new intent or improving an intent. So from that point of view, it's very different. Once you get, say, just in a program, once you get a method or a module to, say, parse JSON, it parses JSON. You know, it either works... Deejay (41:44) Yeah, it either works or it doesn't work. Assuming that JSON parser is half-decent, it'll either work whenever it's valid JSON or it will not work, but it won't work some of the time and not others. Qamir (41:51) Yeah. Exactly. And depending on the task that the AI is doing, you could get totally different results. And so you have to, I guess in a sense, contain that, you know. Deejay (42:04) Actually, that reminds me of a conversation I was having on Slack the other day. Somebody was asking, what kind of skills should I hire for when building AI-based apps? And one of the things that came out in that conversation was that maybe folks that come from an ML background have a slightly different tolerance for ambiguity in software development. You know, there's nothing more frustrating when you're coding than having an intermittently failing bug or intermittently failing tests. But when you're working with non-deterministic models, then you've got to have some tolerance of, like, it's worked 90% of the time and I think it's all great, and, no, now it's stuffed, it's doing the wrong thing again. Having kind of been involved in the AI space previously and then gone to software engineering and come back, do you think that's a fair characterization? Is it more frustrating to work with language models? Qamir (42:55) Absolutely. I think you hit the nail on the head when you said ambiguity, you know. You have to build up that tolerance. If you're coming from an ML background, you learn that over time. Again, I'll give a concrete example of something that we've worked on.
It's summarizing, say, conversations. So you get a piece of text and you summarize it. You can take that same piece of text and get it to summarize it. Deejay (43:02) Yeah. Qamir (43:23) You do it twice. You're going to get slightly different or very different summaries, depending again on how well the model does that particular task. Now, both of the summaries should capture the essential meaning. But there's a certain amount of subjectivity in terms of, does it represent the original piece of text that we gave it? So we built that functionality. Again, we're not using any external services. We're using an open-source LLM that we fine-tuned to summarize credit and collections conversations. So it'll give you a three-bullet-point summary of, say, a conversation that maybe has 20 messages in it. It might say something like, somebody forgot to pay, you know, Jim forgot to pay last month. He's out of work at the moment, but he'll be able to pay in two months' time. Let's just say that's the essential summary from the 20 or so messages. Sorry, I lost my train of thought there. Deejay (44:16) No, no, no, that's fine. We were talking about the kind of tolerance for ambiguity and how it might do the wrong thing at various times. I kind of wonder whether the cloud native era was preparing us for this, having done some engineering myself many moons ago, but on systems that would intermittently fail because of network connectivity issues in your favorite cloud provider. And also tests that were excruciatingly slow. Yeah, that was quite frustrating compared to being in the fast-feedback world of, like, right, I'm writing some code, I'm doing local unit tests, I find out whether it works 100% or whether it fails 100%, but there was never any ambiguity. Maybe the progress of technology has been slowly acclimatizing us to the ambiguity with which we'll need to cope in the future. Qamir (45:05) Yeah, absolutely.
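Two runs of a summarizer can word things differently while still capturing the same meaning. A crude, dependency-free sanity check for that is unigram overlap, in the spirit of ROUGE-1: do the two summaries share most of the same words? This is an illustrative sketch only, not the evaluation tooling the team actually built.

```python
def unigram_f1(a: str, b: str) -> float:
    """F1 over the shared lowercase word tokens of two texts."""
    set_a = set(a.lower().split())
    set_b = set(b.lower().split())
    shared = len(set_a & set_b)
    if not set_a or not set_b or not shared:
        return 0.0
    precision = shared / len(set_a)
    recall = shared / len(set_b)
    return 2 * precision * recall / (precision + recall)

# Two hypothetical runs over the same conversation: same words,
# different order, so word-overlap scores them as equivalent.
run1 = "Jim missed last month's payment and is out of work"
run2 = "Jim is out of work and missed last month's payment"
print(round(unigram_f1(run1, run2), 2))
```

A lexical score like this obviously misses paraphrase and can't judge whether a summary is faithful, which is exactly the subjectivity mentioned above; in practice you'd layer human review or a stronger semantic check on top.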
I think, just to add to that actually, I think for me what's really important is, you know, again, to your question of hiring for that particular role, somebody working in AI or ML or whatever, what sort of skills: it's really more of a QA type of thinking that you need. And my belief always has been, whether it's a software engineer, ML engineer, AI engineer, if they have some background in QA, they have this kind of detail-oriented thinking, you know. And I think that's probably the most important skill set to have if you're going to be working as an engineer on any AI project, because you're going to look at the happy path, but you're also going to look at the sad path. How can things break, or how do things break in AI? So in that summary example that I gave, how can things break? How can we test that things are breaking in an automated fashion, and things like that? And that was part of some of the tooling that we built. I think a QA mindset is probably the most important thing. Deejay (46:08) I find that really interesting. For a long time, I've been an advocate of more automated testing and having QA folks kind of be, or what's the word, above the product line, in that they should be feeding in requirements and defining quality as a product feature. And then once that's defined in a story or whatever, then engineers go and write an automated test. And, you know, maybe QAs also get involved in making sure the tests are high quality and testing the right things. But that more exploratory mindset, that more inquisitive mindset, I can see being very useful when you've got things that are no longer binary pass/fail, where if it passes, it's always going to pass. We need to find ways to break this system. We need to find the cases where it doesn't work. We need to care about how often it works. The future looks bright for people with a QA mindset. Qamir (46:56) 100%.
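The "sad path" testing of AI output described here can be automated with cheap structural checks. A hedged sketch of the idea, applied to the summary example: `summarize` is a fake stand-in so the checks themselves run, and the specific rules (at most three bullets, no bullet with zero overlap with the source) are invented for illustration, not the team's actual tooling.

```python
def summarize(conversation: list[str]) -> list[str]:
    # Stand-in for the fine-tuned summarization model: pretend it
    # returned a three-bullet summary of the conversation.
    return ["Jim missed last month's payment",
            "He is currently out of work",
            "He expects to pay in two months"]

def check_summary(conversation: list[str], bullets: list[str]) -> list[str]:
    """Return a list of detected problems; empty means all checks pass."""
    issues = []
    if len(bullets) > 3:
        issues.append("too many bullets")
    if any(not b.strip() for b in bullets):
        issues.append("empty bullet")
    # Every bullet should reuse at least some words from the source:
    # a cheap guard against the model inventing content outright.
    source_words = set(" ".join(conversation).lower().split())
    for b in bullets:
        if not source_words & set(b.lower().split()):
            issues.append(f"unsupported bullet: {b}")
    return issues

conv = ["Hi Jim, your payment is overdue",
        "Sorry, I missed last month, I'm out of work",
        "I can pay in two months"]
print(check_summary(conv, summarize(conv)))
```

Checks like these won't catch every failure, but they turn "how do things break in AI?" into assertions you can run on every model output in CI, which is the automated-fashion testing mentioned above.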
Yeah, QA, I would hire a QA person over anybody else. If somebody had really good QA skills but no AI engineering skills, knew nothing about AI, knew nothing about software, I think they would do a great job. Because you can learn all the AI techniques; you can learn QA as well, but if you have that coming into the job... Qamir (47:22) You know, again, you have the detail-oriented mindset. Deejay (47:24) Cool. And there we go. We're just about on the hour, and there is our spicy take. If we need something engaging for social media, I think there it is. Before we wrap up, is there anything that you'd like to say? Aryza, Webio, are you hiring? Is there anything that you've got to promote? Haven't written a book lately or anything, have you? Qamir (47:41) Yeah, thankfully we don't have anything to promote. We're not selling anything other than, you know, what Aryza sells, which is, you know, really good solutions around the whole credit cycle. But I don't have a book to promote, not yet. Maybe next time we do this, I can have some sort of book written. Deejay (47:56) That would be excellent. Excellent. Right. Well, thanks very much, Qamir. And hopefully I'll speak to you again soon. Qamir (48:06) Cool, thank you very much, cheers.

Episode Highlights

Qamir's team built an in-house AI platform to assist agents with sensitive debt collection conversations at scale.

The AI acts as a co-pilot for human agents, using a human in the loop approach to flag vulnerable customers and automate tasks.

They chose to fine-tune their own models in-house to maintain strict compliance and have full control over quality.

Out-of-the-box foundation models were not viable for their use case, showing only 20-30% accuracy on real conversations.

Having a large, high-quality dataset of existing, anonymized conversations was the key to successfully training their models.

The team developed custom internal tools to manage datasets and evaluate model performance because open-source options were not suitable.

Continuous evaluation is critical, as shown when they created a new complaint intent after discovering misclassifications in production data.

AI maintenance is fundamentally different from traditional software because models are non-deterministic and can give different outputs for the same input.

Qamir argues that a detail-oriented QA mindset is the most important skill for an AI engineer to handle ambiguity and find failure points.

Share This Episode

https://re-cinq.com/podcast/building-ai-agents-in-sensitive-financial-enterprises

