Podcast

Oct 8, 2025

AI on Nightmare Difficulty: Secure Code Generation in the US Government

secure ai development

government ai

code generation

agentic workflows

claude

In this episode, Deejay talks with Mike Gehard, Director of R&D at Rise8, a custom software development firm on a mission to create 'a future where fewer bad things happen because of bad software.' Mike explains how they navigate the highly regulated 'nightmare difficulty' environment, from adhering to strict NIST standards to handling classified data. Discover their practical approach to security, including running Claude Code in isolated containers, leveraging secure endpoints like AWS GovCloud, and their ambitious goal to enable a single person to take a project from idea to production.

Hosted by

Deejay

Featuring

Mike Gehard

Guest Role & Company

Director of R&D @ Rise8

Episode Transcript

Daniel Jones (00:03) In this episode, I'm chatting with Mike Gehard, Director of R&D at Rise 8, a custom software development firm on a mission to create a future where fewer bad things happen because of bad software. Mike explains how they navigate the highly regulated US federal government environment, adhering to strict NIST standards and handling classified data. We go through their practical approach to security, including running Claude Code in isolated containers, leveraging different secure endpoints like AWS GovCloud, and Rise 8's ambitious goal to use AI coding assistants to enable a single person to take a project from idea to production. Enjoy.

Deejay (00:40) So then, Mike Gehard, you are doing AI code generation stuff in the federal government. You're doing it on nightmare difficulty. Tell us all about that.

Mike Gehard (00:49) Yeah. So here at Rise 8, we are trying to integrate AI into two places. One of those places is our software development life cycle. You know, not a day goes by that you don't see another post on LinkedIn or on the internet somewhere about the latest and greatest code generation. But also, how do we begin to build AI into software we're building for the US federal government? You know, everything from applications that are used at federal agencies to applications you might use from a federal agency. So what would it look like if the IRS had some sort of AI bot that helped you file your taxes? Like, that's a thing. But because we're dealing with the federal government, we have lots of rules. So NIST, which is the National Institute of Standards and Technology, publishes hundreds of pages of PDFs that outline how federal government software has to work, and we have to adhere to that. So unlike a startup, where I spent 20-some years of my career, it's not just sling everything at the internet as fast as you can. This now has to be measured.
We have to know what kind of information we're dealing with, everything from unclassified all the way to classified information. So you don't want to be leaking classified information on the internet. So yeah, it's been interesting. It's an interesting problem to solve.

Deejay (02:05) And you mentioned kind of straight off the bat there the fact that you've got the two tracks. You've got generative AI in the coding process: how do we make software development more effective and efficient using AI? But then you've got AI feature development. And I find it interesting that people throw around the term AI so broadly, and it means so many things in so many different situations. And then us as tech leaders have to try and manage expectations and, like, come up with an AI strategy. It's like, okay, well, that's not just one thing. That's lots of different things in different dimensions. In your line of work, where are you spending more of your focus, and which element do you find more challenging?

Mike Gehard (02:45) So we've decided to really push in on the AI for developing software. So we're a consulting company, you know, think a Pivotal Labs-like consulting company, called Rise 8. We focus on that because we have a hundred-some consultants who are delivering software for the federal government, the Department of Defense, and we're at the VA as well. So that for us is an enabler. If we can figure that out, then the rest of it, the software we write, whether or not it has AI embedded in it, just goes faster.
So, you know, I think we're focusing there right now because any learnings we get from that will immediately translate into putting AI-embedded software into production. Because, like, if I'm using Claude Code, the only difference between my agent on my laptop and an agent I'm going to write for, let's say, the VA, who wants to put AI in front of veterans, is the fact that instead of running on my machine generating code, it's talking to a veteran who is trying to get an appointment scheduled, or things like that. So all of the learnings from kind of the controlled experiments in environments we can control, which are our laptops, help us learn what we're going to have to do when we decide to roll out an AI-enabled chat bot or an AI-enabled assistant to help veterans schedule their appointments. I think that's where we're focusing, because AI adoption at the federal government is slow. President Trump has signed multiple executive orders to accelerate AI. Not a day goes by that I don't see another article like, the Department of Defense has signed a $200 million contract with, I think, every major foundation model provider to provide AI services to the Department of Defense. It's happening at the VA, it's happening at the IRS. So we need to get ahead of that, and we just want to control the situation. Like, nobody understands how this works. Executive orders are easy to sign. They're really hard to implement.

Deejay (04:42) Yeah. And with the track that's building product features that leverage AI in one way or another, is most of the stuff that you're working on at the moment kind of leveraging LLMs and existing foundation models, or are you involved in the more data science, machine learning side of things, where you're training models on maybe internal proprietary data?

Mike Gehard (05:06) No, this is all what I'm calling new AI, which is LLMs.
I've actually been trying to separate those, because AI has been around for decades. We've had machine learning algorithms that have helped us do things for decades, but this is all LLM-based. We have not yet seen a project that would use something like an image classification model that has existed for years. But yeah, all LLM-based, all generative AI. You know, English-to-English translations is kind of how I've been saying it.

Deejay (05:37) And even that distinction between the more traditional ML-based product features and just leveraging an LLM for stuff. It's kind of exciting being in on this early, but it's also frustrating because the terminology hasn't really crystallized around these terms. I think Chip Huyen's O'Reilly book AI Engineering kind of encapsulated that leveraging-LLMs-to-achieve-outcomes sort of side of things, as opposed to the more machine learning side of things. It's hard trying to explain to people, because it involves different skill sets and different disciplines. They're reasonably different things, but of course it all falls under the broad umbrella of AI.

Mike Gehard (06:20) Mm-hmm. Yeah, we have a product owner on one of our teams who has done traditional machine learning, has done some of that work. What did he say to me one time? Before, when we were doing traditional machine learning, you had a data science team and you had pipelines and you had to have gigantic data sets. And now, with LLMs, that's taken care of. Like, you literally have a data science team in your pocket now, because an LLM can do all that work. No longer are we training models on gigantic data sets. We have this thing that has been trained on a gigantic data set and now can help us do all the rest of the stuff. It's kind of like the Swiss army knife of machine learning. It can do image generation. It can do image classification. It can do video generation. It can do language translations.
It's kind of this thing where you're like, whoa, this is a general purpose tool, like the microprocessor. And you prompt it in English. That's the craziest part: we're now prompting it in a language that we're all very familiar with. So yeah, it's a strange world we're living in. In 25 years of technology, I'm not sure, other than maybe the internet, that I've experienced something where it's like, this is powerful, and we don't even understand how powerful it is. We are toying with it at this point. So I think that's a great point to bring up: where does this end up?

Deejay (07:36) Yeah. Yeah, even if all progress halted and LLMs never got any better, and the state of AI just reached a plateau, it would be a good decade or two before people caught up and realized all the different possible use cases. You mentioned LLMs being general purpose there. And I think that's another pattern that we've maybe started to see emerge: that an LLM, your foundation model, should be your first go-to if you're trying to solve a problem.

Mike Gehard (07:54) Yeah.

Deejay (08:06) Just chuck it at ChatGPT, Gemini, Claude, see how far you get, without needing to, you know, train your own model or go for something more specific.

Mike Gehard (08:16) Yeah, I mean, if you look at the levers you can pull, you know, we started with prompt engineering, and now that's dead because now we're into context engineering, which is really just another mechanism of prompt engineering. Because the way these things work, it's just a wall of text that you bounce back and forth between the human and the machine. You know, but everyone's like, I want to train my own model. I was like, I don't think you understand what that means. There are many levers you can pull upstream of training your own model to do the things you want to do.
And I've been saying to people, as hard as it is, most of the time the problem exists between keyboard and chair when you don't get the right answer out of the LLM. It's not the LLM's problem. To your point, these things are amazing, and we have yet to scratch the surface of what they can do for us. Most of it is our inability to understand how they work and use them appropriately. And I think a lot of that comes from the fact that we're kind of crappy communicators with other humans. And these LLMs just point out, like, this is how bad of a communicator I am, because I assume that you're going to read my mind. I'm assuming you're going to guess what I want. And they just don't do that. So it's a really interesting time, I think, not only technology-wise, but humanity-wise, of, like, a mirror being turned on us: what are all the things we don't do well as communicators that are being amplified by these things that

Mike Gehard (09:38) have no mechanism for remembering anything, but are super powerful at processing the things that you give them.

Deejay (09:46) Are you able to share any kind of example project? I don't know how much of what you do is super top secret and not to be shared, or whether there are any use cases that you could walk us through in terms of: here's a requirement, how you approached it, what would be possible, what the right implementation might be, and the kind of added hurdles that exist in federal government that make it harder than just doing it in a startup.

Mike Gehard (10:09) Yeah, I mean, code generation is kind of where we've pushed all in, because it's, in my opinion, the most low-hanging fruit. Like, you know, not a day goes by where you don't hear somebody talking about Claude Code or, you know, agentic coding, how software developers are going to get replaced by AI. So we have been working on a way to securely run Claude Code on our laptops.
So the problem we're trying to solve is: I have a laptop, and depending on what project I'm on, I have different levels of sensitive information. We are allowed to have unclassified information on our laptops, and then this thing called CUI, controlled unclassified information. So think PII for the government. Unit rosters are a great example of CUI: I don't want the enemy to know how big my forces are. So that's one example of controlled unclassified information. That's all we're allowed to have on our laptops. Things like classified and top secret exist in things called SCIFs, which are secure rooms. So when we're on a project, we might have this CUI data floating around on our laptop. So obviously, putting Claude Code on my laptop with full access to my hard drive might be a bad idea, because this thing now has the ability to go search my hard drive. So what we've done, and Anthropic recommends this as well, is you run it in a container. I am running Claude Code in a container on Podman, because Podman runs rootless, so I have another layer of security there. I'm not giving that machine root access. So that's the big project we've been working on, because again, once we solve that problem, now I have an assistant on my laptop running in a secure environment. And that is a lot of our projects. Now, we do have one more problem we're trying to solve. We have kind of divided it into three boxes. As the Director of R&D, I have no sensitive data on my laptop. That is the easiest thing to do. That's like running in a startup. So I still run it in a container. We then have situations where I might have this sensitive data on my laptop, but my code base isn't sensitive. And we do have projects where the code we are writing has national security implications; that's the hardest one to solve. So the middle one is: I have this data on my laptop, but the code I'm writing, which I'm slinging off to the LLM, is not sensitive.
So again, because I've put it in a container, as long as I'm not mounting a directory into the container that contains this sensitive information, I'm good. So that's probably, I'd say, about two thirds of our projects, maybe half of our projects. The last piece now becomes: what happens when the code that I'm writing, that I'm moving back and forth with the LLM, is sensitive? I can't just call an Anthropic endpoint for that. So, you know, those are the situations where I'm having to set up, like, GovCloud instances on AWS that contain models that have been certified by the federal government. For the Department of Defense, we call them impact levels. IL-2 is, I think, non-sensitive, and I'm going to get myself into trouble here, but I'll try not to speak outside of my knowledge. I've been doing this for five months, so I understand just enough to be dangerous. I think IL-2 is non-sensitive, like no national security implications. IL-4, now we're starting to get into the world where, ooh, if you were to leak this information, this gets interesting, with national security implications. IL-5 is, I want to believe, classified, and I think IL-6 is top secret. You know, we're talking weapons platforms. Go ahead.

Deejay (13:41) When you say "I want to believe", is it that you want to believe that it means classified, or is there a special classification for, just, I-want-to-believe UAP-type stuff?

Mike Gehard (13:53) No, I think that means classified. I might be mixing up my IL levels, but we run in this spectrum. So, like, when I get to IL-4, which is, I have CUI data, not classified, but controlled unclassified information, now I have to start using separate endpoints. I need an endpoint that is certified for use at an IL-4 or IL-5 level, which is about half of our projects.
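The isolation setup Mike describes, Claude Code inside a rootless Podman container with only the project directory mounted, can be sketched roughly as follows. The image name (`claude-sandbox`) and the exact flag choices are illustrative assumptions, not Rise8's actual configuration; check the Podman and Claude Code docs for your own environment:

```shell
#!/bin/sh
# Sketch: run Claude Code inside a rootless Podman container so the agent
# can only see the current project, not the rest of the laptop.
# Assumption: a locally built image called "claude-sandbox" with Node.js
# and the Claude Code CLI preinstalled.

PROJECT_DIR="$(pwd)"

# --userns=keep-id                  map the container user onto the rootless host user
# --security-opt no-new-privileges  block privilege escalation inside the container
# -v "...:Z"                        mount ONLY the project directory, never $HOME
# -e ANTHROPIC_API_KEY              pass the API key through from the host environment
podman run --rm -it \
  --userns=keep-id \
  --security-opt no-new-privileges \
  -v "${PROJECT_DIR}:/workspace:Z" \
  -w /workspace \
  -e ANTHROPIC_API_KEY \
  claude-sandbox claude
```

The two properties doing the work: because Podman is rootless, even a container escape lands as an unprivileged user, and because only the project directory is mounted, CUI elsewhere on the disk is simply not visible to the agent.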
To loop this all back together, the project we're working on is: how do I write one system that every one of my projects can use, so they're not making decisions like, I could use this tool, but if I make a mistake, then I'm leaking government information out to the internet? So it is a non-trivial solution. That's kind of where we focus, because again, once I can do that, now I understand what agentic security is, what kind of prompt engineering or prompt injection attacks exist for this world. And I can then translate those into other projects. So that's where we're working, because it gives us the biggest boost. Like, translating from stories to source code is a huge lift, or troubleshooting, or proactive monitoring. All of this can be done with a machine, and people are doing it right now.

Deejay (15:07) The kind of solutions that you're exploring there, are any of those general purpose? Is there any prospect of a Rise 8 open source framework for switching between different endpoints and classifications of data and which tools you're allowed to use? Is that likely to shake out, or is it all very bespoke?

Mike Gehard (15:25) It's pretty bespoke. I mean, I think there could be some general purpose stuff. We are kind of holding our cards close to the vest just because we see this as a differentiator. So, you know, in an LLM-driven world, what is your moat? When we're talking in a SaaS world, the moat becomes the code that I've written. And to get into the Salesforce business, that's a huge moat. But when you move into an LLM-driven world, the moat becomes really narrow. It's like, well, what prompts am I using? Because the LLMs are general purpose. So I don't know what we'll do with it. I mean, we're kind of withholding it now because we're trying to figure out what our secret sauce is. Is it the prompts we're writing? Is it the tools we're using?
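A minimal sketch of that "one system every project can use" idea: derive the model endpoint from the project's declared impact level, so individual developers never make the routing decision themselves. The IL-to-endpoint mapping, the placeholder URLs, and the `ANTHROPIC_BASE_URL` handoff below are illustrative assumptions, not Rise8's actual implementation; the real IL-4/IL-5 endpoint would be whatever certified GovCloud service the program has approved:

```shell
#!/bin/sh
# Sketch: choose a model endpoint from the project's declared impact level.
# The IL-to-endpoint mapping and the URLs are illustrative placeholders.

select_endpoint() {
  case "$1" in
    IL2)
      # No national security implications: a commercial endpoint is acceptable.
      echo "https://api.anthropic.com"
      ;;
    IL4|IL5)
      # CUI and above: a government-certified endpoint in AWS GovCloud.
      echo "https://bedrock-runtime.us-gov-west-1.amazonaws.com"
      ;;
    *)
      # Anything else (IL-6, unknown): refuse rather than guess.
      echo "refuse"
      ;;
  esac
}

# PROJECT_IL is set once per project; default to the least sensitive level.
ENDPOINT="$(select_endpoint "${PROJECT_IL:-IL2}")"
if [ "$ENDPOINT" = "refuse" ]; then
  echo "no certified endpoint for impact level ${PROJECT_IL:-IL2}" >&2
  exit 1
fi
export ANTHROPIC_BASE_URL="$ENDPOINT"
```

The point is less the shell than the shape: classification is a property of the project, declared once, and the tooling fails closed when no certified endpoint exists.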
You know, I feel comfortable talking about it now because a lot of this stuff that I'm talking about, I actually started working on before I got to Rise 8, like an agent that goes out and, you know, pretends it's Kent Beck and does code reviews for me. So I don't know where that'll shake out. I think we're all still trying to figure it out.

Deejay (16:23) Yeah, it's...

Mike Gehard (16:23) But none of it's rocket science. None of this is, you know... it's a bunch of words that I piece together and iterate on as I'm using it. It's nothing like, ooh, this is revolutionary.

Deejay (16:33) I do wonder about the reluctance to jump in. Some folks seem to be jumping in to do AI with no strategy, no clear idea of outcomes, just spend some budget on AI, do something, which doesn't seem like the most sensible idea in the world to me. And other folks are maybe too far at the other end of the spectrum, not doing anything. And I wonder, there may be some amount of wisdom in that, because this is all changing so quickly. I wonder how many people are sitting on the sidelines going, I'm going to wait until there's, like, the Kubernetes to this Docker. You think about the cloud native era: Docker came out and everybody was like Docker, Docker, Docker, and making their own homebrew platforms. And it wasn't really until Kube came along that there was, like, okay, there's a sensible sort of enterprise-grade implementation of this. I wonder how many folks are by the sidelines. But then if you're not experimenting with this stuff, if you're not pioneering and running trials, you're going to end up behind the people that are willing to take those risks and, like Rise 8, you know, improve the operational efficiency of software development through using these tools.

Mike Gehard (17:38) The analogy I've been using lately is it's like back in the day of the web, when we were all slinging CGI scripts to do stuff. Like, that's where we're at.
We're in the pre-Rails, pre-Java-Spring-framework world, where, yeah, not a day goes by where a new agent framework comes out. Like, how many agent frameworks do we need? This is a prompt loop. This is not rocket science. So I think there's a mix. I mean, if you look at our journey,

Mike Gehard (18:06) I started with the company April 7th of this year, so I'm coming up on five months. I was brought in as Director of R&D mostly to focus on AI adoption, because our founder, Brian Kroger, has decided, like many CEOs have, that there's something here. We don't know what it is, but there's something here that's going to help our company. We believe it will help our company be better in the future. So you need almost a spectrum. You know, you've got to get on the bus. If you don't get on the bus, you're just standing on the curb. You're not ever going to get anywhere. And once you get on the bus, you know, you don't know where you're going. So we have found that you have some people who are going to dive into the deep end, and those are the people, you know, the early adopters, that you kind of just give Anthropic credits and see where they end up. You have the other people who are going to be, you know, what I'm calling AI hesitant, for whatever reason. And that's fine. You don't need everybody starting their own startup. But the way I've been pitching this to everybody in the company is: you are now a startup founder. You have been given budget to go disrupt your own workflow. Simple as that, you know. And some people want to be startup founders and some people don't, and you just don't know what you're going to get out of that. So my job is to help them in their startup journey as they try to disrupt their own process, which is nice, because when you're disrupting your own process, you're the subject matter expert.
You don't have to go out and do user interviews. You're like, what problem do I want to solve today? So that's easy for us, because we're about 150 people. But if you're a, you know, Fortune 500 company, where the heck do you start? And then, once you've started, how do you make sure that those learnings are fed back into the machine? Keeping tabs on what experiments people are running, making sure they're running good experiments, and moving on from failures. And that just becomes really complicated. So again, as you said, we're all just trying to figure this out, and we'll get somewhere. We'll spend some money. We'll make a ton of mistakes. But as long as you're learning from those mistakes, that's the important piece right now. It's not about being right. It's about, what did I learn?

Deejay (20:07) Absolutely. And I think that's one of the reasons that re-sync, the company that I work for and that makes the Waves of Innovation podcast possible, sees this as a familiar transformational problem. Like, there is a load of change happening. Nobody knows what the right answer is. If you place your bet all on red straight off the bat, then you're likely to fail. If you don't do anything, then you're guaranteed to fail, because you're going to get left behind. So how do you, exactly like you described, run those experiments, and, in a large organization, make sure that there are the feedback loops to learn from them? You mentioned AI hesitancy, which I think is a great term. You've got, you know, sort of a hundred, 150 people. Have you experienced much of that, given that you've got a CEO who is all in on AI and has communicated that this is a thing the business is going to be heavily invested in?

Mike Gehard (20:58) Yeah, yeah. I mean, not just internally, but go on LinkedIn. Humans that I know who are software craftspeople are like, no way in hell this is going to work.
And you're like, if that's the stance you want to adopt, that's great. So I listened to a book called A Brief History of Intelligence. You know, it goes from, like, single-celled organisms all the way up to current human thinking, what we know about the human brain and how it works, weaving AI into that whole narrative. And my less-than-educated hypothesis, not based in any sort of psychology background, is that this is the first time technology has challenged every one of us to sit back and figure out what it means to be human. The thing that sits between our ears is a biochemical process. You know, we have scientists who have proven that there is electricity that flows through our brains. The more I read about this stuff, the less we understand about how the human brain works, which is kind of scary to me. I assumed we knew more about this. But I think a lot of that comes from: I now have a silicon-based alien intelligence that speaks English to me and responds in English, but technically has no soul, for your definition of a soul. You know, whatever your individual beliefs. I mean, I was raised Roman Catholic; you know, if you forced me to identify a philosophy, it would be, like, Eastern Buddhism, Hinduism, yogic philosophy, based on my life trajectory. It has challenged me to figure out what makes you and I special, versus an ape that we share some large percentage of our DNA with. And this is a thing that sits in my computer that is now doing this. So I think a lot of it comes from just that. Yeah, people are hesitant. What does this mean for me? What does this mean for my belief systems? What does this mean for my livelihood? What does this mean for my family? And you see these horrible articles in the New York Times about people who are talking to AI like it's another human, and that's their best friend, but it works for them.
So, like, I think there's a lot of that, and we're all being asked to navigate it individually. And then you say, well, now at work, I have to figure this out. I can only imagine how disconcerting that is for people. I pushed all in. Like, I haven't written a line of code in months, but I have generated more value than I could ever have done typing. But I made that choice. That's the choice I made, to push all in and see where this lands. Other people are making different choices.

Deejay (23:28) I think I see a few signs of some organizations underestimating AI hesitancy in software engineers, kind of assuming, ah, the techies, they're going to be enthusiastic about this. But exactly like you mentioned, software craftspeople kind of push back on it, going, I quite like writing code. Just like people quite liked writing assembly, you know, 30, 40 years ago. With Rise 8, have you taken any particular actions or strategies to find people that may be more reluctant to engage in the GenAI process in software development and allay their concerns or turn them around?

Mike Gehard (24:04) We've been rolling it out pretty slowly. So look again at code generation, which is where we're focused right now. As an R&D team, a team of one, so it's just me, I'm all in. When we have beach consultants, they come to R&D, and the deal is: you're just going to use AI for everything. So I have some experimentation there. You know, some people are all in; they're like, my God, this is the greatest thing since sliced bread. Others are a little hesitant. Others don't know what to think about it. We also have an internal product that we're developing, and all of those developers are using it. And that's been an interesting one. One of the developers I would call an AI skeptic, not hesitant, but just very critical of the technology.
And I have slowly, over time, through one-on-one meetings with him and the team, just allayed their fears, and now they're using it all the time. So I think that's one way people can address it: you've just got to find people where they're at and move them along one-on-one. We also have been trying to socialize wins that people have had. So when people have a big win, we have an AI channel in Slack, and they post it there. We've asked them to share their prompts. So how do we start sharing stuff that's working? I might be hesitant because I don't have the skills; that is one of the things we've identified. You know, it's scary being a beginner. It sucks to learn new stuff. I'm learning government security, hundreds of thousands of pages of government security documents. That's hard. But we want to allay people's fears of sucking at something by giving them examples of things that are working. And we find some people are like, oh, that's really cool, let me run with that. And then they're an evangelist. You kind of have to meet people where they're at, ask them what their hangups are, and then just try to move them along. And my philosophy is: if we can get 25% of the company to disrupt our business, that's a success. We can then go turn the other 75% of the company onto the tools that we've built.

Deejay (26:06) You mentioned sharing wins there. Are there any particular standout success stories, any with big flashy numbers that you can share?

Mike Gehard (26:13) We're starting to experiment with our first agent that will generate security documentation automatically from, like, Trivy scans and GitHub SecRel pipeline artifacts. So we're getting some traction there. But nothing where we can say it saved us 53% of the time. It's more just gut-check stuff, because we just haven't added the rigor. I think we need to, and at some point we will.
I mean, my KPI is 10% improvement for Q3, which is my lagging indicator. For my leading indicators: I now have four teams using a container. I have three designers who are using Claude Code to implement design changes in the code base of their product. So, you know, not bad leading indicators; I think we'll get some traction there. They're super stoked. The fact that humans are super stoked, that we've got people doing demos about it, those are the kind of wins we're getting: people excited enough to show their coworkers.

Deejay (27:10) There are so many interesting things there. I mean, one, the use of AI for ancillary tasks, not just the writing of code, but all of the paperwork that goes along with it when you're working with something like a government. Those kinds of wins translate across all areas of a business. I can see so many different use cases, and so many use cases that non-technical folks aren't even aware of.

Deejay (27:37) Like, when we talk to SMEs in non-software organizations, there's almost this education piece, where they've heard about AI and it's all very exciting, but they don't really realize how much of that repetitive, really burdensome work can be taken away. And then... go for it.

Mike Gehard (27:52) Oh man, I've got a story about that one. So our CEO's executive assistant has created what we're calling the Brian bot. She has fed it a bunch of transcripts of all the talks he has given, because as his executive assistant, she is responsible for responding to emails and things of that nature, you know, in his tone. She asked the Brian bot

Mike Gehard (28:16) to psychoanalyze itself and present a report of, like, here's who I am. She handed it to Brian. He read it and was like, holy cow, this thing sees me, or something of that nature. And I was like, I mean, it's really good at pattern matching.
We don't realize how much of what we do is a repetitive pattern. So I think there's tons of ancillary stuff that can be done with this. We're working on a bot now.

Deejay (28:22) God.

Mike Gehard (28:44) So our big thing is outcomes in prod. Prod, or it didn't happen. You know, a lot of people will focus on outputs. We focus on outcomes that have mission impact. We're working on a bot that will help us refine our outcome statements. It's stuff like that, where our product practice lead right now is spending hours a week helping humans refine this process, so that when the statement comes out, we know exactly what we're doing. We're writing one now, and we're getting ready to go to production. It'll save him probably a couple hours a week, but, you know, it's totally not related to code generation. It's just simply helping humans refine the way they're thinking about problems. But the Brian bot, the Brian bot was so interesting. Because I was like, ooh, that's creepy.

Deejay (29:22) Yeah. Yeah. I'm not sure I would want to be confronted with an LLM impersonation of myself, and definitely not the psychoanalysis of what it has to say about me. You mentioned product folks being able to implement UI changes, I think you said. How did that change come about? And presumably that then alleviates the burden on software developers. Like, they have less work to do, because product managers can go and self-satisfy requests.

Mike Gehard (29:40) It's more of our design team right now. But yes, our goal is to get product owners doing that. I think when you look at the traditional design cycle, it's: product owners define requirements and user needs, designers make UIs for that, and they hand developers Figma diagrams, which we then have to translate into code. But just cut out the Figma diagrams.
I mean, we have a designer now who's actually on the tracer project making real-time code changes for UI prototypes that they want to build. So now the developers don't have to write the HTML and CSS. You know, what does that save us? That saves us probably a couple hours, three, four hours a week. You scale that over multiple projects. So I think, you know, the way I'm looking at this is: software development is a transformation pipeline from user needs to running software, and you're just transforming information as it goes down the pipeline. If I can short-circuit some of those transformation steps by taking out an artifact that we would generate for humans, and instead use a machine to generate the next output of the pipeline, that's a savings. I mean, this is nothing more than the manufacturing system that Toyota has, the Toyota Production System. I'm just reducing waste along the way. Deejay (31:09) How did the product designers kind of get in touch with Claude Code? Was that a suggestion from you? Did that come from the engineers? Mike Gehard (31:12) That was from us. So we had a designer rotate into R&D because they're on the beach. You know, because we have said that Claude Code is going to be our first tool, you know, maybe our only tool. I know a lot of friends that are giving everybody free rein to use whatever tool they want, but I don't have that liberty. If I have to go out and certify six tools for security, that's a huge pain in the butt. So we decided on Claude Code, because we knew engineers would be our first customer, because, you know, in software development not a day goes by without another post about how we're going to revolutionize code generation. This designer was somewhat comfortable with the command line, but not super comfortable with the command line, and just took it upon himself to be like, I'm going to make this work in this code base.
And now he has set up office hours where he is having one-on-one sessions with anybody who wants to learn how to use this tool. And, as a designer, he collects user research. He's finding that with a couple of tweaks and some re-skilling, we can get people who aren't familiar with the command line to use the command line, plus maybe a Git GUI tool to give them some Git, because it's all happening in the Git repo. That's where all the changes are being made. So based on that, we're hoping we can identify people in the product organization who are CLI-friendly and see kind of what that gap is. And then for those that aren't, like we have some product owners that have never seen a terminal window in their lives, we're gonna have to figure out how to either meet them where they're at or move them along the skill ladder such that they feel comfortable in the terminal. But most of it was just him going out and being super stoked about the thing and then saying, I'm willing to sit down with anybody, I'll take all comers at this point. So it's really cool. Deejay (33:03) I think there's like two important things there. One is that putting these tools in the hands of not just technical people, not just software engineers, is important. And also, like, I can imagine top-down transformation efforts might be inclined to go: right, this group of people are going to use this tool, this group of people are going to use that tool. And it's almost like refactoring a big microservices architecture without redefining any of the interfaces. Like, you can make local optimizations, but some of the more profound changes are going to be where responsibilities no longer sit in the silos that they used to, and you can have somebody do a job that then spans kind of responsibility boundaries. So I find that fascinating. And then also the office hours sessions. You mentioned a couple of other ways that you're kind of sharing knowledge.
So this product designer is doing office hour sessions, kind of showing people how they do what they do. Are there any other kind of structured techniques that you're using for sharing the knowledge that you're learning across the organization as you make these experiments? Mike Gehard (34:10) Yeah. So we have a Slack channel, just like everybody does. I have definitely gotten feedback that the Slack channel can be overwhelming. I used to post in there four or five times a day; now I'm usually down to about one or two. I think our biggest problem right now is giving people the time to go consume the information we need to get them to consume. We're consultants, so we're billable hourly to our customers. So we've done the Slack channel, and I do a weekly drumbeat. I wrote a bot that will take the two AI-related channels, once a week on Sunday nights, summarize every thread in the Slack channel that happened during that week, and send me a GitHub issue that says literally: copy and paste this into a document. I then share that out weekly and say, here are all the conversations that happened, if you couldn't keep up in real time. And if one is interesting to you, go click this link and you can go see the whole thread. What else have we done? We haven't done a whole lot. This is the problem I'm trying to solve, because I keep getting feedback that people are struggling to keep up, but I have not gotten good actionable feedback of, like, here's how I would like to consume it. I'm still trying to figure that out, but my hypothesis is that I just have to keep putting stuff out into the world. And if some double-digit number of humans are consuming this, I'm winning, because I don't need everybody to be an AI expert. I just need enough people to be AI experts, enough to disrupt their own jobs so that we can build products. So our goal is to have one Riser running a whole project end to end.
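Rise 8's digest bot isn't public as far as we know, but the thread-grouping step Mike describes can be sketched roughly like this. The message shape and permalink format follow Slack's `conversations.history` API; the channel ID, workspace URL, and sample messages are illustrative assumptions:

```python
from collections import defaultdict

def build_digest(messages, channel_id, workspace_url):
    """Group flat Slack messages into threads and render a weekly digest.

    Each message dict follows Slack's conversations.history shape:
    "ts", "text", and (for thread replies only) "thread_ts".
    """
    threads = defaultdict(list)
    for msg in messages:
        # Replies carry thread_ts pointing at the parent; parents key on their own ts.
        threads[msg.get("thread_ts", msg["ts"])].append(msg)

    lines = ["# Weekly AI channel digest", ""]
    for root_ts in sorted(threads):
        thread = sorted(threads[root_ts], key=lambda m: m["ts"])
        root = thread[0]
        # Slack permalinks encode the root timestamp with the dot stripped.
        link = f"{workspace_url}/archives/{channel_id}/p{root_ts.replace('.', '')}"
        lines.append(f"- **{root['text'][:80]}** ({len(thread)} messages) -> {link}")
    return "\n".join(lines)

sample = [
    {"ts": "1700000000.000100", "text": "Claude Code container demo"},
    {"ts": "1700000001.000200", "text": "Nice!", "thread_ts": "1700000000.000100"},
    {"ts": "1700000002.000300", "text": "New troubleshoot slash command"},
]
digest = build_digest(sample, "C123", "https://example.slack.com")
```

A real version would fetch messages with the Slack Web API and file the result as a GitHub issue; this only shows the grouping and formatting.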
So what would it look like, to your point of collapsing the balanced team, for one human to be able to do product, design, and engineering altogether? Think of the cost savings that we could have for the federal government, which is now cost savings to the US taxpayer. But I don't need everybody to do that. I just need enough people to help me build the products that are going to help us do that. It is not a nut I've cracked; I'm still struggling to figure it out. And I think I'm winning, because every so often somebody else will post something in the channel and say, hey, look what I did with AI. And as long as I get enough of those, the flywheel begins to turn. And, you know, at some point we will start building products, internal products, that help our individuals do the jobs of other people. Not because we want to get rid of those people, but because now we can add more value to our customer, which is the federal government. Deejay (36:44) You mentioned a while back that it's been a good few months since you wrote any lines of code by hand, the old-fashioned way. What's your kind of journey been through changing your own development process? I mean, for folks that don't know you, one of the reasons why I was quite keen to talk to you is, you know, you're a software craftsman, right? You came from Pivotal, you know, the people that take their engineering practice very seriously. How have you found that journey, and the different approaches that maybe have kind of shaken out over time as you've been working with these tools more and as the tools mature? Mike Gehard (37:16) Yeah, so I was at Pivotal off and on for 13 years, Pivotal Labs, then Pivotal Software, then VMware. I finally left VMware in June of 2023 and took 18 months off, just to take 18 months off and work on some personal stuff. So I was looking to get back into the job market like November 2024.
And, you know, around June 2024 I was kind of poking around, and I was like, this AI thing is going to be a thing. Like, I have no doubt this will change things. And, as you said, I'm pretty opinionated about the way I write software. I like test-driven development, mostly because I like sleeping at night, as opposed to, like, worrying about pushing to production at some point and wondering whether it'll be broken. So my journey was: I identified it was going to be a thing. I was unemployed, so I could do whatever I wanted to. And I'm like, well, what's the hypothesis here? My hypothesis is this: if this will be revolutionary to our industry, you know, let's just push all in. What if I'm wrong? So I started using, you know, Windsurf, Cursor, VS Code. As you said, you know, Cursor is now kind of good, Windsurf got bought by somebody, I forget how that all shook out. Claude Code didn't even exist back in the day. So I just pushed all in and did some side work for a friend of mine, generating Android training materials. I had written some Android, lots of Kotlin, so I was just having it generate code for me. I'm like, this is kind of cool. Like, could I have written this? Yes. Would it have taken me six times as long? Yes. So I just started dabbling, and then I went to work at Rise 8 and just kept going. It was like, you know, there's some hypothesis out there, and I forget whose name is attached to it, but the theory goes: what if we assume that God does exist? What is the downside of, you know, assuming that God exists? Well, we're all better people. And if God doesn't exist, we're still better people. What if we apply that to AI? Like, let's assume that AI will add value. And if it doesn't, so be it, like, what's the skin off our backs? So yeah, I just kept pushing and pushing and just kept going. Like, I am not the best software writer. You know, people get divided into camps.
You either like software because you like writing code. Like, I like Ruby, I like the way Ruby feels, you hear these things. Or it's the generation of code. I'm very much like: it's a tool that will allow me to add business value. I am not good at writing the thing, but I'm good at trying to use it to add business value. So now I'm rambling, so I'm going to try to tie this back together. I just decided, like, I don't care if I write code anymore. I don't get joy from writing code; I get joy from adding business value. So if my hypothesis is I can do that faster, let's try it. Perfect example: I am currently working on Terraform code, which I haven't done in probably seven years, because I'm trying to automate my provisioning of Claude models in a federally controlled AWS GovCloud instance. I haven't written infrastructure as code in forever. I wrote so much code yesterday, and I have it working in less than eight hours. That would have taken me days of Googling to figure out how this works. So for me, that's a huge win. Like, I have a tutor that will help me do this thing. And by the way, I now have something that works and it's fully auditable. Could I have done that myself? Yes, but it would have taken me days. So I guess to tie it all together: I just said, screw it, I'm all in. I suck at typing anyway. Like, what does this look like? And I don't care if I'm writing Ruby or if I'm writing TypeScript or if I'm writing Python. I just want stuff to work. I just want to add business value. So that's the approach I took. I still care about quality, though. So one of my big learnings that I will give everybody, and end my rambling, is that I firmly believe that inside-out test-driven development, as we have practiced it, is dead. That tool no longer serves me. What I want is outside-in test-driven development, because I don't care how it does it or what it looks like.
I care that it works. So I think that's been a big learning: start from the outside. Define some Playwright-like, BDD, Cucumber, Dan North-style tests. And then once I have that, the structure of the code is a little less important to me, because, guess what, I can just have Claude push it around for me a little bit. And I think there'll be an interesting balance of, like, what purpose do unit tests serve us going forward, in this world of million-token context windows that can hold whole directories in context? Do we still need good structured software, or does that structure go away? I don't know. I'm still trying to figure it out, but I'm drawing a line in the sand. Inside-out test-driven development is less important to me now than outside-in test-driven development. We'll see how many hot... Mike Gehard (42:19) How many hot takes people blast me for that. Deejay (42:21) Well, I mean, I think I'm inclined to agree with you, because I had a very similar experience of... I wrote some code recently. I say writing some code; I was watching Cursor write some code in agentic mode. And I really didn't care what was going in, in the unit tests. I started off caring, but, you know, I was focused on the acceptance tests, the outside-in stuff. Like, does this little CLI that I'm writing do the things that I care about? Does it not do the things that I don't want it to do? Deejay (42:46) And I was like, you know, I don't really care what happens in the unit tests. That's for, you know, Claude Sonnet to worry about. This doesn't really add any value to me. So, with your kind of development process now and the way that you go about approaching things, you mentioned running Claude Code inside a container. Are there any other things that you're relying on? Have you tried any multi-agent workflows?
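As a toy illustration of that outside-in stance (not from the episode, and using plain `subprocess` in place of Playwright or Cucumber), an acceptance test drives the program the way a user would and asserts only on observable behavior, never on internal structure:

```python
import subprocess
import sys

# Stand-in command under test: a one-line "CLI" that sums its arguments.
# In real use this would be the installed binary the team actually ships.
CLI = [sys.executable, "-c",
       "import sys; print(sum(int(a) for a in sys.argv[1:]))"]

def run_cli(args):
    """Drive the tool exactly as a user would: argv in, exit code and text out."""
    return subprocess.run(CLI + list(args), capture_output=True, text=True)

# Outside-in: the assertions state only what a user observes.
result = run_cli(["2", "3", "5"])
assert result.returncode == 0
assert result.stdout.strip() == "10"
```

Because the test never mentions functions, classes, or modules inside the tool, an agent is free to restructure the implementation however it likes, as long as this contract still holds.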
Have you got any kind of slash commands or particular bits of prompting or context that you rely on that you found to be really valuable? Mike Gehard (43:15) So, custom slash commands are kind of my jam. If you look at the evolution of LLM usage, regardless of what you're doing, generating code or whatever, we started with prompting. It's like, I have a chat bot, I'm just going to talk to this thing like another human, which is great. That's a skill. It gets me some better results. That's great, but what I want to do is create these, I'm calling them, like, LLM functions. They have inputs, there's a black box that does some transformations, and then I have outputs. It's like functional programming, although I have non-determinism inside the black box. So I started to move to custom slash commands just because I was like, well, when I'm troubleshooting, I don't want to have to type out a bunch of stuff. I want to say: troubleshoot, here's the error message. And then I want to go get coffee and come back and see what Claude has figured out. So I started using custom slash commands as a way to... you know, we talked about what the special sauce is. My special sauce is 25 years of experience that I can distill down into a prompt that I can now share with my whole company. So I can amplify my effect by sharing those. So I started with custom slash commands, and I got enough of those that were good enough to create a story-ideation-to-code-generation workflow, simply like slash generate story. And that prompt actually asks me questions. So this is another tidbit I'll give folks: if you are not having your AI assistant ask you questions, you are doing it wrong. This is the whole thing of, well, we're gonna get dumber by using LLMs. It's like, no, I'm just gonna have it ask me questions. I'm gonna have it
prompt me for information. So I started with that and I created a whole flow. I have, I think it's five or six slash commands, where it's like: create story, plan story, because we know a planning step is important. And in between every one of these, I reset Claude. So I go all the way to story generation, and I have some feedback loops of, like, hey, give me critical feedback. The greatest thing about that is that I now have an MCP server that I can share across the company. So now we're all doing software development the same way. We're using the same prompts; we're getting the same feedback loops. The other day I used one of my custom slash commands in a GitHub Action to review pull requests. So now when a GitHub Action fails in our repo, Claude kicks in, analyzes the failure, and then I come in in the morning and say, oh look, Claude has analyzed the failure for me. Do I agree with this? So: custom slash commands, an MCP server that shares custom prompts. I haven't gotten into the subagents thing yet. I'm a little hesitant because, A, I haven't gotten it to work, and B, I'm worried about vendor lock-in. Like, I like Claude Code, but, as you said, this stuff's changing. I don't want to be locked into something that I'm going to have to go unwind in three months. So I haven't gotten into subagents yet, but by using custom slash commands, what I'm doing is simulating subagents. But instead of Claude orchestrating those subagents, I'm orchestrating them, which gives me the ability to feed back on those prompts in real time. I've started to plug in some MCP servers, but I have to be very careful with MCP servers because of what Simon Willison calls the lethal trifecta. So you've got sensitive data, untrusted content, and an exfiltration mechanism. And in my work, I have sensitive content. I have this data that I can't leak to the internet.
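In Claude Code, a custom slash command is a markdown file (under `.claude/commands/`) whose body becomes the prompt, with arguments substituted in via `$ARGUMENTS`. Mike's "LLM function" framing, deterministic inputs in, prompt out, non-determinism confined to the model call, can be sketched like this; the prompt text here is illustrative, not Rise 8's actual command:

```python
from string import Template

# Illustrative troubleshooting prompt, in the spirit of a Claude Code
# custom slash command. The wording is invented for this sketch.
TROUBLESHOOT = Template("""\
You are a senior engineer debugging a failure.
Before proposing any fix, ask me clarifying questions about anything ambiguous.

Error message:
$error

Steps:
1. Restate the failure in your own words.
2. List the most likely causes, ranked.
3. Propose the smallest change that would confirm or rule out the top cause.
""")

def build_prompt(error_message: str) -> str:
    """The 'LLM function' shape: deterministic inputs in, prompt text out.
    The non-determinism lives entirely in the model call that consumes this."""
    return TROUBLESHOOT.substitute(error=error_message.strip())

prompt = build_prompt("TypeError: 'NoneType' object is not iterable")
```

Note the baked-in instruction to ask clarifying questions, the tidbit Mike calls out: the expertise lives in the template, so anyone on the team invoking the command gets the same distilled workflow.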
So I haven't really gotten into the MCP server thing, because there are so many vulnerabilities in those things right now, like scraping the internet, that I won't get into. That's a whole other podcast we could talk about. So I'm trying to figure out what a secure way for us to share context is, and I'm kind of experimenting with those, but I have one MCP server that I use, and that's just to share prompts. What else am I doing? That's about it. We're all in on Claude Code. Like I said, you know, it's one tool that, you know, I have to be able to trust. So yeah, I think that's it: custom slash commands, an MCP server with custom slash commands, and, you know, something that allows me to do web scraping. Deejay (47:19) Have you got any kind of unified set of rules that you roll out between teams, or that you're using personally, or is that all kind of encapsulated in the slash commands? I think that one of the rules that another ex-Pivot, Bennett, was posting about was threatening Claude with being attacked by bats if it didn't write tests. I thought that was quite a good way of encouraging it to do the right thing. Have you got anything like that in your arsenal? Mike Gehard (47:50) Yeah. I do have a CLAUDE.md file, but I've tried to encode most of that in my custom slash commands. Because if you look at the way subagents in Claude Code work, the power of them is they each have their own new context window. They are prompted by the head agent with a context, and they have their own context window, because of context window degradation. The CLAUDE.md file doesn't help me in that situation, because that context is not shared with the subagent. So I've tried to encapsulate it: like, my implement agent talks about test-driven development, talks about Kent Beck's tidy-first principles. So really that's the only agent that cares about that stuff; it doesn't have to be everywhere.
So our CLAUDE.md files, mine are pretty small, because I've moved that stuff into the subagents, into the custom slash commands. I also had an interesting talk with one of our engineers the other day. They have a huge CLAUDE.md file, and they're using our MCP server that has all the custom slash commands for the software development pipeline in it. He used one of the custom slash commands, and Claude ignored the CLAUDE.md file where the custom slash command contradicted it. And he was wondering, he's like, well, why didn't it use my CLAUDE.md file? So I was like, well, ask Claude why it didn't follow your orders. And this is the weirdest thing, because you're asking Claude to critique itself. What Claude told him was: because the custom slash command was much more detailed, I thought that would be the better way to go. So I think if you had gone to a human and asked, why did you ignore my instructions, it's like, well, those other instructions were more detailed. It just did that thing. So I'm finding that if we get too prescriptive at too many levels here, and we start mixing these metaphors, we just don't get the results we want. So because I want to move to agentic software, I want one human managing a bunch of agents, I've been putting all of that context in my custom slash commands and really thinning out my CLAUDE.md files. Deejay (49:50) Cool. Got you. We're getting closer to an hour, and I want to be respectful of your time, because somebody's got to revolutionize the effectiveness of software development in US government. On the subject of being in federal government and the kind of restrictions that you're under: what would you say to somebody who maybe is working at a bank or a financially regulated enterprise, or maybe a health tech company, who's like, I can't use AI in my software development process because we're regulated?
Mike Gehard (50:17) You have to understand where you're regulated and how you can work within that system. Like I said, I work in a highly, highly regulated industry; NIST 800-53 outlines what I can do. You just have to understand that, and a lot of people don't want to take the time to understand that. They just want to say, well, it's too hard, and off we go. I mean, if we did that, nothing would happen at the federal level. So take the time to understand what you need to do, and then just break down the problem. You know, so if we take my problem: I have Claude Code, great. Can I run it on my laptop? No. Why not? Okay, because now it has access to my hard drive. So what's the problem? What's the solution to that? Well, I just put it in a container. Okay, so that's one solution. And now, what's my next hurdle? What's my next roadblock? So for me, it's secure LLM access. I can't be slinging government information at the internet. So, okay, cool, how do I solve that problem? You know, it's system decomposition, it's systems thinking. So you have to take that mentality and not just be like, it's too hard, I can't do it. Well, what's hard? What's the next hurdle that I can satisfy? And then you get that ball rolling. And then you just have to work it. You know, the one thing we're going to struggle with, and that I'm learning, is that this is still a human endeavor. So we have these humans called authorizing officials. This is a human who signs on the dotted line that says: I will accept the risk should something happen to this piece of government software running in production. I have learned in my five months that, because this is still a human endeavor, those authorizing officials are given some leeway, squishy leeway, to say what the rules are, just the way human systems work. So, you know, I've been told stories of...
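A rough sketch of the "just put it in a container" step: building a rootless Podman invocation that mounts only the repo, so the agent never sees the rest of the hard drive. The image name and exact flag choices are assumptions for illustration, not Rise 8's actual setup:

```python
def sandboxed_claude_cmd(repo_path: str, env_file: str) -> list:
    """Build a rootless container command that exposes only the repo to the
    agent: no host home directory, no host network interfaces."""
    return [
        "podman", "run", "--rm", "-it",
        "--userns=keep-id",                 # rootless: container uid maps to your uid
        "--network=slirp4netns",            # user-mode networking only
        "--env-file", env_file,             # model endpoint/credentials injected at run time
        "-v", f"{repo_path}:/workspace:Z",  # the repo is the only host mount
        "-w", "/workspace",
        "claude-sandbox:latest",            # hypothetical image with Claude Code installed
        "claude",
    ]

cmd = sandboxed_claude_cmd("/home/me/project", "claude.env")
```

A hardened setup would go further (an egress proxy allowlisting only the approved LLM endpoint, for instance), but the core idea is the one Mike states: the blast radius is the mounted workspace, not the laptop.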
well, we had to go through four authorizing officials before we could actually get our software into production, because the first three said, I don't want to deal with this, and the fourth one was able to listen. So sometimes you have to go find the right people to talk to, to convince them that what you're doing is not as scary as they think it is, because they don't understand it. So that's the other thing you have to do: you have to educate those humans along the way. You say, hey, here's what I did, here's what I thought about, here's how I solved those problems. Do you agree with me? Yes or no? Okay, no. Let's have a discussion about why not. Okay, that's another hurdle for me to solve. So you just have to keep solving problems and not giving up, because this stuff can be solved. And this is in our DNA at Rise 8. We were one of the first companies to come up with this idea called continuous authorization to operate. The way ATO works is, and nobody burn me on the details because I'm still learning this stuff, but I think it's every three years, your software is reevaluated as to whether or not it is secure, quote unquote. But you're pushing software every day. Within that three years, your software could go from compliant to non-compliant a hundred thousand times. So what we did is came up with a way to build in this continuous process to say: every time we push code, we believe we are still compliant, and we don't have to wait three years to figure this out. Nobody was doing this before we did it. Was it hard? Yes. Did we have to fully, deeply understand the problems we were solving? Yes. Did we have to go find a bunch of humans that we could bounce ideas off of to make this happen? Yes. Had we just said, well, continuous ATO is too hard, we wouldn't have gotten to this point. So you just gotta do the work.
Identify the problem, solve the problem; identify the problem, solve the problem; wash, rinse, and repeat. Deejay (53:52) It's a great example of the kind of continuous integration, continuous delivery mantra of: if it hurts, do it more often until it stops hurting. Mike Gehard (54:00) Yeah. Yeah. And we just applied CI/CD, you know, Jez Humble's book, to this process that was happening every three years, which is no different than what we did with agile software when we were releasing software every seven months. So, you know, these problems can be solved. You just have to have the grit to go out and chip away at it. But it forces you to deeply understand what's going on. And if you don't want that, then that's why you're going to fail, because you make a choice to not understand the process that you're trying to fix. This stuff, it can be fixed. Deejay (54:33) So in terms of tangible, concrete things: there's understanding the classification of data and of software that you're working with on your local environment; making sure that things are segregated, so that you're running agentic systems in containers; and then, when you're calling out to foundation models, it sounds like you've got some relationships with the foundation model providers. So there are some use cases where you can just call an Anthropic API or an OpenAI API and throw code at that, but then there are others where you have to use a kind of self-hosted LLM in something like AWS. Mike Gehard (55:10) Yes. Yeah. You know, everybody wants the federal government's money, so we're lucky in that way, in that they're willing to work with the federal government to make sure that the federal government can use the LLM platform. So, you know, we're kind of on the heels of that. So Amazon is a big Anthropic provider at higher impact levels. Google's got some really good models that can ride at higher impact levels.
Mike Gehard (55:34) But it's just data flow. It's me understanding the system. Like, where's my data flowing? This is no different than any other system. You know, if you're working at a bank, you have regulations, probably, you know, SOC 2-style regulations, that control where your data can go. You just have to identify those and figure out how you're going to secure them. And as verbose as the regulations are, using an LLM to parse them and help you understand what the actual implementation is, is huge. You know, we have a dream of automating this whole thing. Like: just go figure out where my software is non-compliant and then go automate a fix for me. That's the end goal. So I think, you know, a lot of people shy away from reading PDFs. I just feed them into, you know, Gemini and ask Gemini to help me figure that out. That's where you can use LLMs to help with your understanding. Deejay (56:27) Cool. Before we wrap up, is there anything that we can do for you? Is Rise 8 hiring? Is there anything else you'd like to say? Mike Gehard (56:33) We are hiring. We are spinning up some projects with some federal government entities. We've got some AI opportunities coming up. I'm actually in the process of putting together a job description for an AI engineer. So if you're interested, reach out. And yeah, I think the one thing I would tell people is, again, in my somewhat informed 25-year career: we're in for a big change. And I would just incentivize people to put aside some preconceived notions of the way things should work, and experiment with the way you want them to work, and then, you know, figure it out, because I think the world would be a better place. I think we will have more software. We will have better software. It's gonna be scary. We've got to figure out the electricity problem. Like, this stuff's not without its problems, but I think we can figure that out.
We just have to keep pushing. Deejay (57:22) All right. Well, it's been a pleasure talking to you, Mike, and hopefully we get to catch up again in a few months and find out how it's been going. Sounds like you're making a lot of progress. So keep up the good fight saving the American taxpayer money, and hopefully we speak again soon. Mike Gehard (57:36) Awesome, thanks. Deejay (57:38) Hopefully you found that interesting and informative. When I was talking to Mike, I accused him of being a software craftsman, and then a few minutes later he explained that he's maybe more into outcomes than the meticulous detail of writing code. And I think I can kind of relate to that, in that, you know, I'm not the most algorithmically smart person, as anyone that's about to pair program with me will definitely know. But one of the things that I was always interested in was doing things the right way to achieve quality outcomes, and that kind of craftsmanship, from that point of view, I think is where Mike and I both have something in common. If you enjoyed that, then please share it around with other folks that you think might find it of interest. And if you have any feedback, please email wavesofinnovation@re-cinq.com. That's R-E dash C-I-N-Q dot com. Otherwise, be good to each other, and you'll hear me in the next one.

Episode Highlights

Mike Gehard from Rise 8 discusses developing AI solutions for the highly regulated environment of the US federal government.

Rise 8 focuses on two tracks: integrating AI into their software development lifecycle and building AI features into government software.

Operating in this space requires strict adherence to NIST standards, a major difference from typical startup culture.

To maintain security with sensitive data, they run tools like Claude Code in an isolated, rootless Podman container.

They handle multiple data classifications, using secure endpoints like AWS GovCloud when dealing with sensitive or classified code.

The team is empowering designers to use Claude Code for UI implementation, shortening the traditional development feedback loop.

Mike argues that in an AI-driven world, outside-in, acceptance-test-driven development is more valuable than traditional inside-out TDD.

They use custom slash commands and an MCP server to standardize prompts and workflows, effectively amplifying expert knowledge across the company.

The ultimate vision is to enable a single person to manage an entire project from idea to production by leveraging AI agents.

Share This Episode

https://re-cinq.com/podcast/ai-on-nightmare-difficulty-secure-code-generation-in-the-us-government
