We Have to Talk About CrowdStrike! Hot Takes and Quality Debates, Episode 7


Duration: 49:28

Richard Bradshaw (00:00)
Hello everybody and welcome to the Vernon Richard show. I am Richard.

Vernon (00:04)
Greetings, I am Vernon.

Richard Bradshaw (00:06)
So firstly we'll start by saying thank you to everyone who has been listening, sharing and commenting. We got some good interaction on YouTube, especially thanks to Ben Doan, who commented and then replied as well, which is nice to see. Deb Sherwood was posting about the pod, which is amazing. And there's someone else, but anyway, I will remember. But thank you all for listening, sharing and commenting. We really do appreciate it.

Er, yeah, right. What shall we talk about then? Has anything big happened in the week that we're recording this?

Vernon (00:39)
We are recording this in the last week of July. You know, it's the week that the Olympics start. Did anything of note, of any interest, happen last week, last weekend?

Richard Bradshaw (00:53)
Nah, there was something about releasing on Fridays and you shouldn't do that and it can cause issues. We are of course referring to CrowdStrike and the outage associated with a slight problem I guess. I'm not going to call it a bug just yet but it is a bug.

Vernon (00:57)
Ha ha haaa!

Indeed.

It's the week after the week before last week, Thursday.

You, dear listener, Richard and I, people of our ilk, we all knew what a kernel was, but our friends and family did not. But as of Thursday, they sure as shit do now. Goodness me.

Richard Bradshaw (01:27)
They do now, yeah. So I don't think we have to explain it. I think our audience will be very much aware of what CrowdStrike is. We should.

Vernon (01:40)
Should, I think we should just in case, just in case. I think we should. Yeah.

Richard Bradshaw (01:45)
Alright then, you want to give it a shot? Seeing as you're volunteering for that by the sounds of it.

Vernon (01:50)
Man, I don't know if I understand it. I don't know if I understand it well enough. I'll definitely give it a shot. Here we go. So we have CrowdStrike, who are a cybersecurity company. Is that fair to say? They had made an update to their software that runs at the kernel level of the Microsoft Windows operating system. And this update,

Richard Bradshaw (02:02)
Yeah.

Vernon (02:14)
was deployed and released, at least as far as UK time is concerned, I guess on Thursday night, Thursday evening.

Richard Bradshaw (02:20)
Yeah, I saw a comment from a guy in New Zealand who said they got it Thursday night. So that whole meme about, you know, don't release on a Friday: it looks like it actually went out Thursday night.

Vernon (02:33)
It went out Thursday night and, sadly for Microsoft and particularly CrowdStrike, the kernel-level update had issues which caused many, many, many Windows machines to blue-screen-of-death. Mini tangent: I realize I haven't seen a blue screen of death in actual years.

So that was interesting. Yeah, so what that meant, at least in the UK, what that meant for UK folks, is...

Checking in at an airport became very, very difficult. If you had a doctor's appointment, the medical professionals and practitioners couldn't get access to your notes. If you were doing any banking, that wasn't happening. In the branches in particular, I think things were pretty terrible. And even, even...

The powerhouse of Sky News and Sky, I think they were completely down for several hours. Like there was no Sky News. So you couldn't even go to Sky News to find out why there was no Sky News because there was no Sky News. You had to rely on other news channels, social media and all the rest of it. So this was extremely bad and that was just in the UK. And then it suddenly dawned on all of us because we're each experiencing this in our locales.

Richard Bradshaw (03:33)
Yeah.

Yeah.

Vernon (03:54)
and it suddenly dawns on all of us that this is a global issue.

And it's bad, you know, the impact is large, and that's what Rich and I want to talk about. It'd be weird to talk about anything else. Because of our release cadence, you're probably listening to this about 50 years after it happened. So hopefully you remember what it is that we're talking about. Yeah, the CrowdStrike release, July 2024.

Richard Bradshaw (04:09)
you

Hahaha

But I think like, based off, yeah.

So, given what you were just saying, there are some stats that we can look at now. Like, Microsoft Analytics claimed that it was 8.5 million devices, which is obviously a lot.

Vernon (04:37)
Is it though, like how many devices in total could it have affected? That would be an interesting thing to find out.

Richard Bradshaw (04:47)
Well, the vice president of Microsoft said it was 1% of Windows machines worldwide. But obviously you need to be on a Windows machine, I think it was certain versions, and you need to be a CrowdStrike user and accepting CrowdStrike updates. You know, there's a lot of factors that would have resulted in you getting that corrupt file or not. But yeah, I agree with your explanation. I remember waking up and seeing this news and going,

Vernon (04:55)
Imagine that,

Mm.

Richard Bradshaw (05:16)
you know, it's probably not that big, and then it kept coming and kept coming. My usual go-to with breaking news is to go on Twitter and see if people are actually talking about it there, and it was all people were talking about. There's an explanation that I mentioned to Vernon just before we started recording, by a guy with a YouTube channel called Dave's Garage. He's an ex-Microsoft developer and he explains it so well. Like, he talked about

Vernon (05:26)
Yep.

Richard Bradshaw (05:45)
things like, see, you were talking about blue screens of death, right, for Microsoft. And he was like, on Linux it's a black screen, on Apple it's a pink screen. And I was like, I've never seen a pink screen of death. And now I kind of want my MacBook to crash, 'cause I want to see it. Touch wood, touch wood, touch wood. But yeah, and then he was explaining, like you said, a kernel bug. And then on his video,

Vernon (05:48)
Mm

Have a

No! Why did you say it out loud, Rich? Why did you say it out loud?

Richard Bradshaw (06:12)
Something that I found quite interesting was that he spoke about the process of getting a kernel-based application approved. He explains that most computers use two rings, as he called them. You've got ring one, sorry, ring zero, which is the kernel. And you've got ring three, which is user mode, where your applications run. And in order to get your software into the kernel, you have to be approved by Microsoft. And that process, he basically explained it as not being instant. It could

Vernon (06:20)
You heard it.

Richard Bradshaw (06:40)
take any amount of time. But the way some of these tools get around it is by getting the driver approved, but then using config files, and this is the bit I found fascinating. The config files basically have the app in them. Like, in essence, those config files are the source code of the app.

Vernon (07:01)
Okay

Richard Bradshaw (07:03)
And yeah, obviously, from what I've heard since, that file basically contained just all zeros, which caused the kernel to trip out. And in this video, which I've mentioned to you as well, and which again makes sense in my head, he was saying that as soon as the kernel detects an error, it has no choice but to shut down. Because if memory starts getting erased and rewritten, you could lose files, the operating system, everything.

So he says the kernel literally has no choice but to halt. Like, it's the safest thing to do. Which, in my head, I'm going, you know, some sort of exception handling, you could do this, you could do that. But it does make sense that you are risking the whole system, so you have to kill it.
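A minimal sketch of the kind of defensive check this discussion points at, assuming a hypothetical content-file format (a 4-byte magic header, a 2-byte record count, then fixed-size 8-byte records). This is not CrowdStrike's actual channel-file format or code; it only illustrates validating content before it is parsed. A user-space loader like this one can fall back to the last known-good file, whereas, as described above, an error inside the kernel leaves the system no such option, which is why the check has to happen before the content is ever used.

```python
"""Sketch: reject a malformed content file before loading it.

The file format is hypothetical: a 4-byte magic header, a 2-byte
big-endian record count, then fixed-size 8-byte records.
"""
import struct

MAGIC = b"CFG1"                   # hypothetical magic bytes
HEADER = struct.Struct(">4sH")    # magic + record count (6 bytes)
RECORD_SIZE = 8                   # hypothetical record size in bytes


class ContentFileError(ValueError):
    """Raised when a content file fails validation."""


def validate_content_file(data: bytes) -> int:
    """Return the number of records if `data` looks well-formed, else raise."""
    if not data or not any(data):
        # An empty or all-zeros file carries no usable content.
        raise ContentFileError("file is empty or all zeros")
    if len(data) < HEADER.size:
        raise ContentFileError("file is shorter than the header")
    magic, count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ContentFileError(f"bad magic bytes: {magic!r}")
    expected = HEADER.size + count * RECORD_SIZE
    if len(data) != expected:
        # A length mismatch means the header and the payload disagree.
        raise ContentFileError(f"expected {expected} bytes, got {len(data)}")
    return count


def load_or_keep_previous(new_data: bytes, previous_count: int) -> int:
    """Keep the last known-good content instead of failing on a bad file."""
    try:
        return validate_content_file(new_data)
    except ContentFileError:
        return previous_count
```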

Vernon (07:35)
Hmm. Hmm.

Well, I guess, you know, operating system code is not my forte, but I do know that the kernel is pretty much the lowest level. It's the interface, it's the thing that controls the communication between the software and the hardware, right? So you're,

your operating system communicates with the kernel, and the kernel will go off and do stuff, you know, allocate memory, actually send the bits to the printer and receive them back again and all the rest of it, communicate with your network port or whatever. It actually does all of the heavy lifting. So if that goes to crap, to Rich's point, you've got absolutely nothing. Like, it's game over.

Richard Bradshaw (08:16)
Yeah, exactly that.

Yeah, yeah.

Vernon (08:36)
Good luck. God bless. We wish you well. So yeah, this thing this

Richard Bradshaw (08:38)
But yeah, watch that video if you're interested, because he's very good at explaining things and it's definitely worth a watch.

Vernon (08:49)
CrowdStrike have also released an update on their website, as of the 20th, so we'll put that in the show notes too. I don't know if there's been one after that. Let's have a peek.

Richard Bradshaw (08:58)
Well, there was the news. Yeah. Have you seen the news from today? The one from four hours ago on the BBC?

Vernon (09:00)
There is,

I have not, no, what does that say?

Richard Bradshaw (09:08)
The headline is CrowdStrike to improve testing after bug causes outage.

Vernon (09:14)
Well, that really brings us to, I guess, what we wanted to talk about, or, you know, the discourse: some of the things that we've seen online, et cetera, et cetera. Because obviously now, once again, we live in this ever more interconnected world, a world where we're dependent on lots and lots of software.

When something goes wrong, software testing is back at the forefront of people's minds in the software business, if it wasn't before. And it's definitely at the forefront of civilians' minds, shall we say: non-nerds, non-software people. They'll invariably be thinking, well, how did this happen?

How come this wasn't tested? Et cetera, et cetera. I mean, I guess we're gonna get into some of these hot takes, right? Because I think we both find them a little bit irritating.

Richard Bradshaw (10:03)
The "not tested" one.

Vernon (10:03)
Let me not, you know, lead the witness. What did you see on Thursday, Friday and Saturday? And what did you make of what you saw online when it comes to explanations and people's opinions and thoughts on this situation?

Richard Bradshaw (10:20)
It doesn't matter the scale of the outage or any software thing, right? You're always going to get people who claim, you know, that should have been tested. Why wasn't it tested? I would have found that issue. That wouldn't have happened at my company, right? Because we do A, B, C and D, right? And it's all, you know, ifs, buts, maybes and hearsay, right? Because we don't know.

Like, we're learning more now, right? We're recording this a few days after, and we're getting a few more details, right? But you don't know what was done. Given the nature of the bug and what I'm hearing about what that file actually contained, it's unlikely that, you know, someone found the bug, right? It's clearly... but we don't know how that file is built. Is it actually written by someone, or is it generated by software? You know, 'cause it's kernel-level code, right? Has someone really sat there writing that, or have they written something much higher level that's been churned out as code?

Don't know. But that's the thing, we don't know. So the thing that always annoys me about the "was it tested" argument is that every single one of the people saying it has probably had a bug that they've tried to get fixed, or, like, advocated for, let's use the proper language, right? And someone on the team, or a stakeholder more important than you, has gone,

"No, we're not fixing it." And how do we know that didn't happen, right? Or whatever upstream issue caused it. We don't know. So it irks me and annoys me when people say "I would have got it", and, like, there might be a chance you would. We don't know, but there's not a guarantee, and it's not happened because someone hasn't tested. It's happened because the processes in that company are clearly not good enough for the types of risks that they could

Vernon (11:50)
and

Mm.

Richard Bradshaw (12:09)
create, or they've never thought of this risk. It's never come up before. It's never happened. And unfortunately it's come and it's shown its face in a catastrophic way. But you know, it shows a shortcoming of the overall process, not a lack of testing.

Vernon (12:25)
See, even what you said there... I think, really and truly, we're all on the outside looking in. And we might be looking in rather closely, because if you worked at a company that was impacted by this, you also had a very tough few days and might still be having a tough time. But ultimately, it could be the process, it could be a whole host of things.

I think at this point it seems like a nice moment to read out your LinkedIn post that was on this topic on the day. So I'm gonna read this out, and I'll include it in the show notes. So here's Rich's post. Rich said: it's okay to release on Friday. It's okay to not think of a test idea. It's okay, not ideal, but okay, to have production issues. They are inevitable. What's not okay

is to blame the testing, the testers, the people. What's not okay is to say something was tested when it wasn't. What's not okay is not to improve and learn from our mistakes. What's not okay is to not own the problem. Maybe it wasn't tested because they didn't think of such a scenario. Maybe it was tested, the issue was found,

and management slash team said to ship it anyway. Maybe they misunderstood the risk slash impact of the known issue. Maybe they don't have enough quality gates in place. Maybe it did get released to a subset of users first and no issues were found. Maybe they rolled it out to everyone all at once. A lot of ifs, buts and maybes. So what really matters is how the company responds to this and how its leaders help their people to do better work in the future.

to avoid a similar issue, and how they continuously look to improve their ways of working and delivery. How they work with their customers to help them right now, and also work with them to improve, to avoid similar issues in the future. And that was a pretty good bloody post. I remember seeing that on Friday. 'Cause the post I put out, the first thing I thought about was hugops to people at Microsoft and CrowdStrike, because holy shit balls. That must've been horrible.

And to all the other affected companies who are running on Microsoft systems, or who have customers that are running on Microsoft systems, hugops to all of them as well. But yeah, that post pretty much hit the mark for me. All speculation, all hot takes, and I don't think they were helpful. The thing that I find easily the most irritating in these situations is when other testers say words to the effect of,

Richard Bradshaw (14:50)
Yeah.

Vernon (15:03)
"Didn't they test it?" I'm like, wow. Wow. If anyone should know that there are multiple reasons, explanations and circumstances that can lead to a bug occurring in production, it should be testers. I find it pretty irritating when other testers say words to that effect, particularly for something

as hideous and difficult as this. So yeah, I dig your post a lot, man. I liked it a lot.

Who else have you spoken to about this? 'Cause I had a very brief chat today with Mark Winteringham about this, and he had a very amusing post that he said he was gonna make. I don't know if he's made it by this point.

Richard Bradshaw (15:41)
No, go on then. I've only spoken to people at work. So yeah, go on, go for it.

Vernon (15:46)
Well, hold on, let me dive into the chat, excuse me.

Yeah, so what we were... Mark has just written a really, really great post, actually. Let me see if I can just get the name of that real quick. Where has it gone? Here we go. So Mark's just written a post on his blog called "Quality engineering, digital employees and job security". That's definitely going in the show notes, and you absolutely should go and read that thing, because it was really, really interesting. There was one section in there

that pertains to this in particular. It was about...

Gosh. Well, he... well.

It kind of was and wasn't about CrowdStrike.

There are two bits of it that I liked. The first section is around gen AI and how it's gonna impact testing; go read the blog to see what that's about. But the bit I wanna talk about now is a section called "We have bigger contributors to job losses". And so Mark and I started talking about how there've been a lot of job losses, and people have had a lot of hot takes about why that is.

And they're trying to say, yeah, it's definitely because of gen AI. Like, LLMs are causing people to get fired. Definitely, that's the reason, 100%. It's outrageous. How dare they, et cetera, et cetera. And so he did a bit of, you know, admittedly not super-duper deep, thorough analysis, but he went and looked at some data, and he came up with all these really interesting alternative explanations for why these job losses have happened. You know, things like the company going bust, or the company needing to return to profit, or losing funding, or pivoting, et cetera, et cetera. Go read the blog post, it's excellent. And so I was having a bit of a grumble about people having hot takes about gen AI but positioning them as reality and fact, when they're just hot takes.

Richard Bradshaw (17:40)
Yeah.

Vernon (17:40)
And there's nothing wrong with hot takes, but trying to say that your hot take is, like, some peer-reviewed, you know, official opinion is a little bit of a reach. And then that's when he made his CrowdStrike remark. 'Cause I said, yeah, it's just like the CrowdStrike thing. People are using it as a vehicle to push their already existing agenda, rather than looking at things as objectively and curiously

as possible.

And then at some point a little bit later on in our conversation, he said he was tempted to put a joke video together about CrowdStrike that went along the lines of "the fix to this issue is: it wouldn't have happened if I was there", because there was a lot of sanctimonious "well, you should have done this", and "this would have happened in the old...", "well, you definitely shouldn't have done this", et cetera, et cetera. "We all know what you should have done to fix it." And you're like, you have no idea. You can't say

Richard Bradshaw (18:34)
Yeah. Yeah.

Vernon (18:36)
You can't say that, you just sound a little bit crazy when you say that.

Richard Bradshaw (18:41)
And I think that ties nicely into... So that post that I did, right, it got a hell of a lot of impressions, like 210,000 impressions, which is absolutely crazy. But the comments were also on fire. And even there, I was having to defend... I say defend, I don't mean defending, but there were hot takes in there, right, that got to me. Sometimes I don't respond to comments, but I was like, you know what, these are getting to me.

Vernon (18:55)
Mmm.

Richard Bradshaw (19:11)
And again, so I'll go through the common ones that came up. "It's not okay to release on Fridays, Richard. It is not okay. You should not ever release on Friday." And that must have been at least 30, 40 people commenting about that... Yeah, it wasn't just on mine. No, I meant in general, everywhere. And then, number one, it was released on a Thursday. And

Vernon (19:19)
soda.

It wasn't just on your post. I saw this everywhere. It's crazy.

Yeah.

Richard Bradshaw (19:39)
And then the other side of it is, like, someone commented that when something like this goes wrong, right, at this scale, and this is not the first, right? We know there's loads of things that have happened in the past. When it goes wrong, it's not a one-day, two-day fix, right? It's weeks. So yes, you don't want to be ruining people's weekends, and yes, the availability of people will be slimmer if things go wrong, but also,

How many releases have CrowdStrike done this year with no issue? I'm sure we could go and look at the release notes, right? I bet it's in the hundreds. They were all fine. So again, if you invest in your approaches to quality and testing and good ways of working, and you're mitigating the risks that you're able to identify, you release whatever day you want, because there's no guarantee that the fix... If you release on a Tuesday and it's fucked,

Vernon (20:10)
Mm.

Richard Bradshaw (20:32)
You know, it might not be fixed by Saturday. Like, so what day do you release on? You can't release. You can't release. It's like you literally can never release, because each of those bugs could last for days, and then you're going to ruin someone's weekend. It's just stupid. Like, you release when you need to release, and there's a thousand reasons to release. There could be critical reasons, legal reasons,

people-losing-money reasons, right? There could be loads of reasons why you want to ship something. And you should just ship it as long as you've gone through your approach to quality, whatever that is. It doesn't matter what it is: whatever the company's approach is, if they've done what they say they'll do and everything's okay as per their own criteria, ship it. Like, the whole Friday thing, it's... and there's people adamantly, like, defending it.

Yeah, it's mad.

Vernon (21:27)
I was quite surprised to see the number of people saying that you mustn't release on Friday and the passion with which they were saying it. I didn't realize that the feelings were that strong about it. So where I do agree with people with that statement is

when teams and organisations, in their context, have decided that actually, we don't want to release on Friday, because reasons.

I think that that's cool. I think that's okay. I think that's acceptable. It's the blanket statement, that releasing on Friday is bad all the time, that made me go, wow. 'Cause I think there are some companies and teams who have figured that out, and they've tried to make releasing on Friday like releasing on Monday. Like, it doesn't matter.

We don't care, it's fine. It's gonna be absolutely chill. But I can see where there is slightly more risk on a Friday depending on the nature of your business, et cetera, et cetera. If you release on a Friday.

Let's say you don't have any PagerDuty, any on-call setup. Maybe you're e-commerce and your busiest trading day is Saturday, and the office will be empty, et cetera, et cetera. All of that, yeah, okay, releasing on Friday sounds suboptimal and actually a bit daft. But the blanket statement, I'm not too sure about that one. I'm not too sure about that.

Richard Bradshaw (22:56)
Yeah, I agree with you. Like again, I don't want to put everything under the umbrella of quality, right? But if that's your approach to releasing, that's your strategy: you have people on call, it's part of your ways of working, your customers are aware that that might happen. Then do it. Like you said, a blanket "never release on a Friday"? What if all you're building is, like, a marketing website? You know, nothing bad's going to happen. No money changes hands, nothing.

What's wrong with shipping that, you know, on a Friday? And then there's also... again, I love the Rule of Three, right? I got it from Jerry Weinberg. I don't know where it originates from, but he introduced it to me. Like, can you think of three reasons why it's good to release on a Friday? And it's like, well, imagine dropping a feature that makes people millions of pounds on a Friday. Imagine opening a new piece of software on a Friday when you're kind of a bit bored at work, and it's like, my God, amazing new feature. That's amazing.

I can't wait to use it, right? I can think of reasons why it's good. I can think of reasons why it could go wrong. But again, I can also think of ways to mitigate against that. And I can think of reasons why it would annoy me if I was on those teams. So yeah, like you said, the blanket "no" just does not make sense to me. I think, yes, there are common problems, but as a team you could easily mitigate those with a decent release approach.

Vernon (23:59)
Mm.

Richard Bradshaw (24:22)
There are comments in there about rolling back, right? Again, I know the maturity it takes for a team to be able to roll back. But I've also worked on very complex applications where rolling back would be very, very difficult, like borderline impossible, so it's not worth the money to even entertain building it. So.

Vernon (24:44)
Yeah, you can either roll back to a previous version.

or deploy a new version, like an actual version with a fix, or hotfix it, right? Are they the options? They're probably the options.

Richard Bradshaw (24:57)
Yep. Or it depends on your app, right? You could turn the app off. Obviously, with CrowdStrike you can't, right? But again, this whole thing about never releasing, right? If it was, let's say, a trading platform, you could literally have a kill switch and just log everyone out and turn it off while you fix it. Like, basically stop the users using it. I'm not saying that's a good approach, right? But, like, it is...

And trading apps are a bad example, right? But there are examples of software that we could just turn off. Like, you know, if it was doing chaotic stuff: rollback, hotfix, patch.
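The fallback options mentioned here (rollback, hotfix, patch, a kill switch) pair naturally with the staged rollout raised in Rich's LinkedIn post, releasing to a subset of users first. A minimal sketch, assuming a hypothetical in-memory flag rather than any real feature-flag service: each user is hashed into a stable bucket from 0 to 99, the new version goes only to buckets below the rollout percentage, and a kill switch turns it off for everyone at once. The names `Rollout`, `bucket` and `gets_new_version` are illustrative only.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Rollout:
    """Hypothetical flag state for one release."""
    percent: int = 0      # share of users (0-100) who get the new version
    killed: bool = False  # kill switch: when True, nobody gets it


def bucket(user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100) using a hash of their ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def gets_new_version(rollout: Rollout, user_id: str) -> bool:
    """Decide whether this user is served the new release."""
    if rollout.killed:
        return False
    return bucket(user_id) < rollout.percent
```

Because the bucket depends only on the user ID, anyone who got the new version at 10% still has it at 50%, so widening the rollout never flips users back and forth, and flipping `killed` is the "turn it off" option in one step.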

Vernon (25:31)
So have you.

Have you worked on software that low down, or embedded software, or anything like that? Have you got any experience of working on that kind of application?

Richard Bradshaw (25:46)
No, unfortunately not.

Vernon (25:48)
We'll get Chris Armstrong on here. I think he's done some hardware stuff and some embedded software type stuff.

Richard Bradshaw (25:55)
Now it was always on the list. I've done a little bit of hardware stuff on this little Bluetooth proximity sensor thing that we built at O2. But again, we weren't in control of that embedded software; it was kind of provided to us. But yeah, that was one of the takes that, like, blew my mind. And then, yeah, you've got the whole...

you know, what else? The most common thing that came up was, again, "not tested", "clearly not tested", right? Where do you start with that one?

Vernon (26:33)
It's clear.

Richard Bradshaw (26:35)
And like,

Yeah, like again, people were making adamant, concrete statements, but in the opposite way: "there are definitely risk mitigation steps they could have taken to prevent this."

In hindsight, yes. Like, you know, in hindsight, absolutely. But as a team at that time, we don't know. You don't know. You don't know if they did everything that they were thinking of. Maybe they sat there for weeks going through all the risks they could think of and mitigated every single one of them. You don't know. So in hindsight, yes, and that's what I meant about fixing the process. And actually, just to go one step back, I'm actually quite impressed with how well

CrowdStrike have owned it. That's one of the things I put in my comment, that they need to own this. Like, they did it, they need to own it. And I think that they have owned it very well, and they are talking about, you know, reviewing their ways of working, which you would hope would lead to improving it. But we just don't know that they weren't doing very good work in the first place, and bugs happen. We've all worked in teams where bugs have gone out the door.

It happens. Like I said in that post, it's inevitable. And unfortunately, on the type of technology that CrowdStack is, CrowdStrike, sorry, it's caused, you know, a massive problem, which has led to the CEO being called to Congress. That was in the news today. So Congress called Delta Airlines to explain why they had so many flights canceled.

And obviously they were canceling flights because of CrowdStrike, and Delta have now called the CEO of CrowdStrike as a witness. So yeah, like, literally it's gone to Congress. There's talk about it costing 5.4 billion dollars' worth of revenue, of which only 540 million was insured, they reckon. So, like, this has cost those companies nearly 4.9 billion dollars that isn't insured.

Vernon (28:10)
What?

Richard Bradshaw (28:36)
Like, which is obviously a stupid amount of money. And the thing as well, like, one thing, sorry, I've been talking a lot and I've gone on many tangents, but the other thing that I've not seen anyone explicitly talking about, right, is: yes, these blue screens happen. Yes, these delays happen, right? But think about some of the things that would have happened to individuals. Imagine you were flying home for a funeral or a wedding, or someone was sick in the family.

Imagine you're getting fined now for not paying your credit card bill on time, and now you've got to go through all that hoo-ha of whatever. Imagine all the things that could have happened to individuals. The companies are fine, they'll recover. But what about the individuals going about their day-to-day lives whom this stopped from doing things? They're not going to get any insurance money. They're not going to be recouped in any way.

Vernon (29:30)
It's,

As ever, it's the human impact that we are talking about here. And to go back to what you were saying before around

what people should have done, and the bugs, and this, that and the other. It's something that I've spoken about a lot in the past, as have many, many other people, some of our peers and colleagues: there's behavior of the system that you can predict ahead of time, and then there's behavior of the system that emerges over time. And I have no idea which one of these two this is, but to presume that it was the predictable kind that could have... I'm talking about now, today, you know, what's the date, the 24th of July. To presume that we have all the information, or even enough information, to categorically state that this definitely should have been found ahead of time, I just don't think it stands up. I don't think you

Richard Bradshaw (30:15)
Yeah, yeah, yeah.

Vernon (30:30)
I don't think you can say that and be credible yet. It might be true. I'm not even saying it's not true. I'm just saying to converge on that conclusion now with the amount of information that we've got available.

I am not buying that, man. To me that's a reach right now. I mean, you might be right, but it'll be through luck, not through excellent analysis, I'd say.

Richard Bradshaw (30:50)
Yeah.

And another angle that came up in the Dave's Garage video, which again I think is another take that's being overlooked by people, is these channel files, right, that they were uploading. They're created and designed to mitigate security threats that are happening, right? So the way that they release, update the software, is... I can't remember. I think I was telling you this offline, so I'm going to repeat it. And if I've already said it, I'm going to edit it out.

But Dave was explaining that to get a driver onto the kernel level of Windows, you have to go through, he gave an acronym, but you have to go through a Microsoft process to get approved. That's not instant; in his words, it's days, weeks. So every time you want to do a release, you have to go through that process, which is not going to work for a security-based tool. Because if there's a day-one vulnerability,

Vernon (31:33)
Mm-hmm.

Richard Bradshaw (31:47)
you need to be protected, right? So they had to come up with a solution, and that's what this is. That's what these channel files contain. So what Dave was saying in that video was, yes, this is bad and caused lots of blue screens, right? But what if they weren't pushing out that fix? What if hackers or criminals or whoever were able to get into your systems because you didn't have the latest protection from whatever vulnerability it was CrowdStrike were trying to protect their customers from?

You know, people get in there and do lots of malicious stuff, things like that. So again, that's back to that idea about releasing on Fridays, right? There's another reason to release on a Friday: a massive vulnerability has been found that is going to threaten entire running systems and personal data and trillions of pounds. You have to, have to release. You've got to protect your customers. You've got, you know, agreements and SLAs. You've got to get

Vernon (32:15)
Jeez.

Richard Bradshaw (32:43)
that, you know, protection out there for that vulnerability as soon as possible. So I think that's an interesting fact as well. This whole "how did it go wrong?" Yes, it went wrong, but if they weren't releasing these patches, a lot more could go wrong, and on a scary scale. And, you know, it probably is going wrong every day anyway, right? We're all pretty confident that governments are hacking each other, right? And there's all sorts of stuff going on, and

people are trying to protect against this, and it's gone wrong. But I've been really impressed so far with their response, and I hope that they continue the transparency with how they're going to go about improving. They've spoken about testing specifically, but I would love to see how they're gonna go about doing that and what they actually share. What roles do they hire? What processes do they change? I hope they do share that. And I'd also be interested to hear what some of the companies are going to do to protect themselves in the future, right? Because that's another consideration here now.

Vernon (33:43)
Well, I'm glad you said that, because you just reminded me. I was talking to a friend of mine at work, shout out to Conley, and he was telling me about companies that he worked for in the past, like law firms, for example. My wife is a solicitor, she has her own law firm, so this resonated with me. And he was explaining that the company he worked for had a machine that was set up specifically to take updates of this nature.

So if their systems, I'm doing air quotes for the people listening, not watching, if their systems needed an update of any kind, that machine is the one that would get updated first, before everything else. The update would be deployed there, installed there and used on that machine for a full two weeks before any other machine got it.

I guess in mitigation of situations like this. I thought that was really interesting. That's pretty cool. But again, you know, it's what you were just talking about.

What happens if it's some kind of, you know, extreme security risk? Are you really going to wait two weeks to update your system and be vulnerable for two weeks?

Richard Bradshaw (35:11)
Yeah.

Vernon (35:12)
Yeah, you know, swings and roundabouts. It's interesting.

Richard Bradshaw (35:16)
I can think of two things that relate to that. Again, I don't know why I'm in Jerry Weinberg mode today, right? But things are the way they are because they got that...

Vernon (35:24)
No,

Richard Bradshaw (35:25)
I don't know why I'm on a Jerry Weinberg mission today, but yeah.

Things are the way they are because they got that way, right? So what you just described there makes me think that something like this happened to that company, right? Because unless someone thought of that risk quite early on, it's likely that, you know, it exists for a reason. And that makes me think of the second one, which is that there's always a reason for a sign. You know, when you see signs that just make no sense, there's a reason why someone's put that sign up. Something has happened.

Vernon (35:57)
There's a reason behind the sign every single time, baby.

Richard Bradshaw (36:00)
Exactly. And I think again here, you know, they've been running for a long time, and I'm not aware of any massive problems, right, that they've had. But they've had a massive problem now. Unfortunately for them, it's become global news. It's impacted millions of people, you know, if not tens of millions. And, you know, as I said in that original post, I hope that they now own it

Vernon (36:17)
Not on

Richard Bradshaw (36:28)
and own it a lot going forward, in terms of sharing why it happened, sharing what they're doing to protect themselves and what they're going to improve in the future. Because there was also another post that I found, which is another one to hop onto the excuse bandwagon: apparently they let go of a lot of their QEs and QAs, like a year ago or something like that. They had, you know, redundancies, like a lot of big tech companies did.

Vernon (36:50)
Thank

Richard Bradshaw (36:55)
And apparently a lot of their QAs were laid off. I don't know if it was all of them; I've not done the due diligence. I'm just going off a few posts that I saw. But again, is that true? Maybe. Is that a problem? Not really. You don't need QAs or testers to have high-quality software. They can help. They can also not help. It's not a reason for the bug.

Vernon (36:55)
And.

And I think my default bias position is that we should have people who are very interested in testing on the team. That's pretty cool. But the fact of the matter is, it's not a prerequisite for quality products and quality services. It just isn't. Do you know what I mean? So, yeah. Is it sad that CrowdStrike let all those folks go? Absolutely. It's been pretty grim for the past, I don't know, two, three, four years on the redundancy front, that's for sure.

I think we've all been touched by it in some way, shape or form. But is that the, is that the reason for this happening at CrowdStrike?

I think it's a bit too soon to say. On the owning part, following on from that, in terms of explanations: is it the QEs being let go? Is it this, is it that, is it the other thing? They have a blog, and they've been updating it keenly from what I can tell. In fact, it looks like they've updated it just as we started recording, with an executive viewpoint. I don't know what that is. Excuse me, a Preliminary Post Incident Review: Content Configuration Update Impacting the Falcon Sensor and the Windows Operating System,

Richard Bradshaw (38:10)
Good.

Vernon (38:26)
in brackets, Blue Screen of Death, BSOD. So I'm gonna go and have a read of that after this podcast is finished.

Richard Bradshaw (38:35)
So, I've got one last question,

Vernon (38:40)
You go.

Richard Bradshaw (38:40)
Is CrowdStrike a quality

Vernon (38:47)
What a question. Bloody hell, you've just tripled the length of the podcast in one fell swoop. I thought I was going to bed in about half an hour. That's not happening.

Richard Bradshaw (39:00)
The reason I ask, before you dive into it, is simply this, right: there's a lot of talk here about testing, obviously, and obviously the product has caused the problem, right? But when we think about quality, and we've spoken a lot about the difference between quality and testing, is CrowdStrike a quality

Vernon (39:19)
Yeah,

Richard Bradshaw (39:28)
I

Vernon (39:46)
Sadly. But on the other hand, you've got to say yes, because to your point, you know, based on what I've seen, it feels like they've been owning it. They haven't been hiding. Put it this way.

There are good ways, there are bad ways and there are terrible ways of handling situations like this when they occur. So there's the whole set of events and circumstances that lead up to the event.

And we can all practice HDD, hindsight-driven development, on that to our hearts' content. But there's also what happens after the event. And I think that reaction has indeed been quality. What your question and my rambling have made me realize is this: on a much smaller scale, you have an interaction with a company and something goes wrong. Let's say you get to a hotel, your booking has been wrecked, they haven't got a room ready for you, et cetera, et cetera. That's definitely one thing, and you will definitely have an opinion about it.

But what happens after that will determine your feeling and your opinion about the hotel or the restaurant or the shop or whatever, right? So they could say, "Unlucky, Rich. There's a sports shop that might be closing in the next 10 minutes, and they sell sleeping bags and tents. You can make your way down there and sort yourself out. Good luck, Mr. Bradshaw. Get out of my hotel lobby." Or they could say, "Mr. Bradshaw, I'm so sorry. It looks like there's a suite available. We'll upgrade you. Don't worry about paying the difference, it's on us, the first night is free. Can we carry your bags? We'll have a masseuse waiting for you, because it must be stressful. We'll serve you breakfast in bed tomorrow, because it's been a long day for you." All this kind of stuff. They could make you feel, "Wow, I hope they mess up my reservation next time, because this has been amazing." And that will change your whole feeling about it. And I think that's where CrowdStrike is right now.

How they handle and react to the issues is gonna determine how people feel about them as a company. I'm not an economist, I'm not a finance expert. I'm sure the market has reacted in a predictable way; I suspect it's reacted. But going forward, I reckon it'll be all right.

Richard Bradshaw (42:11)
Yeah, I believe so.

Yeah, because the thing that made me think of the question: it's a cybersecurity company, right? It's there to protect you against threats, right? Well, if the computer's not on, there are no threats, right? So that's a quality product, right? You know what I'm saying?

Vernon (42:27)
Yeah.

Richard Bradshaw (42:37)
Computer's not online, no threats.

Vernon (42:37)
So you think CrowdStrike is playing 4D chess? That's what you think is going on here?

Richard Bradshaw (42:42)
But as a product, it's product.

Vernon (42:44)
Can't hack the PC if the PC is not working.

Richard Bradshaw (42:46)
I'm just putting it out there. As a product, it's got to protect you from cybersecurity threats. It's been doing that. It's doing it. It was doing it. Then it went, "You know what? I can't cope with all the cybersecurity threats. I'm just going to turn your computer off to give me more time to deal with them." So, you know, who knows? But yeah, I find that whole angle of quality really interesting. Without a doubt, people's opinion of them will be tarnished.

But I feel it's one of those things where, if you take a closer look, and I've not done this, but if you take a closer look, have they actually been bossing it? Like you said, right? We've not heard of any problems. Clearly a lot of machines use it, because, you know, 8 million machines. So clearly they've been doing pretty damn good. But yeah, it's interesting how you view and perceive quality. And I know we talk about "value to some person", but it is interesting now to see what's going to happen and what comes off the back of this.

Vernon (43:49)
I just want to say one more thing, real quick. It affected 1% of the Windows machines out there. 1%! And this was the outcome. I think the fact that anything works at all is absolutely ridiculous.

I don't know whether it's through luck or by design, or probably, you know, a bit of one and a bit of the other, but that's great. I was having this conversation with my wife, actually, on the day it happened. I was thinking, we're all so used to things just working.

Richard Bradshaw (44:08)
Yeah.

Vernon (44:18)
But when it doesn't work, it's an outrage. I mean, this is a different category of "doesn't work", I get that. But yeah, we're just used to things working. And you think about all the, there's just software all over the place. Get in our car: software. Connect our phone: software. Get on a train, plane or automobile: software. Go to the bank: software. We're doing this podcast: more software. Dishwasher: software. Dryer: software. So, I mean, crazy.

Richard Bradshaw (44:47)
I think I've said this on the podcast before, right? I always need to look up the lady's name, but she was the CEO of Rolls-Royce, right? One of the finest engineering companies out there. And they were delivering a talk at Davos where they explained that they'd been asked to create an engine for Boeing that had to provide, I don't know, 80,000 pounds of thrust.

So what they did is they designed an engine that could do a hundred thousand pounds of thrust and limited it with software, because they knew that Boeing were going to come back in, like, two years' time and go, "We want 90,000 pounds of thrust." And they can just tweak the software and go, "Boom, here you go." And then they basically said, "We are no longer an engineering company. We are a software company. We are a software house." And it's never been more true. Like you said, software is

Vernon (45:23)
You've talked about that before.

Yeah.

Richard Bradshaw (45:43)
literally everywhere. Cars don't work anymore without software in them. You know, I don't know why, but I'm hooked on watching the YouTube people who fix up cars. Specifically, I watch Matt Armstrong quite a lot, and he hates working on BMWs because everything's software. He just wants it all to be mechanical, and yet he's got all these errors and complaints because some software is going, "This isn't plugged in, that's not doing this, can't do that, can't start my car."

It's insane how much software there is in something as mechanical as a car. Yeah, it's mad. And yeah, I think, just to end on a really positive high, we're going to enter an era where we're going to see more of this. A few reasons, and this is complete hearsay now, but I think there's so much software, so much interconnection, and

Vernon (46:18)
Yeah.

Richard Bradshaw (46:36)
I'm not going to say testing here, even though people might hear me say testing. I'm explicitly not saying testers or testing. I don't think people are taking quality seriously enough, anywhere. I don't think there are enough people taking it seriously. I don't think there are enough programs in place to train people to take it seriously. I don't think there are enough tech and IT people to enable companies to take it as seriously as they want. I don't think there's enough time or money,

Vernon (46:42)
Mm

Richard Bradshaw (47:04)
or the system's not designed to produce high-quality products. I think there are so many things in it. But I don't think this is going to be the last. And the other thing is, this happens every day, right? Just not at this scale. And I think it's the scale: when it impacts individuals directly, and a lot of them, it makes the news, right? Because suddenly there are a lot of people tweeting and posting about, you know, being stuck here, there and everywhere.

Vernon (47:07)
in

Richard Bradshaw (47:34)
But AWS went down, didn't it, for eight hours, a couple of weeks ago, or three, four weeks ago, right? It didn't bring whole systems down, but it brought large parts of them down. And obviously that's slightly different: you can fall back to a different cloud, and there are other ways of coping with it. But these bugs are happening more and more frequently. That's my opinion; this is my no-data analysis. But I honestly do think

Vernon (47:38)
Mm. Mm.

This is your take.

Richard Bradshaw (48:03)
There were a few comments about it, and people were talking about standards for certain types of software. But I replied saying, yeah, how do you categorize it, though? You know, certain parts of a piece of software might be critical, right? So do you portion that part off as being subject to the standard? And then the other bits, you know, like CrowdStrike's, I don't know, help guides, right? Are they classed as critical? Do they have to go through a four-week testing, quality-gateway, gatekeeper malarkey, right? Or do only certain changes over here need to go through that, right? So I get the concept, and I get why someone would suggest it, but implementing something like that is so complex that it's borderline impossible. So, yeah.

Vernon (48:34)
EEEE

and that's

Yes, hard, man.

Yeah, it's super hard that.

Now, that was a good topic. Please, please, please give us your hot takes and opinions and views on this CrowdStrike situation. Were you impacted? Were you caught up in the stress of having to fix things for your systems or your customers' systems, anything like that? Hope you've come out the other side. Share your war stories. We'll share some hookups with you, for

Richard Bradshaw (49:17)
Absolutely. All right. So that's goodbye from me.

Vernon (49:23)
And it's goodbye from me.

Richard Bradshaw (49:26)
Cheers
