Nov. 18, 2024

Disaster Recovery Testing: Start Small Before Going Big

Ready to level up your disaster recovery testing game? This episode covers everything from basic restore testing to full-scale DR scenarios. Curtis and Prasanna share real-world experiences and practical advice for implementing effective disaster recovery testing strategies.

Learn why starting small is crucial, how to define clear success criteria, and ways to test without risking your production environment. We discuss different infrastructure types, from physical servers to cloud platforms, and explain how each requires its own testing approach. Plus, get insights on creating effective runbooks and ensuring your team can execute recovery procedures without depending on specific individuals.

Whether you're planning your first DR test or looking to improve existing procedures, this episode provides actionable guidance for building confidence in your recovery capabilities.

BTW if you want to watch/listen to the Alaska DR story, I'm actually going to repost it next week.

Transcript
Speaker:

You found The Backup Wrap-Up, your go-to podcast for all things



Speaker:

backup recovery and cyber recovery.



Speaker:

In this episode, we jump into disaster recovery testing, and trust me, you don't



Speaker:

wanna learn these lessons the hard way.



Speaker:

I've got some wild stories about DR



Speaker:

tests gone wrong,



Speaker:

including one from my early days at a bank that'll make you cringe.



Speaker:

My co-host Prasanna and I break down exactly how to approach DR



Speaker:

testing the right way, starting with the basics and working your way up.



Speaker:

We'll tell you why non-destructive testing is absolutely critical.



Speaker:

Seriously, you don't wanna blow up your production environment just to test DR.



Speaker:

And how to set realistic success criteria that won't make you cry. Another episode



Speaker:

from The Lessons From the Trenches.



Speaker:

I hope you like it.



Speaker:

By the way, if you don't know who I am, you're a first time listener.



Speaker:

I'm W. Curtis Preston, AKA Mr.



Speaker:

Backup, and I've been passionate about backup and recovery for over 30 years,



Speaker:

ever since



Speaker:

I had to tell my boss that we had no backups of the production



Speaker:

database that we had just lost.



Speaker:

I don't want that to happen to you.



Speaker:

That's why I do this podcast.



Speaker:

On this podcast, we turn unappreciated backup admins into Cyber Recovery Heroes.



Speaker:

This is The Backup Wrap-Up.



Speaker:

Welcome to the show.



Speaker:

Hi, I am your host, W. Curtis Preston, AKA Mr.



Speaker:

Backup, and if you could take just a quick moment to subscribe



Speaker:

or follow so that you'll get our great content, that would be great.



Speaker:

I am sitting here with none other than the guy who's very concerned



Speaker:

about my obsession with serial killers lately, or at least one serial killer.



Speaker:

How's it going, Prasanna?



Speaker:

I am good, Curtis.



Speaker:

Yeah,



Speaker:

about me?



Speaker:

so I think after this you should watch Hannibal,



Speaker:

the TV show, not the movie.



Speaker:

They both are good, but



Speaker:

Right?



Speaker:

Right?



Speaker:

And then we could discuss about whether people taste differently



Speaker:

depending on what they eat.



Speaker:

that's always something you talk about.



Speaker:

Uh,



Speaker:

your latest obsession is Dexter.



Speaker:

Yeah.



Speaker:

Which, which, for the record, I'm rewatching, right?



Speaker:

I, I enjoyed Dexter when it was on and when I had to wait a week, right?



Speaker:

going back to those old days



Speaker:

oh Lord,



Speaker:

I, you know, I'm, I'm kind.



Speaker:

I think you're, aren't you?



Speaker:

Like, don't you watch you, you do not watch shows when they're on.



Speaker:

You wait until they're done and then you binge them,



Speaker:

right?



Speaker:

Yeah, for



Speaker:

the most part, although these days I don't actually have like broadcast tv,



Speaker:

Right.



Speaker:

it's whatever's available on Netflix or Amazon or Take your pick.



Speaker:

We, we, we actually have YouTube tv, so we have broadcasts and there's



Speaker:

some shows that we watch on there.



Speaker:

Um, but they're shows where, like, you know, this episode, that episode,



Speaker:

they're, they're all the same, you know?



Speaker:

And



Speaker:

Yeah.



Speaker:

you don't, it's not like, uh, it's not like



Speaker:

Dexter where there's an ongoing



Speaker:

so here's the funny thing, right?



Speaker:

So it was, it started off first on.



Speaker:

tv.



Speaker:

Right.



Speaker:

And like you said, you had to wait a



Speaker:

Showtime.



Speaker:

Yeah.



Speaker:

And then what I used to do, right, so I didn't have Showtime,



Speaker:

Mm-Hmm.



Speaker:

so I had to wait for it to come out on Netflix DVDs,



Speaker:

Oh,



Speaker:

right.



Speaker:

I would request the DVDs.



Speaker:

And of course I was cheap, frugal, however you want to say it, right?



Speaker:

And so I had like the two DVD plan, so I'd request like two DVDs, right?



Speaker:

So you'd get like six episodes.



Speaker:

Yeah.



Speaker:

Five or six episodes, you'd binge watch those and then you'd send them back.



Speaker:

And then you'd have to wait a week to then get the next set of DVDs



Speaker:

so funny.



Speaker:

in



Speaker:

order to watch it.



Speaker:

And that's I think, how I ended up watching Dexter.



Speaker:

And I think Breaking Bad might've been the same way too.



Speaker:

Yeah, you're just, you're cheaper than me.



Speaker:

Dexter is currently on Netflix,



Speaker:

so, um, yeah.



Speaker:

Anyway, so, you can watch all these things while you do



Speaker:

disaster recovery testing,



Speaker:

'cause there's a lot of time, there's a lot of time when you do DR testing,



Speaker:

there's a lot of downtime, right?



Speaker:

You sit there and you stare at the screen.



Speaker:

Um, and, um, I'm gonna, I'm



Speaker:

gonna, I'm gonna start out a story.



Speaker:

What's that?



Speaker:

but can you really?



Speaker:

What,



Speaker:

I understand like if you're watching Dexter or pick one of these



Speaker:

very, the shows that pull you in,



Speaker:

uh huh.



Speaker:

Would you actually be focused or would your DR testing basically balloon like



Speaker:

10 times the normal amount of time?



Speaker:

Because you're



Speaker:

like, oh yeah,



Speaker:

I forgot to get back to that.



Speaker:

well, it de, you know what, it's gonna depend on the type of DR test



Speaker:

you do because some DR tests, there's a lot of waiting, there's a lot of,



Speaker:

I'm gonna start the restore part, and then I sit there for many, many hours.



Speaker:

And if you've got one of those, then it



Speaker:

doesn't really matter whether you're focused or not, as long



Speaker:

as somebody's keeping an eye on the, uh, percentage done.



Speaker:

And I'm gonna start this with a story from back in the day.



Speaker:

The first backup, uh, the first restore test I ever did.



Speaker:

The first, like, well, at least the first one I can really remember.



Speaker:

And this was when we had, um, you know, I was at the bank.



Speaker:

Which was MBNA, which at the time was the second largest credit card corporation,



Speaker:

and I was in charge of backups.



Speaker:

And we had, I had talked the boss into moving to what would become



Speaker:

the first of many commercial backup products that I had, uh, used.



Speaker:

And that product was a product called SMarch, which as I've mentioned



Speaker:

before, should have been called SM Back because it was not an archive



Speaker:

product, it was a backup product.



Speaker:

Well, they were out, they were out of, uh, Minnesota area and,



Speaker:

and we had converted to them, but like a



Speaker:

good little backup guy,



Speaker:

I had done like a parallel implementation and so I was still



Speaker:

running my old dump tapes and I was running this new, uh, fancy tool.



Speaker:

And one of the things that this tool had was built-in, uh, compression.



Speaker:

Um, I wasn't using the compression on the tape drives.



Speaker:

I was using compression in the



Speaker:

software,



Speaker:

uh, to compress, uh, to go to the tape drives.



Speaker:

So we had our first major



Speaker:

failure, uh, file server, HPFS01.



Speaker:

I still remember the name of the server



Speaker:

just like get burned into your



Speaker:

Yeah.



Speaker:

Yeah.



Speaker:

And, and this is related to another story that we, to a friend of mine that,



Speaker:

that I've referred to before where she was a consultant and she accidentally



Speaker:

basically re, this was a self-inflicted disaster where a consultant was trying



Speaker:

to clean up home directories and she just did a really, really good job



Speaker:

of cleaning up all the home directories.



Speaker:

So I put my, uh, SMarch tapes,



Speaker:

uh, you know, in my front pocket and I put my backup tapes in my



Speaker:

back pocket, and I went down there like Mighty Mouse: "Here I come to save the day!"



Speaker:

And um, some of you will get



Speaker:

Not available on iTunes, I



Speaker:

Not available at iTunes.



Speaker:

And we went in there and I put, these were DDS, uh, tapes, right?



Speaker:

'Cause we were all HP, and HP loved DDS.



Speaker:

And so I popped in the DDS tape, I pulled up my SMarch software



Speaker:

and I started the restore test.



Speaker:

And, well, it wasn't a restore test.



Speaker:

It was, it was, I was testing, I was testing in anger, uh, you know, we



Speaker:

were testing like this was for real.



Speaker:

And, and I had not done any actual testing.




Speaker:

And so I kicked off the restore and, um,



Speaker:

I'm watching and, you know, like a good little Unix boy,



Speaker:

I had a while loop



Speaker:

running, and it was doing a df on the, you know, to



Speaker:

display the size of the file system.



Speaker:

And I'm watching, and like a long time was going by and there



Speaker:

was no change in the size of



Speaker:

the file system.
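
The loop described here, repeatedly running df to watch the file system fill during a restore, can be sketched in Python. This is a minimal sketch, not the script used at the time: the function name `watch_usage`, its parameters, and the stall threshold are illustrative assumptions. It polls `shutil.disk_usage` and flags consecutive readings that did not change, which is the symptom in this story.

```python
import shutil
import time


def watch_usage(path, interval_s=1.0, samples=5, stall_after=3):
    """Poll used bytes on the filesystem holding `path` and flag stalls.

    Returns the list of used-byte readings that were taken.
    """
    readings = []
    unchanged = 0  # consecutive readings identical to the previous one
    for _ in range(samples):
        used = shutil.disk_usage(path).used
        if readings and used == readings[-1]:
            unchanged += 1
        else:
            unchanged = 0
        readings.append(used)
        status = "STALLED?" if unchanged >= stall_after else "ok"
        print(f"{time.strftime('%H:%M:%S')} used={used} bytes {status}")
        time.sleep(interval_s)
    return readings
```

For example, `watch_usage("/restore/target", interval_s=60, samples=120)` would take a reading every minute for two hours; the path is hypothetical.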



Speaker:

And I was like, this is weird.



Speaker:

And then I went over and I looked,



Speaker:

just outta curiosity, I looked at the



Speaker:

tape drive, right.



Speaker:

You know, it's kinda like when your car dies, you know, you



Speaker:

open up the hood like you have any idea what's going on inside there.



Speaker:

You know, look in there



Speaker:

and I see the, I see the light on the tape drive and, and it



Speaker:

goes blink, blink, blink,



Speaker:

one one-thousand, two one-thousand, three one-thousand, four one-thousand,



Speaker:

blink, blink.



Speaker:

Right?



Speaker:

And there were these giant pauses in between



Speaker:

the blinks.



Speaker:

And so I'm like, that's strange.



Speaker:

It always blinks when it's, you know, when it's reading or writing data, right?



Speaker:

So I called up, I called up the guys and I said, Hey,



Speaker:

The



Speaker:

vendor.



Speaker:

Yeah,



Speaker:

vendor.



Speaker:

Yeah.



Speaker:

Called the vendor.



Speaker:

What's going on here?



Speaker:

They said, well, by any chance did you use the compression feature?



Speaker:

And I said yes.



Speaker:

Yes I did.



Speaker:

They go, yeah.



Speaker:

So let us explain to you how the compression feature works.



Speaker:

So when we're backing up files, um, basically, we do the equivalent



Speaker:

of compress -c, I think it was, to send the result to standard out



Speaker:

and send it straight to the tape.



Speaker:

But we don't know how to do that on the way back in.



Speaker:

And so what we do is



Speaker:

we read the tape, we read the entire file that we're gonna restore,



Speaker:

and we restore it to temp, and then we run uncompress in place in



Speaker:

temp, and then we move the



Speaker:

uncompressed restored file



Speaker:

to where it's gonna go.



Speaker:

And we do that one file at a



Speaker:

Oh,



Speaker:

because we're concerned that we're gonna fill up temp.



Speaker:

And I'm like, oh.



Speaker:

So like if temp is, if I have a single file that's bigger than



Speaker:

temp, it's just not gonna work.



Speaker:

They're like, yeah.



Speaker:

And they're like, if this is a concern, we suggest perhaps



Speaker:

you don't use this feature.
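
The difference between the streaming backup path and the staged restore path can be illustrated with a short sketch. It uses Python's gzip module as a stand-in for compress and uncompress, and in-memory buffers as a stand-in for the tape; both substitutions, and the function names, are assumptions made for illustration and not how the product worked. The restore here decompresses in fixed-size chunks directly into the destination, so it never needs a temp area large enough to hold a whole file.

```python
import gzip
import io
import shutil


def backup_stream(src, tape):
    """Compress `src` onto `tape` in one pass (analogous to `compress -c`)."""
    with gzip.GzipFile(fileobj=tape, mode="wb") as gz:
        shutil.copyfileobj(src, gz)


def restore_stream(tape, dest, chunk_size=64 * 1024):
    """Decompress from `tape` straight into `dest`, one chunk at a time.

    Nothing is staged in a temp directory, so the size of the largest
    file in the backup does not matter.
    """
    with gzip.GzipFile(fileobj=tape, mode="rb") as gz:
        shutil.copyfileobj(gz, dest, chunk_size)


# Round trip through an in-memory "tape" (hypothetical data).
original = b"home directory contents " * 1000
tape = io.BytesIO()
backup_stream(io.BytesIO(original), tape)
tape.seek(0)  # rewind before reading, as with a real tape
restored = io.BytesIO()
restore_stream(tape, restored)
```

The staged approach described in the story needs free temp space for at least the largest single file; the streaming approach only needs one chunk of memory at a time.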



Speaker:

it would've been helpful to know before I needed



Speaker:

Right, right.



Speaker:

And so thank God I had the, I had the backup tape in my backup



Speaker:

pocket and I pulled 'em out.



Speaker:

I was like, okay, we're just using, you know,



Speaker:

uh, dump here.



Speaker:

And, um, and we restored and everything was basic, right?



Speaker:

Um, and luckily I had backups from the previous night.



Speaker:

And really the moral of this story is.



Speaker:

Don't do that.



Speaker:

Right.



Speaker:

Test your backups before



Speaker:

you need them.



Speaker:

Right.



Speaker:

Do a DR test and that's what we're talking about today.



Speaker:

Do a DR test before you actually need to do DR. And this may, we



Speaker:

will see, you know, sometimes when we have these episodes, Prasanna,



Speaker:

you know, you know, we have these conversations beforehand and I'm



Speaker:

like, I don't know if we can fill up an entire episode over this topic.



Speaker:

We said that I think over the last one.



Speaker:

Yeah,



Speaker:

And it ended up not being a problem at all.



Speaker:

This is not one of those episodes.



Speaker:

This is one of those episodes where I think we might go long and we'll end



Speaker:

up turning this into two episodes, um, because there's a lot to talk about here



Speaker:

because before we do any testing, right, um, what do you think we need to do?



Speaker:

Well, you, well, you need to understand,



Speaker:

there.



Speaker:

By the way, there's probably no wrong answer here.



Speaker:

Uh, but



Speaker:

well, well, I think before you could



Speaker:

un unless your answer is, unless your answer is nothing,



Speaker:

um,



Speaker:

no, no.



Speaker:

So I was, I was going through my head.



Speaker:

I was thinking even before you get to testing,



Speaker:

Yeah.



Speaker:

you need to understand what are the business requirements for how quickly



Speaker:

you need to bring up that site, which actually you don't



Speaker:

necessarily think about as testing, but that's actually part of your



Speaker:

backup system design before you even get to the testing part.



Speaker:

Yeah.



Speaker:

So we, we have to agree on what success would be, right?



Speaker:

And we have to agree on obviously what we're gonna test, but we have to



Speaker:

agree on the parameters of that test.



Speaker:

And so, um.



Speaker:

What?



Speaker:

What do you think are going what?



Speaker:

Go ahead.



Speaker:

You and I was just going to ask about like your parameters, right?



Speaker:

What are



Speaker:

those parameters in my mind?



Speaker:

Some of those are how quickly do I need to be able to bring it up, what the scope is,



Speaker:

right.



Speaker:

Am I failing over and



Speaker:

trying to recover a file, an application, a data center, right?



Speaker:

What am I looking to actually test?



Speaker:

So it depends what you're trying to test.



Speaker:

And what the extent is that you want to test.



Speaker:

And also, I think the other thing is how close do you want to get to testing



Speaker:

an actual disaster as part of that?
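
One way to make these parameters explicit is to write them down as data before the test and check the result against them afterward. The sketch below is an assumption about how that might look; the class and field names are hypothetical, and the time limit is what is commonly called a recovery time objective.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DrTestPlan:
    name: str
    scope: str                      # "file", "application", or "data center"
    max_recovery_time: timedelta    # agreed limit for bringing the workload back
    destructive: bool = False       # keep False: the test must not touch production


@dataclass(frozen=True)
class DrTestResult:
    elapsed: timedelta
    data_verified: bool


def evaluate(plan, result):
    """Return (passed, reasons); reasons lists every criterion that was missed."""
    reasons = []
    if result.elapsed > plan.max_recovery_time:
        reasons.append(f"took {result.elapsed}, limit was {plan.max_recovery_time}")
    if not result.data_verified:
        reasons.append("restored data was not verified")
    return (not reasons, reasons)
```

Agreeing on the plan first means the test has a pass or fail answer that does not depend on who ran it.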



Speaker:

Right.



Speaker:

Um, and I, I would say that there are different kinds of DR tests and, um.



Speaker:

The, there's, and, and if you haven't done any, I would say let's start small, right?



Speaker:

If we've never, if, if we've never done a DR test of any kind, I would



Speaker:

probably start with, what are you



Speaker:

I was thinking about the test you don't wanna try is who is the guy in Alaska?



Speaker:

Oh yeah,



Speaker:

yeah.



Speaker:

Okay.



Speaker:

We definitely have to put a link to that episode.



Speaker:

That was the most amazing episode ever.



Speaker:

In fact, you know what?



Speaker:

We'll pro we'll probably replay it over, uh, over the holiday break, Uh,



Speaker:

by the way, we love him and it, and, and it had a happy ending, but oh my God.



Speaker:

What, what a



Speaker:

nightmare,



Speaker:

think he was rebuilding a RAID array, if I recall.



Speaker:

He was swapping the disks around



Speaker:

it was self-inflicted in that he said, I want to move the disks around.



Speaker:

Because they were like.



Speaker:

different sizes I



Speaker:

Yeah.



Speaker:

Yeah.



Speaker:

Like, and he's like, the only way I can do this is to just, to just



Speaker:

wipe everything and start over.



Speaker:

And he is like, yeah, okay.



Speaker:

So that's what I'll do.



Speaker:

I'll just wipe everything and then I'll restore everything.



Speaker:

My s my backup system works.



Speaker:

And this is how he tested his backup system.



Speaker:

Please don't.



Speaker:

He definitely learned some things along the way, but, um, yeah, so don't do



Speaker:

Don't do that.



Speaker:

Yeah,



Speaker:

Don't do that.



Speaker:

Uh, but



Speaker:

learn from his example.



Speaker:

Listen to that episode and, and,



Speaker:

and have



Speaker:

a heart attack as you're listening.



Speaker:

because the thing that ran through my head was someone who's never done DR



Speaker:

testing would've been like, yeah, I'm just gonna shoot in the head my production



Speaker:

site with all the applications, and I make sure that it, or just test it out



Speaker:

and see does it fail over properly?



Speaker:

Yes.



Speaker:

The key here is non-destructive DR testing.



Speaker:

Right?



Speaker:

Um, and so, yeah, so, so to go back to like setting the scope, I, if this



Speaker:

is your first time doing DR testing, I would set the scope as small as possible.



Speaker:

In fact, if you've never done DR testing, I would not even be doing DR testing.



Speaker:

I would just be doing restore testing



Speaker:

and



Speaker:

What's the difference, Curtis?



Speaker:

What's that?



Speaker:

What's the



Speaker:

the question is, the question is are we going to declare a disaster



Speaker:

and say that this site is down in some way, um, and then we're



Speaker:

gonna fail over to another site?



Speaker:

Or are we simply just going to bring another server online?



Speaker:

Right.



Speaker:

Depending on how you define it, and, you know, depending on the



Speaker:

day of the week and the day of the year and which year it is, I have called



Speaker:

every time you have to restore, uh, a server of any kind, a disaster.



Speaker:

It's just a disaster of different levels, right?



Speaker:

So if the server, if a server caught on fire, that's a disaster,



Speaker:

right?



Speaker:

Um, and, or, or if a server just died, right?



Speaker:

Or if, like the, the story in the beginning, I think that was a disaster



Speaker:

because it took out a production workload,



Speaker:

right?



Speaker:

So I would say perhaps start with a single



Speaker:

production workload and restore it.



Speaker:

First off, just restore it, you know, like in place, not, not in place, not



Speaker:

in place, I meant like within the data



Speaker:

center or within whatever computing environment you're using.



Speaker:

And then start talking about VPNs and



Speaker:

you know, that sort of stuff.



Speaker:

So, uh, that would be defining the scope as small as you can



Speaker:

for the first test that you can.



Speaker:

so maybe like, uh, like maybe it might be a directory within a file server.



Speaker:

If you've never done



Speaker:

any restore testing, yes.



Speaker:

A simple directory within a file server, does our backup system work at all?
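
A directory-level restore test like this can be checked non-destructively by restoring to an alternate location and comparing it with the source. The following is a minimal sketch under that assumption; the function names are hypothetical, and it reads each file fully into memory, so it suits a small first test rather than a large file server. Files that changed after the backup was taken will show up as differences, which is expected.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_restore(source_dir, restored_dir):
    """Report files missing from, extra in, or different in the restored copy."""
    src = tree_digests(source_dir)
    dst = tree_digests(restored_dir)
    return {
        "missing": sorted(set(src) - set(dst)),
        "extra": sorted(set(dst) - set(src)),
        "changed": sorted(p for p in src.keys() & dst.keys() if src[p] != dst[p]),
    }
```

An empty report for all three keys means the restored directory matches the source byte for byte.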



Speaker:

Right.



Speaker:

Um, and then the next, the next is going to be some type of



Speaker:

recovery of an entire server.



Speaker:

Hopefully you have that server virtualized because doing that is going to be,



Speaker:

uh, obviously much easier, assuming you're treating that VM as a VM.



Speaker:

Uh, if it's not virtualized, then we're gonna start going down the



Speaker:

level of, um, bare metal recovery,



Speaker:

right?



Speaker:

BMR. By the way, we should probably do an episode on that.



Speaker:

I



Speaker:

didn't think about that.



Speaker:

We should do an episode on that.



Speaker:

I know you keep mentioning server as you're talking about this.



Speaker:

Would you also qualify that that could be server or an application?



Speaker:

thanks for bringing that up.



Speaker:

That's why you keep me around, you know?



Speaker:

So it depends on what you mean by application, right?



Speaker:

Um.



Speaker:

You could, these are like the different



Speaker:

level.



Speaker:

You talk about restoring a directory.



Speaker:

I would also look at restoring a single database within a server, right?



Speaker:

If that's what you mean by application, then I would say yes.



Speaker:

Sometimes when we say application, actually a lot of times when we



Speaker:

say application, what we mean is



Speaker:

application



Speaker:

Multiple



Speaker:

on multiple servers and multiple things, all interconnected.



Speaker:

And if that's what you're talking about, I'm gonna say no



Speaker:

yeah.



Speaker:

yeah, I



Speaker:

was thinking like a, my,



Speaker:

time out.



Speaker:

yeah, I was thinking more like restore your MySQL database.



Speaker:

Exactly



Speaker:

right.



Speaker:

Um,



Speaker:

table within your MySQL database.



Speaker:

Right.



Speaker:

yeah.



Speaker:

Um, and again, non-destructively,



Speaker:

Yeah,



Speaker:

right.



Speaker:

Um, I.



Speaker:

Uh, just, just a, just an entity and you can try all of these things, right?



Speaker:

You can try restoring a database.



Speaker:

You can try restoring all of the databases on a server.



Speaker:

You can try restoring a server with the databases, um, and or any other



Speaker:

applications that it might have, like a web server.



Speaker:

You can try all of these things individually and make sure that



Speaker:

you've got those pieces down.



Speaker:

This is about defining the scope.



Speaker:

Try all of the different pieces first and make sure you've got the,



Speaker:

the recovery path for each of those parts of your infrastructure down



Speaker:

before you decide, okay, we're gonna pretend we're gonna blow up the data



Speaker:

center.



Speaker:

and the other reason to bring that up, it's important to test these different



Speaker:

types of components because there are gonna be different nuances like how



Speaker:

you deal with databases and recovering.



Speaker:

That is definitely gonna be different than file servers, which is probably



Speaker:

also gonna be different than servers.



Speaker:

And so it's important to understand the nuances of what is possible and



Speaker:

the steps for each one of these.



Speaker:

And also the different backup systems.



Speaker:

So if you've got



Speaker:

physical, you know, you've got physical servers that are not virtualized,



Speaker:

you've got servers running in Hyper-V or VMware or, you know, any sort



Speaker:

of on-premises virtualization set.



Speaker:

What was the third one



Speaker:

Broadcom,



Speaker:

brought?



Speaker:

VMware,



Speaker:

It's, it will always be



Speaker:

VMware to me.



Speaker:

It will always be world, um, the, um, if you've got.



Speaker:

VMs running in, uh, AWS, GCP, Azure.



Speaker:

Basically, for all of the different places that you



Speaker:

have infrastructure, you probably have different backup and recovery



Speaker:

methodologies for each of them.



Speaker:

And so you should also be looking at testing all of those.



Speaker:

And you should be looking at testing each of them individually as part of



Speaker:

an overall process where you're working towards, uh,



Speaker:

declaring a much bigger disaster.
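
Keeping track of which platforms have actually had a restore test can be as simple as a small inventory check. This sketch assumes a hypothetical mapping from platform name to the date of its last successful test; the 180-day threshold is an arbitrary placeholder, not a recommendation from the episode.

```python
from datetime import date, timedelta


def untested_platforms(last_tested, today, max_age_days=180):
    """Return platforms never tested, or not tested within max_age_days.

    `last_tested` maps a platform name to the date of its last successful
    restore test, or None if it has never been tested.
    """
    stale = []
    for platform, tested_on in last_tested.items():
        if tested_on is None or (today - tested_on) > timedelta(days=max_age_days):
            stale.append(platform)
    return sorted(stale)


# Hypothetical inventory covering the kinds of platforms mentioned above.
inventory = {
    "physical": None,
    "vmware": date(2024, 10, 1),
    "hyper-v": date(2023, 1, 1),
    "aws": date(2024, 11, 1),
}
overdue = untested_platforms(inventory, today=date(2024, 11, 18))
```

Each platform that shows up as overdue is a candidate for the next small, individual test before any larger disaster is declared.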



Speaker:

Yeah.



Speaker:

I, I think it's important also to be familiar because sometimes you might have



Speaker:

a real disaster that doesn't require you



Speaker:

to recover every single component within that higher level application, right?



Speaker:

So being familiar with the individual components is also important



Speaker:

depending on what the disaster is.



Speaker:

Yeah.



Speaker:

Agreed.



Speaker:

Um, and you know, there, there, there's an application that we haven't



Speaker:

talked about in terms of including or not including it in your DR



Speaker:

scope, and that is: what about SaaS applications like Microsoft 365?



Speaker:

Um, there are a couple of different scenarios there.



Speaker:

One is your



Speaker:

account is damaged in some sort of logical way, meaning logical



Speaker:

corruption, meaning a ransomware attack.



Speaker:

You deleted it.



Speaker:

Right?



Speaker:

We, we've covered that



Speaker:

on,



Speaker:

provider damaged it.



Speaker:

yeah.



Speaker:

The provider.



Speaker:

Well, that's, that, that was, I'm gonna list that as like a



Speaker:

third.



Speaker:

Well, if they damaged your account.



Speaker:

And there is an example of that.



Speaker:

Uh, for example, the Salesforce



Speaker:

Yep.



Speaker:

That's what I was thinking.



Speaker:

story where they went and blew up everybody's permissions and everybody



Speaker:

had to restore that themselves.



Speaker:

Um, that man, I hate that story.



Speaker:

I really do, because Salesforce, in my opinion, did not own up to.



Speaker:

Uh, they, they, didn't step up to the plate



Speaker:

Yeah.



Speaker:

at, at the time.



Speaker:

I remember writing a blog post for, uh, Druva.



Speaker:

I was working for Druva at the time, and I remember writing a blog



Speaker:

post that said something like, proof that, that Salesforce should not be



Speaker:

trusted with your backup infrastructure.



Speaker:

Um, 'cause they clearly don't know what they're doing.



Speaker:

But, so there's your account being damaged in some way.



Speaker:

And then there's OVH Cloud.



Speaker:

And what happened there, where the entire infrastructure goes



Speaker:

poof.



Speaker:

Right.



Speaker:

So, uh, I think you should have that as, as you, you should come up with that



Speaker:

as a scenario that you need to test.



Speaker:

It's going to be challenging in most cases because just let's just



Speaker:

talk about the different scenarios.



Speaker:

Let's say it's AWS. The way most people back up AWS, if AWS goes down



Speaker:

and takes your backups with it.



Speaker:

You're screwed.



Speaker:

You're screwed.



Speaker:

Um, the way that most people back up most cloud infrastructure.



Speaker:

And then there's the fact that most people trust their SaaS provider for data



Speaker:

protection, which they should not, right?



Speaker:

I talk about that all the time.



Speaker:

They should not. But even if you're backing up your



Speaker:

data, um, to a third party and, and it's not...



Speaker:

This has been



Speaker:

an age-old



Speaker:

problem though,



Speaker:

This is an age old problem, but we're talking about testing today.



Speaker:

So by the way, this is definitely gonna be two episodes.



Speaker:

We haven't even gotten to the testing yet.



Speaker:

All we're talking about is setting up the requirements.



Speaker:

So,



Speaker:

so this is definitely gonna be a second



Speaker:

episode.



Speaker:

Um, so we talk about, we talk about setting the scope and we



Speaker:

talk about starting small first.



Speaker:

Each of these individual components in your infrastructure and um,



Speaker:

were you about to say something?



Speaker:

that this is not destructive and



Speaker:

not disruptive.



Speaker:

Yeah.



Speaker:

Nice, nice.



Speaker:

I like that.



Speaker:

Non-destructive, non-disruptive DR testing.



Speaker:

I



Speaker:

like it.



Speaker:

because you don't wanna take down your production.



Speaker:

You don't wanna affect, say, ongoing DR



Speaker:

resiliency that's available on the secondary site because you're



Speaker:

about to do some of this testing.



Speaker:

There are cases where you do want to impact that, but when you're doing sort



Speaker:

of these individual component levels, you may not want to impact your overall DR



Speaker:

posture while you're doing this.



Speaker:

Yeah, that last one is probably the hardest, and it



Speaker:

may actually be impossible.



Speaker:

What are we talking about there?



Speaker:

We're saying that if you have a DR system, I hope you have a DR system



Speaker:

if you have one, you know, see if there's a way



Speaker:

to test your DR without messing up your DR.



Speaker:

Um, I'm not sure if that's possible



Speaker:

in, in many scenarios, but.



Speaker:

but one of the things I think about right is a lot of data protection



Speaker:

vendors, they do test and dev, right?



Speaker:

You can spin



Speaker:

up a copy off of your storage that is a writeable copy that you can then use



Speaker:

for testing out your recovery systems



Speaker:

without impacting the actual recovery instance.



Speaker:

right.



Speaker:

And so that's what I'm thinking about is just like those sort of scenarios or maybe



Speaker:

you're able to wheel in an extra server or beg, borrow, or steal an extra server to use



Speaker:

for your recovery testing or DR testing,



Speaker:

Yeah.



Speaker:

You know, it's, it's funny that you say that because I, I, you know, when



Speaker:

you, I like this idea wheeling in a, a



Speaker:

server.



Speaker:

I mean, I, I, I think many of our listeners, you know, and, and I, and every



Speaker:

time I, by default, I'm always talking about data center, even though I, you



Speaker:

know, and then in my brain goes, Hey,



Speaker:

nobody has a data center anymore.



Speaker:

Um, I don't think many people are wheeling in a server.



Speaker:

I think that they're, they're, you know, they're looking at.



Speaker:

Cloud



Speaker:

infrastructure as a way.



Speaker:

I, I know that not everybody can do that.



Speaker:

And that's obviously another thing that you have to decide upfront is



Speaker:

how are we going to do this recovery?



Speaker:

Are we going to do it in the cloud?



Speaker:

Are we gonna do it on



Speaker:

alternate infrastructure? Um, these are all



Speaker:

things that you have to decide upfront.



Speaker:

We have to decide what it is we're gonna restore.



Speaker:

We have to decide, um, where we're going to restore and, and how.



Speaker:

Right and.



Speaker:

The deciding the where.



Speaker:

Again, this is all about planning.



Speaker:

This is all stuff that needs to be



Speaker:

decided upfront.



Speaker:

This is also something that's going to be part of your backup and DR



Speaker:

design. You will have decided upfront, you know, how we're going to do DR.



Speaker:

Uh, well, disaster recovery, how we're going to do it.



Speaker:

And that place is most likely the place that you're going



Speaker:

to, uh, be doing DR testing.



Speaker:

And, you know, there are a lot of choices here.



Speaker:

Cloud infrastructure, I think, is the best choice for most environments.



Speaker:

Another choice is aging infrastructure, right?



Speaker:

So



Speaker:

you move, you know, you move your older stuff out and that



Speaker:

becomes your DR environment.



Speaker:

Um, another choice, another very common choice is basically, um,



Speaker:

there, there's two ways to basically.



Speaker:

Rent infrastructure for the purposes of disaster recovery.



Speaker:

One way is to contract with, um, you know, a company that will provide



Speaker:

what you need if and when you need it.



Speaker:

Uh, and that costs, you know, let's say this much.



Speaker:

And then there's a company that will provide everything you need,



Speaker:

always available to you all the time.



Speaker:

Even when you don't need it.



Speaker:

And that will cost this much.



Speaker:

Yeah, exactly.



Speaker:

I can't do the fingers, I gotta do the hands.



Speaker:

Right.



Speaker:

It's significantly more.



Speaker:

Uh, and, and this is why I push everybody to the cloud as much as you can.



Speaker:

'cause that's the beautiful thing about the cloud, is that you can just literally



Speaker:

snap your fingers and, um, you know, and use, and you can also use infrastructure



Speaker:

as code so that you can just make all of the hardware that you need



Speaker:

magically appear.



Speaker:

What's that?



Speaker:

and work right.



Speaker:

And work.



Speaker:

Yeah.



Speaker:

Magically appear when you need it.



Speaker:

And then the moment you no longer need it because your test is over, you can



Speaker:

also snap your fingers and it all goes away and you just pay for it only when,



Speaker:

uh, it's, you know, up and running.



Speaker:

Um, so the, the next thing to talk about is, so we decided



Speaker:

what it is we're gonna test.



Speaker:

We decided where we're gonna test it, how we're gonna test it,



Speaker:

what about our success criteria?



Speaker:

Yeah.



Speaker:

I think this is important to note upfront what it means to be



Speaker:

successful, but I think it's also important to be realistic, right.



Speaker:

With your success criteria, especially if this is sort of your



Speaker:

first time doing this, because I remember the story you tell Curtis



Speaker:

about your, your runbooks that you



Speaker:

Yeah.



Speaker:

Yeah.



Speaker:

when you used to work at the bank.



Speaker:

And how a lot of times they were not able to get through and



Speaker:

actually do the recovery because they, a step would be skipped or



Speaker:

something wouldn't work properly.



Speaker:

And so I think it's sort of a learning process.



Speaker:

So don't be too hard on yourself.



Speaker:

If the first 10, 20, a hundred times you try doing your recovery testing, it's



Speaker:

not like a hundred percent successful.



Speaker:

Yeah.



Speaker:

It may not be your fault.



Speaker:

Things change, environments change,



Speaker:

hardware may change, right?



Speaker:

There's so many other factors,



Speaker:

but.



Speaker:

this is, this is very closely related to the discussions we've had over, um,



Speaker:

doing, uh, um, cyber recovery testing.



Speaker:

Right?



Speaker:

And your disaster recovery very likely will be part of an



Speaker:

overall cyber recovery process.



Speaker:

And I agree with you that I.



Speaker:

Um, it's interesting.



Speaker:

Mentally, I was going somewhere completely different.



Speaker:

But you, you, once again, this is why we make such a good team.



Speaker:

Um, you were more like, Hey, make the re make the, um, um,



Speaker:

you know, the requirements.



Speaker:

Uh, be nice to yourself, especially if it's the first time out.



Speaker:

You know, requirement number one, no one dies, right?



Speaker:

Yep.



Speaker:

No fires are created.



Speaker:

No one quits.



Speaker:

Um.



Speaker:

And, um, the, you know, set your expectations low and nobody can



Speaker:

take you, take 'em away from you.



Speaker:

I'm borrowing heavily from, uh, Michael p Connolly, the comedian who,



Speaker:

who says that's the, the happiness to life is to lower your expectations.



Speaker:

And he's like, my goal for today is to go to the bathroom outside my pants.



Speaker:

Um,



Speaker:

so yeah.



Speaker:

Set the, uh, you know, the success criteria low in the beginning.



Speaker:

Um, where I was going was, of course, RTO and RPO, which we talked about before.



Speaker:

That is definitely going to determine the overall success,



Speaker:

once you click the stopwatch and you begin the recovery process.



Speaker:

And then you click it, that everything is back up and running and, and you've



Speaker:

tested that, that, that whatever it is that you destroyed or you pretended you



Speaker:

destroyed, is now up and running and fully functional. Again, as we've mentioned



Speaker:

in the RTO and RPO, that doesn't just mean the time of the restore, it's,



Speaker:

it's, you know, it's.



Speaker:

ah, here's a question.



Speaker:

Is it RTO and RPO or is it RTA and RPA?



Speaker:

the, it's, that is the objective,



Speaker:

right?



Speaker:

The RTO and RPO are, are the objective that we are shooting for.



Speaker:

And so the goal, um, in a recovery test of any kind is that the RTA, that's recovery



Speaker:

time, actual, uh, and recovery point actual, are less than, uh, or equal to



Speaker:

the RTO and RPO.
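
The pass/fail rule described here (actuals less than or equal to objectives) can be written down as a small check. This is only a minimal sketch: the `RecoveryResult` class, its field names, and the example durations are hypothetical and not something from the episode.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryResult:
    """Hypothetical record of one recovery test (names are illustrative)."""
    rto: timedelta  # recovery time objective: the target
    rpo: timedelta  # recovery point objective: the target
    rta: timedelta  # recovery time actual: measured during the test
    rpa: timedelta  # recovery point actual: measured during the test

    def met_objectives(self) -> bool:
        # The test meets its objectives only if both actuals
        # are less than or equal to their objectives.
        return self.rta <= self.rto and self.rpa <= self.rpo


# Made-up numbers: a 4-hour RTO and 1-hour RPO, with a test that
# took 3.5 hours and lost 45 minutes of data.
result = RecoveryResult(
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
    rta=timedelta(hours=3, minutes=30),
    rpa=timedelta(minutes=45),
)
print(result.met_objectives())
```

The stopwatch for the RTA should cover the whole recovery process, not just the restore itself, so the measured value is whatever elapsed between declaring the test started and confirming the system is fully functional.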



Speaker:

go back and listen to our episode.



Speaker:

We covered it, I think a couple episodes ago, right?



Speaker:

No, it was, well, well, I don't know.



Speaker:

It depends on when this one gets published.



Speaker:

We just, we just did it.



Speaker:

Yeah.



Speaker:

That's why I'm saying we just did it not.



Speaker:

Oh, was that really the last episode?



Speaker:

Well, it just published.



Speaker:

Um, uh, but again, I don't know, you know,



Speaker:

we'll see how these things, you know, but, um, yeah, if you're, if, if RTO and RPO



Speaker:

don't just roll off your tongue and you don't know what they are and, and, and,



Speaker:

you know, all of that stuff, it, they literally should be in every conversation



Speaker:

having anything to do with backup and DR



Speaker:

design.



Speaker:

But that to me is the ultimate success criteria, right?



Speaker:

Another success criteria is the degree, and this, this is gonna be



Speaker:

a percentage, um, a per a percentage achieved, and that is the degree to which



Speaker:

you were able to follow your runbook



Speaker:

and just do what's in the runbook.
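
One way to turn "percentage achieved" into a number is to record, for each runbook step, whether it could be followed exactly as written. The sketch below assumes a simple step-name-to-boolean mapping; the function name `runbook_completion` and the example steps are hypothetical.

```python
def runbook_completion(steps: dict[str, bool]) -> float:
    """Return the percentage of runbook steps followed as written."""
    if not steps:
        raise ValueError("runbook has no steps")
    followed = sum(1 for ok in steps.values() if ok)
    return 100.0 * followed / len(steps)


# Made-up example: True means the tester could do the step exactly as
# documented, False means they had to improvise or ask for help.
steps = {
    "provision recovery environment": True,
    "restore database from backup": True,
    "repoint application config": False,
    "verify application login": True,
}
print(f"{runbook_completion(steps):.0f}% of steps followed as written")
```

Any step marked False is a candidate for a runbook fix before the next test.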



Speaker:

This is



Speaker:

what you, this is what you were alluding to before, right?



Speaker:

Well, and hopefully you have a runbook.



Speaker:

Yeah.



Speaker:

Like you said, that, that, that assumes you have a runbook.



Speaker:

Yeah.



Speaker:

Um.



Speaker:

And, and the way we always did this at the bank, as we mentioned



Speaker:

before, is that we would have someone other than the person who runs the



Speaker:

backup system do the DR testing.



Speaker:

Right?



Speaker:

Here's the runbook, please follow it because we're gonna pretend



Speaker:

that Curtis got hit by a bus.



Speaker:

It was never anything nice.



Speaker:

It was never



Speaker:

that I won the lottery and then just flew the coop.



Speaker:

It was always Curtis got hit by a bus or got swallowed up in the, in the sink.



Speaker:

The great sinkhole of 2024.



Speaker:

Um,



Speaker:

How long do you think it's gonna be until people don't even know what a bus is?



Speaker:

I think, I think we're good.



Speaker:

I think we're



Speaker:

good.



Speaker:

You know, I, um, it's funny, definitely an aside, when I was working the



Speaker:

election, uh, the last two weeks we're at a school and one of the hardest



Speaker:

time we're, we're, we're at a, like a multifunction building next to a school.



Speaker:

And one of the hardest times was during pickup and drop off because



Speaker:

there are all these parents that are picking up and dropping off their kids.



Speaker:

And I was like.



Speaker:

You know, if only there was like a large vehicle that we could put all the kids



Speaker:

in and like we could like paint it yellow so that people could see it and then like



Speaker:

have a sign that comes out and make sure, oh, traffic stops in both directions.



Speaker:

If only we could do all that.



Speaker:

And then all these parents wouldn't have to like, uh, take their kids



Speaker:

to school and waste all of that gas and all of those cars sitting there.



Speaker:

And I'm sure they're, the gas is running while they're sitting



Speaker:

there for a half hour waiting for their kid to come outta school.



Speaker:

Anyway, I digress.



Speaker:

I don't know.



Speaker:

Hmm.



Speaker:

But, um, so that, I, I think that, we'll, we'll stop there because



Speaker:

basically it, the number one thing, the number one success criteria that



Speaker:

I'm gonna say is that you know what your success criteria are beforehand.



Speaker:

What are we gonna restore?



Speaker:

How are we gonna restore it?



Speaker:

Where are we going to restore it, and how long is it, you know, what, what



Speaker:

timeframe are we trying to fit in?



Speaker:

And also what other, what other, like the thing we talked about with the.



Speaker:

With the, uh, uh, the other criteria being that, that we can



Speaker:

do it without Curtis' help, right?



Speaker:

Or what, whatever, you know, whatever your, your Curtis is, right?



Speaker:

Um, decide on what all of those are upfront and you have a much



Speaker:

better chance of being successful when you actually do the recovery.



Speaker:

What do you think?



Speaker:

No, that makes sense.



Speaker:

Okay.



Speaker:

And with that, I will thank you once again for being a great co-host, Prasanna.



Speaker:

I try, I try.



Speaker:

This was fun. I like talking about disaster recovery testing.



Speaker:

Yeah, absolutely.



Speaker:

And we will, uh, hope you guys enjoyed this as well.



Speaker:

Uh, that is a wrap.



Speaker:

The Backup Wrap-Up is written, recorded, and produced by me, W. Curtis Preston.



Speaker:

If you need backup or DR



Speaker:

consulting, content generation, or expert witness work,



Speaker:

check out backupcentral.com.



Speaker:

You can also find links for my O'Reilly books on the same website.



Speaker:

Remember, this is an independent podcast and any opinions that



Speaker:

you hear are those of the speaker and not necessarily an employer.



Speaker:

Thanks for listening.