Nov. 6, 2023

CDP: The Next Great Thing in DR?

In this episode of the Backup Wrap-Up, W. Curtis Preston and Prasanna Malaiyandi discuss Continuous Data Protection (CDP) and its potential as the next great thing in disaster recovery. They explore the concept of meeting an RTO and RPO of zero and question why CDP isn't used for all backups in DR. Tune in to learn more about CDP and its role in backup and disaster recovery.

Article mentioned in the story:

https://www.theregister.com/2023/10/10/ransomware_attacks_register_record_speeds

Transcript

Speaker:

If you're responsible for backup and DR,

 

Speaker:

At some point, someone is going to tell you about their amazing product based

 

Speaker:

on continuous data protection or CDP.

 

Speaker:

They say they can meet an RTO and RPO of zero, which sounds great.

 

Speaker:

Why don't we do all backups and DR using this method?

 

Speaker:

Hi, I'm W. Curtis Preston, AKA Mr. Backup.

 

Speaker:

And I started this podcast to turn unappreciated backup admins

 

Speaker:

into cyber recovery heroes.

 

Speaker:

This episode will answer all your questions about CDP, which

 

Speaker:

some say is the next great thing in DR.

 

Speaker:

This is the Backup Wrap-Up.

 

Speaker:

Hi and welcome to the show.

 

Speaker:

And once again, I have a guy

 

Speaker:

who cost me money.

 

Speaker:

Prasanna Malaiyandi. How's it going, Prasanna?

 

Speaker:

I'm good.

 

Speaker:

I'm worried about

 

Speaker:

what I'm going to

 

Speaker:

be blamed for

 

Speaker:

now.

 

Speaker:

Well, I think, I think that, you know,

 

Speaker:

the, the fact that I

 

Speaker:

have new AirPods is your fault.

 

Speaker:

What do you

 

Speaker:

So,

 

Speaker:

uh, no.

 

Speaker:

the fact that

 

Speaker:

you lost your AirPods.

 

Speaker:

I think that you

 

Speaker:

manifested it.

 

Speaker:

You were suggesting that I needed new AirPods and I think my current AirPods got

 

Speaker:

upset

 

Speaker:

and then they literally flew out

 

Speaker:

of my pocket.

 

Speaker:

They're like, doo doo doo doo doo doo

 

Speaker:

Yeah, it was the weirdest thing.

 

Speaker:

Like, I've done a really good job with holding on to my AirPods.

 

Speaker:

And then I was,

 

Speaker:

I was

 

Speaker:

at a restaurant and, um, you know, having

 

Speaker:

a date with my lovely wife was great and

 

Speaker:

I pulled, pulled

 

Speaker:

the thing out of my

 

Speaker:

pocket and literally the case, like, flipped,

 

Speaker:

like, open and the AirPod just went flying.

 

Speaker:

And I don't, it was, it was such a weird thing that I didn't even realize

 

Speaker:

it happened when it happened.

 

Speaker:

It wasn't

 

Speaker:

until I got home and I realized that both my AirPods were no longer in it

 

Speaker:

And, uh, yeah, so I just like, in a moment like that I lost my AirPods.

 

Speaker:

So I think what you need is a case that has one of those clasps on it.

 

Speaker:

Right?

 

Speaker:

So you have to undo the clasps in order for it to open.

 

Speaker:

oh,

 

Speaker:

Right?

 

Speaker:

Because just the normal

 

Speaker:

silicone ones, I don't think will be sufficient for you.

 

Speaker:

Yeah,

 

Speaker:

apparently

 

Speaker:

not.

 

Speaker:

But, hey, I've got the new

 

Speaker:

fancy AirPods Pro Generation 2 USB-C,

 

Speaker:

which really should be called Generation 3.

 

Speaker:

But, You

 

Speaker:

know, because I had to

 

Speaker:

very

 

Speaker:

specifically make sure that I bought the one with the USB-C.

 

Speaker:

well, it's

 

Speaker:

because the actual thing is the same.

 

Speaker:

Yeah, well, yeah, but you know what I'm saying.

 

Speaker:

I mean,

 

Speaker:

I know

 

Speaker:

like I was going to buy it at Costco, but Costco only has

 

Speaker:

the, uh, the older

 

Speaker:

Gen 2s.

 

Speaker:

Yep.

 

Speaker:

Yeah,

 

Speaker:

but,

 

Speaker:

uh,

 

Speaker:

Tough life you live, Curtis.

 

Speaker:

Uh, I'm just waiting to listen to what I'll be blamed for next.

 

Speaker:

Absolutely.

 

Speaker:

So,

 

Speaker:

uh, speaking of blaming,

 

Speaker:

We got blame to go around.

 

Speaker:

I,

 

Speaker:

I, I, think we should take credit for this, for this piece of

 

Speaker:

news.

 

Speaker:

What do you think?

 

Speaker:

Oh,

 

Speaker:

Curtis.

 

Speaker:

Yes, it's our ability to

 

Speaker:

expose to the listeners, hey, here's what ransomware is, that...

 

Speaker:

Yeah, I think you're right.

 

Speaker:

We...

 

Speaker:

You think, you think,

 

Speaker:

But is it a good or a bad thing, though?

 

Speaker:

That's my question, this article.

 

Speaker:

well, I actually think

 

Speaker:

it's a good thing.

 

Speaker:

Let's, so let's talk about

 

Speaker:

it.

 

Speaker:

So the, the headline, and it's from a story in The Register.

 

Speaker:

ransomware

 

Speaker:

attacks register,

 

Speaker:

It's a bit, it's funny I realized that the word register was in the

 

Speaker:

title and it messed me up there.

 

Speaker:

Ransomware

 

Speaker:

attacks register record speeds thanks to success of InfoSec

 

Speaker:

industry.

 

Speaker:

So when I

 

Speaker:

first heard that, I

 

Speaker:

was like Wait,

 

Speaker:

I, you know, that one, that one literally, uh,

 

Speaker:

threw me.

 

Speaker:

So the subtitle here is dwell times

 

Speaker:

drop to hours rather than days for the first time.

 

Speaker:

So first off,

 

Speaker:

do you want to explain what a dwell time is for those of our listeners

 

Speaker:

that don't know?

 

Speaker:

Yeah.

 

Speaker:

yeah, so in the past with ransomware, I think

 

Speaker:

before

 

Speaker:

it used to be

 

Speaker:

measured, like you said,

 

Speaker:

in days, like four and a half to five and a half days in the

 

Speaker:

last couple of years.

 

Speaker:

Right.

 

Speaker:

But this is basically the amount of time that

 

Speaker:

ransomware is in your system.

 

Speaker:

So someone

 

Speaker:

has attacked, infiltrated your systems, they've dropped a package.

 

Speaker:

It hasn't done anything though,

 

Speaker:

Right.

 

Speaker:

It's just sitting

 

Speaker:

there and waiting.

 

Speaker:

Right,

 

Speaker:

and there

 

Speaker:

is now while it's waiting it could be

 

Speaker:

discovering other things, figure out what's important, what's

 

Speaker:

not but while

 

Speaker:

it's waiting there's always a

 

Speaker:

risk that

 

Speaker:

it could be detected, it could be destroyed, and so previously, like you

 

Speaker:

were saying, four and a half five and a

 

Speaker:

half days for the dwell time, that it would just sit in

 

Speaker:

your environment, not doing

 

Speaker:

anything.

 

Speaker:

Yeah,

 

Speaker:

I remember

 

Speaker:

when, I don't know if there's a difference.

 

Speaker:

Well, there's definitely a difference between the average and the mean, but

 

Speaker:

I remember when the mean dwell time was measured in many days, right?

 

Speaker:

Like, like

 

Speaker:

it was like as high as

 

Speaker:

45.

 

Speaker:

Right.

 

Speaker:

And now they're saying that the, uh,

 

Speaker:

um, you

 

Speaker:

know

 

Speaker:

this time, and I don't know if they're using mean or average,

 

Speaker:

but, uh, it says it's down to

 

Speaker:

24 hours.

 

Speaker:

And they're saying, and in more than 10 percent of the

 

Speaker:

incidents,

 

Speaker:

It was deployed within five hours The ransomware

 

Speaker:

was, you know, the actual ransomware part was done within

 

Speaker:

five hours of the initial attack,

 

Speaker:

which

 

Speaker:

is good, right?

 

Speaker:

Because, like the title said, that means people are detecting it faster, right?

 

Speaker:

And ransomware crews and ransomware as a

 

Speaker:

service affiliates, right?

 

Speaker:

They realize, yeah, we can't just let it sit there.

 

Speaker:

We have

 

Speaker:

to be in and out

 

Speaker:

as quickly as possible.

 

Speaker:

Right, yeah, and and that's, that's why the headline,

 

Speaker:

they're saying, well, because

 

Speaker:

we've gotten

 

Speaker:

better at

 

Speaker:

detecting it,

 

Speaker:

they've, they've basically

 

Speaker:

had to realize they've had to,

 

Speaker:

You know,

 

Speaker:

once they're in and

 

Speaker:

they got to

 

Speaker:

do bad stuff right away.

 

Speaker:

Otherwise, they're going to, they're going to get detected.

 

Speaker:

Um, go ahead.

 

Speaker:

Another interesting fact I

 

Speaker:

saw in the article was, I know we always

 

Speaker:

talk about like double extortion,

 

Speaker:

right?

 

Speaker:

Where

 

Speaker:

someone comes in, they encrypt your environment, but they also exfiltrate data, right?

 

Speaker:

So now

 

Speaker:

you have to pay,

 

Speaker:

right?

 

Speaker:

Because otherwise,

 

Speaker:

who wants to have their data released?

 

Speaker:

I think, actually, as we're

 

Speaker:

recording this,

 

Speaker:

there is a company that

 

Speaker:

is potentially going to have

 

Speaker:

their data exposed

 

Speaker:

because they

 

Speaker:

decided not to pay the ransomware operators,

 

Speaker:

right.

 

Speaker:

right?

 

Speaker:

And that's the

 

Speaker:

double extortion.

 

Speaker:

Now, in the article though, they said that the times, number of times that

 

Speaker:

they're seeing double

 

Speaker:

extortion from the people they've surveyed

 

Speaker:

is only 13 percent of the

 

Speaker:

time.

 

Speaker:

That seems really

 

Speaker:

low,

 

Speaker:

but given

 

Speaker:

you only have 24 hours, maybe

 

Speaker:

it makes sense.

 

Speaker:

They don't have

 

Speaker:

enough time to do more damage.

 

Speaker:

Right.

 

Speaker:

Yeah, that, that,

 

Speaker:

and I see that

 

Speaker:

as good news, as I'm

 

Speaker:

sure

 

Speaker:

you understand, because the, the actual,

 

Speaker:

the thing I'm worried

 

Speaker:

most about is the, um, exfiltration, because

 

Speaker:

backup just can't help in

 

Speaker:

that, right?

 

Speaker:

Uh, once the data has been

 

Speaker:

exfiltrated,

 

Speaker:

all bets are

 

Speaker:

off.

 

Speaker:

So I saw

 

Speaker:

that as good.

 

Speaker:

And that part came from the annual threat intelligence report from Microsoft.

 

Speaker:

So

 

Speaker:

that

 

Speaker:

that is really interesting though, is that, um, uh,

 

Speaker:

the other reason why I think this is a good thing

 

Speaker:

is that the shorter the dwell

 

Speaker:

time,

 

Speaker:

the easier the

 

Speaker:

recovery.

 

Speaker:

So when you have a dwell

 

Speaker:

time

 

Speaker:

measured in days or

 

Speaker:

weeks,

 

Speaker:

And you're doing something along the way,

 

Speaker:

especially if you're encrypting data

 

Speaker:

along the way,

 

Speaker:

how do you recover from

 

Speaker:

that?

 

Speaker:

Right?

 

Speaker:

There's no,

 

Speaker:

the, the the good point

 

Speaker:

in time

 

Speaker:

is three weeks

 

Speaker:

ago,

 

Speaker:

right?

 

Speaker:

Do you really want to recover

 

Speaker:

your primary file

 

Speaker:

server, for example?

 

Speaker:

This was the one I was always

 

Speaker:

worried about.

 

Speaker:

If you

 

Speaker:

encrypt VMs, if you encrypt databases,

 

Speaker:

it's easy to notice

 

Speaker:

the moment you encrypt anything, everything stops working

 

Speaker:

and you know when the

 

Speaker:

point in time

 

Speaker:

is.

 

Speaker:

But if you talk about a file

 

Speaker:

server

 

Speaker:

or someone's workstation that has a lot of files

 

Speaker:

on it, if you're able

 

Speaker:

to encrypt...

 

Speaker:

data over

 

Speaker:

time

 

Speaker:

and not be noticed, restoring that is

 

Speaker:

significantly more

 

Speaker:

complicated than restoring

 

Speaker:

an encryption attack that takes place over hours.

 

Speaker:

So I think this

 

Speaker:

is a much better, uh, scenario.

 

Speaker:

It does mean we have to continue to

 

Speaker:

stay vigilant

 

Speaker:

and to make sure that we're continuing to detect

 

Speaker:

so that they continue to have dwell times this small.

 

Speaker:

And this

 

Speaker:

also goes to the importance of backups,

 

Speaker:

right?

 

Speaker:

Cause if it does hit, like you were saying, you want to

 

Speaker:

be able to restore.

 

Speaker:

And so if you don't have a

 

Speaker:

backup that you can restore from.

 

Speaker:

Then you're going to lose data.

 

Speaker:

Right.

 

Speaker:

There, there was another thing here that they were

 

Speaker:

saying that, you know, because of ransomware as a service uh, businesses,

 

Speaker:

that they

 

Speaker:

actually,

 

Speaker:

it says in June,

 

Speaker:

they broke the single

 

Speaker:

month record for ransomware attacks.

 

Speaker:

Thanks to a single exploit, uh, the MOVEit

 

Speaker:

MFT exploit, which I actually don't know much about, but that

 

Speaker:

single exploit allowed them to uh,

 

Speaker:

break the record of the number of attacks in a month.

 

Speaker:

That

 

Speaker:

doesn't sound good.

 

Speaker:

None of this sounds good, I guess.

 

Speaker:

It's just,

 

Speaker:

I do like

 

Speaker:

a quicker attack because

 

Speaker:

a quicker attack is, I think, easier

 

Speaker:

to

 

Speaker:

defend against,

 

Speaker:

or let me rephrase

 

Speaker:

that,

 

Speaker:

a quicker attack is easier

 

Speaker:

to recover from.

 

Speaker:

Yeah.

 

Speaker:

And also a

 

Speaker:

hundred percent

 

Speaker:

agree with you, Curtis.

 

Speaker:

So,

 

Speaker:

what,

 

Speaker:

so do you still want to claim credit because of our podcast that

 

Speaker:

we're helping

 

Speaker:

improve?

 

Speaker:

to how

 

Speaker:

much

 

Speaker:

we have gotten the word out there that long

 

Speaker:

dwell times are bad,

 

Speaker:

that the attackers have

 

Speaker:

made short dwell times.

 

Speaker:

You

 

Speaker:

So any attackers

 

Speaker:

so any attackers

 

Speaker:

out there, if you would like to come on the podcast and talk about this,

 

Speaker:

please reach out and

 

Speaker:

let us know.

 

Speaker:

Can you

 

Speaker:

imagine?

 

Speaker:

Can you imagine that?

 

Speaker:

Um, once

 

Speaker:

again, another thing

 

Speaker:

from here, once

 

Speaker:

again,

 

Speaker:

the two highest

 

Speaker:

profile attacks of 2023 were the result

 

Speaker:

of unpatched infrastructure, right?

 

Speaker:

Um,

 

Speaker:

we like to talk about on the podcast, right?

 

Speaker:

yeah, yeah,

 

Speaker:

MFA, patch your systems,

 

Speaker:

do

 

Speaker:

backups.

 

Speaker:

Exactly.

 

Speaker:

That would stop the vast majority of

 

Speaker:

ransomware attacks that we see.

 

Speaker:

Well, with that, that is the news of the day.

 

Speaker:

This week's episode is a continuation of our Backup to Basics series,

 

Speaker:

and this week, we're going to be talking about a product category

 

Speaker:

that at one point was red hot.

 

Speaker:

Was it not?

 

Speaker:

Do you remember when this product category was red hot?

 

Speaker:

Like everybody had to have a CDP product.

 

Speaker:

Do you

 

Speaker:

I want to say it was like 2002, 2003.

 

Speaker:

Yeah.

 

Speaker:

What I remember was being at Storage Networking World and half of the

 

Speaker:

booths were CDP products, remember

 

Speaker:

What is CDP, Curtis, for our

 

Speaker:

yeah, we're, we're gonna, we're gonna talk about that in just

 

Speaker:

a second, but just the, the.

 

Speaker:

The sheer number, I remember thinking all of these can't succeed and little

 

Speaker:

did I know that pretty much almost none of them, uh, would succeed.

 

Speaker:

Uh,

 

Speaker:

I want to say there's like four left.

 

Speaker:

In the world.

 

Speaker:

Yeah, there's, well, and, and most of them got acquired and

 

Speaker:

are, are simply a checkbox on, on another product's portfolio.

 

Speaker:

So what is CDP?

 

Speaker:

It stands for continuous data protection.

 

Speaker:

And this was a, you may recall in a previous episode, we talked about

 

Speaker:

replication and what, as far as I'm concerned, what is the primary problem

 

Speaker:

with replication as data protection, or basically as a replacement for backup.

 

Speaker:

What's the primary problem with it?

 

Speaker:

Whatever you do here happens here.

 

Speaker:

Exactly.

 

Speaker:

It is very efficient in replicating stupidity, right?

 

Speaker:

Uh, or, or, or ransomware attacks or anything in any sort of cyber attack.

 

Speaker:

So replication is great at giving you an RPO of zero, right?

 

Speaker:

A recovery point objective of zero, but it's also going to replicate

 

Speaker:

things that happen on a logical level.

 

Speaker:

Um, and so CDP was born and I describe CDP as replication with a back button.

 

Speaker:

What do you think of that?

 

Speaker:

That definition.

 

Speaker:

I like it, but I used to think I used to, you know what I used to call CDP?

 

Speaker:

What?

 

Speaker:

I was like, it's TiVo for your data.

 

Speaker:

Yeah.

 

Speaker:

That, but that was, uh, I remember, I remember vendors describing it like that.

 

Speaker:

Uh, the problem is now nobody knows what TiVo is.

 

Speaker:

I know that's why I said for the five listeners who may know what TiVo is.

 

Speaker:

And for the two of us, since we both had TiVos, right.

 

Speaker:

We understand that name.

 

Speaker:

And also if you do watch Psych, there are references.

 

Speaker:

Are there TiVo references in Psych?

 

Speaker:

Oh, yeah,

 

Speaker:

All right.

 

Speaker:

Well, you would know better 'cause you've been, you've been binging Psych lately, so

 

Speaker:

but, but, but yes, I agree with your point.

 

Speaker:

It is a back button for replication.

 

Speaker:

And specifically what you mean is, with replication, you have that one copy. With CDP, you can go backwards from that one copy to other points in

 

Speaker:

yeah.

 

Speaker:

The, the reason why I call it replication with a back button is that, is that

 

Speaker:

the process of getting the data.

 

Speaker:

We've discussed that.

 

Speaker:

I see all of these things as backup.

 

Speaker:

A lot of people see backup as, well, putting something on tape or a backup

 

Speaker:

that changes its format, right?

 

Speaker:

A lot of people try to define sort of old school backup as something that requires

 

Speaker:

a restore, you know, different ways to try to define what old school backup is.

 

Speaker:

And...

 

Speaker:

I just see that as a, that is the old way we did backup.

 

Speaker:

This is now a new way that

 

Speaker:

we do backup.

 

Speaker:

Backup is just a method of putting the data in a different place

 

Speaker:

so that we can restore it in, in time of something bad happening.

 

Speaker:

And this is one of the newer ways.

 

Speaker:

And the, the thing is, unlike traditional backup, CDP is not a batch process.

 

Speaker:

Traditionally backup ran once a night.

 

Speaker:

Sometimes you might run it multiple times a day.

 

Speaker:

You could run it once an hour.

 

Speaker:

You could run it every five minutes.

 

Speaker:

Traditionally backup is a batch process.

 

Speaker:

CDP by definition, that C is that it is happening continuously.

 

Speaker:

All the time, just like replication.

 

Speaker:

Although we had some, there were some finer points there where we,

 

Speaker:

where you and I were trying to argue about on what continuous means, and

 

Speaker:

the idea is that it is happening truly continuously: every time a block of data is changed on the primary system, it gets replicated to the target system. Now, immediately, you know,

 

Speaker:

Yeah, we can debate that.

 

Speaker:

That's

 

Speaker:

this happens, but, but basically this is, it's not a batch process.

 

Speaker:

It's happening continuously throughout the day.

 

Speaker:

And then we can talk about how that is stored on the other end.

 

Speaker:

Uh, are you okay with that part of the definition?

 

Speaker:

I'm good with that.

 

Speaker:

And I think the one other thing we should touch on is.

 

Speaker:

As technologies have evolved, so has CDP in the sense of we could

 

Speaker:

talk about where in the stack you're actually triggering or forwarding I/O and the data from.

 

Speaker:

Typically, right, and way back in the day, right, all these CDP vendors when

 

Speaker:

you were probably at the SNIA, right, it was all, okay, here's an appliance.

 

Speaker:

that you put in, right, the writes might come into it, get split off, go to two

 

Speaker:

different places, right, that's one method that some people would do to make sure you

 

Speaker:

have two copies, continuously replicating.

 

Speaker:

Another method that some vendors have used is you sort of write to your

 

Speaker:

primary, the primary forwards it off to an appliance or to something else which

 

Speaker:

then writes it on the target system.

 

Speaker:

Right.

 

Speaker:

That's another mechanism people did.

 

Speaker:

All of that is sort of infrastructure level down at the

 

Speaker:

storage array or networking level.

 

Speaker:

Actually, some people even did it at like the storage area network level, right.

 

Speaker:

Where they would have that appliance in the middle, right.

 

Speaker:

And basically that's that first use case where you would write

 

Speaker:

to two different storage arrays.

 

Speaker:

The other thing moving up the stack, right, is with virtualization, people

 

Speaker:

were like, hey, the same challenges you had with sort of storage-level CDP,

 

Speaker:

let's do that at the VM level as well.

 

Speaker:

And so you had technologies that would allow you to split writes at a VM level.

 

Speaker:

You could forward it off to another ESXi cluster in a different location and have a

 

Speaker:

continuously replicated VM somewhere else.

 

Speaker:

Right, basically they all, the concept was the same.

 

Speaker:

The question is, at what point are we going to split the write?

 

Speaker:

And then take one copy and send it where we would always send it to the

 

Speaker:

primary storage and the other copy of that write gets sent to some magic

 

Speaker:

process or box or whatever that will then store it for CDP purposes.

 

Speaker:

Sometimes it can happen in the storage array.

 

Speaker:

There, there have been boxes that you can buy that go between your

 

Speaker:

storage array and your server.

 

Speaker:

Sometimes it might be an independent, you know, that box might be

 

Speaker:

an actual appliance, it might be a piece of software, right?

 

Speaker:

We had DataCore on here.

 

Speaker:

DataCore was one of those vendors that you can put the box in, you know, their

 

Speaker:

software on a box in between your, uh, storage array and your server, and it might be in, like you said, it might

 

Speaker:

be in the hypervisor, it might even be in the cloud, it might be something

 

Speaker:

that's being done in the cloud.

 

Speaker:

But the idea is that basically as the, literally as the data is being written,

 

Speaker:

it gets piped off into two places, and then the second of which is the CDP copy.
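
The write split described here can be sketched in a few lines of Python. This is only an illustration, assuming an in-memory dictionary stands in for primary storage and a list stands in for the CDP copy; `WriteSplitter` and `JournalEntry` are hypothetical names, not any vendor's API.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class JournalEntry:
    timestamp: float  # when the write was intercepted
    block: int        # block number on the volume
    data: bytes       # the bytes that were written


@dataclass
class WriteSplitter:
    """Hypothetical splitter: each write goes to primary storage and to a CDP journal."""
    primary: Dict[int, bytes] = field(default_factory=dict)   # block -> current data
    journal: List[JournalEntry] = field(default_factory=list)  # every write, in order

    def write(self, block: int, data: bytes, now: Optional[float] = None) -> None:
        ts = time.time() if now is None else now
        self.primary[block] = data                          # copy 1: where the write always went
        self.journal.append(JournalEntry(ts, block, data))  # copy 2: the CDP copy


splitter = WriteSplitter()
splitter.write(7, b"v1", now=100.0)
splitter.write(7, b"v2", now=101.0)
print(splitter.primary[7])    # only the latest version lives on the primary
print(len(splitter.journal))  # the CDP copy holds both versions
```

The point of the sketch is that the primary only ever holds the latest version of a block, while the second copy keeps every version in order.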

 

Speaker:

Do you consider, since we're talking about CDP, do you consider database

 

Speaker:

level things like Oracle's Data Guard as CDP or Exchange used to have

 

Speaker:

something like, what was it called?

 

Speaker:

CCR and all the rest where a write comes in and they forward over the

 

Speaker:

log, because that technically is CDP,

 

Speaker:

That is, that is application level replication.

 

Speaker:

It is not application level CDP because I don't think that with an active database

 

Speaker:

that you can just go backwards in time.

 

Speaker:

I know that if it crashes you can do, you can do media recovery against it.

 

Speaker:

But I don't think it's built.

 

Speaker:

So I'll just say if that's built into it, then sure.

 

Speaker:

Right.

 

Speaker:

But if it's just replicating the changes and doesn't have the ability

 

Speaker:

to go back in time, then no, right.

 

Speaker:

It's not CDP.

 

Speaker:

That is a very crucial aspect of CDP,

 

Speaker:

yeah, and one way to think about this is, I know with databases, we

 

Speaker:

think about redo logs, right, which allow you to go forward in time.

 

Speaker:

With CDP, you actually want undo logs, right?

 

Speaker:

How do I go backwards in time from the most recent version on the target system?
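
One way to picture the undo-log idea is the sketch below. It assumes writes arrive with increasing timestamps and that the target saves, for each write, the contents it is about to overwrite; `CdpTarget`, `UndoRecord`, and `rewind_to` are hypothetical names for illustration, not how any particular product works.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class UndoRecord:
    timestamp: float           # when the overwrite happened
    block: int                 # which block was overwritten
    previous: Optional[bytes]  # contents before the write (None = block did not exist)


class CdpTarget:
    """Hypothetical CDP target: the most recent copy plus an undo log."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.undo_log: List[UndoRecord] = []

    def apply_write(self, timestamp: float, block: int, data: bytes) -> None:
        # Save what is about to be overwritten, then apply the new data.
        self.undo_log.append(UndoRecord(timestamp, block, self.blocks.get(block)))
        self.blocks[block] = data

    def rewind_to(self, point_in_time: float) -> None:
        # Walk the log backwards, undoing every write made after point_in_time.
        while self.undo_log and self.undo_log[-1].timestamp > point_in_time:
            record = self.undo_log.pop()
            if record.previous is None:
                self.blocks.pop(record.block, None)
            else:
                self.blocks[record.block] = record.previous


target = CdpTarget()
target.apply_write(100.0, 1, b"good data")
target.apply_write(200.0, 1, b"ENCRYPTED")  # e.g. a ransomware write
target.apply_write(201.0, 2, b"ENCRYPTED")
target.rewind_to(150.0)                     # the "back button"
print(target.blocks)                        # {1: b'good data'}
```

Plain replication would only keep `blocks`; it is the saved `previous` contents that make going backwards possible, and they are also what consumes the extra storage.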

 

Speaker:

That's a really good point.

 

Speaker:

And I don't think anybody calls them undo logs.

 

Speaker:

So everybody calls them either redo logs or transaction logs.

 

Speaker:

No, I mean, in, in the

 

Speaker:

Oh, database.

 

Speaker:

um, they call them redo logs or they call them transaction logs, because

 

Speaker:

the idea is that you, you have a.

 

Speaker:

It allows you to have a backup, a traditional backup from this

 

Speaker:

point in time and then use those logs to redo the transactions

 

Speaker:

that happened during that point in time and since that point in time.

 

Speaker:

But with CDP, you are correct, the most important thing is to be able to go

 

Speaker:

back in time, which is not something that a typical database replication

 

Speaker:

scenario is going to be able to do.

 

Speaker:

You mentioned the ability to go back in time.

 

Speaker:

How far back in time should we be able to go with a CDP system?

 

Speaker:

Depends on what your requirements are, right?

 

Speaker:

I would say with the CDP system, it depends on what other environments

 

Speaker:

or infrastructure you have.

 

Speaker:

For instance, if you have backups, right?

 

Speaker:

That you're taking periodically, separately, outside of the CDP system.

 

Speaker:

Your CDP system may only need 7 days worth of data, so you can recover

 

Speaker:

within those 7 days at, sort of, uh, I/O granularity.

 

Speaker:

Right.

 

Speaker:

Or record granularity, or whatever we want to call it.

 

Speaker:

Right.

 

Speaker:

Uh, but as long as you have that backup system, that's fine.

 

Speaker:

Going back, say 30, 90, or trying to replace your backup system with the

 

Speaker:

CDP system is a little crazy because I think we need to talk about what's

 

Speaker:

required on the target system or on the target side in order to handle CDP.

 

Speaker:

Right, because in order to be able to go back in time, I need much more

 

Speaker:

storage at the target side than I need at the primary side, because

 

Speaker:

if I'm doing a hundred terabytes of storage, and I'm, and I'm

 

Speaker:

going to do CDP for that.

 

Speaker:

How much do you, because realize at that target side, I need to store the

 

Speaker:

hundred terabytes and every block that changes in that 100 terabytes during that

 

Speaker:

Date.

 

Speaker:

continuum that you've set,

 

Speaker:

Yeah.

 

Speaker:

And so that's why you would see sort of, and I think on the target side,

 

Speaker:

we should probably differentiate depending on what technology, right?

 

Speaker:

Your target system itself may not need all the extra space, but maybe that

 

Speaker:

target appliance, which is dealing with these transactions coming in or

 

Speaker:

these change blocks coming in, that might need to hold the space, right?

 

Speaker:

Uh, sort of as a log.

 

Speaker:

And this is really, this problem right here is why CDP, I think,

 

Speaker:

failed in terms of the dream of CDP.

 

Speaker:

The dream of CDP, because I remember meeting with CDP CEOs, and they were like, this solves everything. We can

 

Speaker:

recover to any point in time.

 

Speaker:

Why would you do it any other way?

 

Speaker:

And the answer is cost.

 

Speaker:

It's the cost because, because the thing you have to think about is

 

Speaker:

you have to store the data, right?

 

Speaker:

Not only with the metadata about what came in, the data that's there,

 

Speaker:

but if these are undo logs, right, you also need to store what the previous

 

Speaker:

data was as well, because you have to be able to go backwards in time.

 

Speaker:

And so you have to store all of this information in that appliance.

 

Speaker:

Some people say that you might have like a 2 percent change rate per day.

 

Speaker:

That 2 percent is the net change measured over the entire day.

 

Speaker:

But if you're adding up every single transaction, right, that might turn

 

Speaker:

out to be like 5 percent actual change.

 

Speaker:

Right.

 

Speaker:

Or 10%, right?

 

Speaker:

if you have anything, if you have a block updated, if we're talking about

 

Speaker:

block level CDP here, which is generally what we're talking about, if a block

 

Speaker:

changes multiple times during the day, you have to store every version

 

Speaker:

of that block throughout the day.
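
As a rough worked example of that sizing problem, the arithmetic can be sketched as follows. It uses the figures mentioned in the conversation (100 terabytes of primary data, a 7-day window, and 2, 5, or 10 percent of the data written per day) and assumes, for simplicity, no compression, deduplication, or metadata overhead; the function name is hypothetical.

```python
def cdp_target_capacity_tb(primary_tb: float,
                           daily_write_fraction: float,
                           retention_days: int) -> float:
    """Base copy plus every block version written during the retention window."""
    journal_tb = primary_tb * daily_write_fraction * retention_days
    return primary_tb + journal_tb


for fraction in (0.02, 0.05, 0.10):
    total = cdp_target_capacity_tb(100, fraction, 7)
    print(f"{fraction:.0%} written per day -> {total:.0f} TB at the target")
# 2% written per day -> 114 TB at the target
# 5% written per day -> 135 TB at the target
# 10% written per day -> 170 TB at the target
```

The gap between the 2 percent and 10 percent rows is the reason the real number is hard to know in advance: it depends on how many times each block is rewritten, not just on how many blocks differ at the end of the day.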

 

Speaker:

And, uh, you're right.

 

Speaker:

It could be a significant percent.

 

Speaker:

And by the way, you have no idea what that number is until you deploy CDP.

 

Speaker:

Right.

 

Speaker:

The other,

 

Speaker:

and

 

Speaker:

the other thing I know you were just mentioning about sort of, you don't know

 

Speaker:

what you'll need until you deploy it.

 

Speaker:

You also have to deploy it on pretty fast and expensive hardware, because if

 

Speaker:

you think about it, you're getting this constant stream of writes that you have

 

Speaker:

to store and you have to replay it down to your target storage location as well.

 

Speaker:

And so your destination system might need to be beefier or the infrastructure

 

Speaker:

required might need to be beefier than what you even have on your production.

 

Speaker:

Right?

 

Speaker:

So going back to that cost aspect, that starts to add up pretty fast.

 

Speaker:

yeah, this, we can go back to the episode on replication.

 

Speaker:

The synchronous and asynchronous aspect is important to understand here.

 

Speaker:

So generally CDP will be done asynchronously.

 

Speaker:

Do you remember synchronous CDP?

 

Speaker:

I think there was one vendor who did it, but yes,

 

Speaker:

okay.

 

Speaker:

So you could do, but I do, I think most people do it asynchronously.

 

Speaker:

And the point is, asynchronously is fine.

 

Speaker:

Obviously your RPO won't be zero.

 

Speaker:

It'll be something close to zero.

 

Speaker:

But the problem with asynchronous is if the target system gets behind

 

Speaker:

in those writes at some point, you know, the buffer is getting back.

 

Speaker:

At

 

Speaker:

your back pressure is going to have to, yeah,

 

Speaker:

Yeah.

 

Speaker:

That's a good term.

 

Speaker:

The back pressure.

 

Speaker:

I like that.

 

Speaker:

Right.

 

Speaker:

You, you will eventually have more writes in the buffer than the size of

 

Speaker:

the buffer, at which point it essentially becomes synchronous, or

 

Speaker:

you have to start dropping writes.

 

Speaker:

Because

 

Speaker:

which you don't want to do.

 

Speaker:

be slowing down the primary system.

 

Speaker:

So you'd end up having to dump the buffer and you'd end up

 

Speaker:

losing bits along the way.

 

Speaker:

And that's just, that's just not something that you would want to do.
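
The buffer behavior described here can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not any vendor's implementation: the class name `ReplicationBuffer`, the `capacity` parameter, and the `"block"` and `"drop"` policy names are all hypothetical. It shows the two outcomes once the target falls behind: the primary is told to wait (effectively synchronous), or writes are dropped and the journal has a gap.

```python
from collections import deque


class ReplicationBuffer:
    """Bounded queue of writes waiting to be shipped to the CDP target."""

    def __init__(self, capacity, on_full="block"):
        self.capacity = capacity
        self.on_full = on_full      # hypothetical policy: "block" or "drop"
        self.pending = deque()
        self.dropped = 0

    def submit(self, write):
        """Return True if the primary may continue, False if it must wait."""
        if len(self.pending) < self.capacity:
            self.pending.append(write)
            return True
        if self.on_full == "drop":
            self.dropped += 1       # the write is lost, so the journal has a gap
            return True
        return False                # back pressure: the primary stalls

    def drain(self, n):
        """The target acknowledges up to n writes, freeing buffer space."""
        shipped = []
        for _ in range(min(n, len(self.pending))):
            shipped.append(self.pending.popleft())
        return shipped
```

With the `"block"` policy the primary slows to the speed of the target; with `"drop"` the primary stays fast but recovery points are lost, which is the trade-off discussed above.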

 

Speaker:

Now, one of the benefits I would say, though, with the CDP like

 

Speaker:

approach is you can do this sort of CDP to dissimilar systems, right?

 

Speaker:

So you might be going from like a NetApp to an EMC, or you could be

 

Speaker:

going from a Pure to a Hitachi.

 

Speaker:

So it gives you flexibility because the CDP appliance, software package,

 

Speaker:

whatever else, just needs access to devices on both sides, right?

 

Speaker:

It's doing all the replication, it's managing everything.

 

Speaker:

So for cases where you're looking to deal with uh, different costs or

 

Speaker:

availability of equipment, right?

 

Speaker:

It is an option rather than sort of being locked into a particular vendor.

 

Speaker:

Right.

 

Speaker:

Most of the CDP vendors that I know

 

Speaker:

Uh, are, are independent of the storage, right?

 

Speaker:

So you can use whatever storage you want on, on both sides.

 

Speaker:

The thing is, I mean, we've, we've been, we've been harping on

 

Speaker:

it for a little bit, but I mean, the, the idea of CDP is amazing.

 

Speaker:

The idea that I can just go back to any point in time is amazing.

 

Speaker:

And I don't have to do anything special on the front end.

 

Speaker:

Um, but it does come with these downsides.

 

Speaker:

And so there were some things that happened over time.

 

Speaker:

As CDP was deployed in more and more environments, customers, I

 

Speaker:

think, demanded certain features.

 

Speaker:

One of them was this term called write coalescing.

 

Speaker:

Do you want to talk about that a little bit?

 

Speaker:

Yeah.

 

Speaker:

So write coalescing is, I know Curtis, you talked about before where you had

 

Speaker:

multiple changes to a single block.

 

Speaker:

That would happen.

 

Speaker:

Uh, and that's great.

 

Speaker:

But at the end, when I need to replay something,

 

Speaker:

I don't need to know all the versions, right?

 

Speaker:

I could just say, look, just give me this version of the data.

 

Speaker:

That's all I care about.

 

Speaker:

And so being able to reduce down some of that data.

 

Speaker:

So maybe instead of having every transaction for the last

 

Speaker:

seven days, maybe for the last

 

Speaker:

36 hours, I have every transaction, and then after that I'm going

 

Speaker:

to coalesce writes down.

 

Speaker:

So I have singular points in time rather than having every single

 

Speaker:

point in time available to me.

 

Speaker:

Because honestly, if I go back seven days, do I really care about this I/O versus this I/O?

 

Speaker:

Right?

 

Speaker:

Like, how do I even find that point in time, you know?

 

Speaker:

That's the biggest challenge as well.

 

Speaker:

you'll be happy to have anything.

 

Speaker:

So you could start with true CDP.

 

Speaker:

You could, you could always replicate every change.

 

Speaker:

The system holds on to a certain amount of, you know, all of the

 

Speaker:

changes for a certain amount of time.

 

Speaker:

Configurable by the customer.

 

Speaker:

And then it starts coalescing and saying, okay, we're just going to

 

Speaker:

make sure we have all the blocks we need to represent this point in time.

 

Speaker:

And you might go with hourly snapshots after that. I mean,

 

Speaker:

they're not snapshots in terms of what we traditionally think of as

 

Speaker:

They're points in time, yeah.

 

Speaker:

They're points in time.

 

Speaker:

So you have hourly points in time that you can recover.

 

Speaker:

And then maybe you go to daily and even weekly.
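
The coalescing idea can be sketched in Python. This is an illustration under assumptions, not how any particular product does it: the function name `coalesce`, the tuple layout `(timestamp, block_id, data)`, and the one-hour default bucket are all made up for the example. Every write newer than the cutoff is kept; older writes are reduced to the last write per block within each bucket, so only the end-of-bucket state of the volume stays recoverable.

```python
def coalesce(journal, cutoff, bucket_seconds=3600):
    """Reduce a write journal to coarser recovery points.

    journal: list of (timestamp, block_id, data), sorted by timestamp.
    Entries with timestamp >= cutoff are kept unchanged.
    Older entries keep only the last write per block in each time bucket.
    """
    recent = [entry for entry in journal if entry[0] >= cutoff]

    last_in_bucket = {}
    for ts, block, data in journal:
        if ts < cutoff:
            # A later write to the same block in the same bucket replaces
            # the earlier one, which is the space saving.
            last_in_bucket[(ts // bucket_seconds, block)] = (ts, block, data)

    older = sorted(last_in_bucket.values(), key=lambda entry: entry[0])
    return older + recent
```

Calling the function again on the result with a larger `bucket_seconds` (a day, then a week) would give the hourly, then daily, then weekly tiers mentioned above.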

 

Speaker:

And that's where some CDP systems, that's what, that's the way some

 

Speaker:

CDP systems were trying to push out that amount of time that they could

 

Speaker:

essentially replace the backup system.

 

Speaker:

But even then it just doesn't really think the way a regular

 

Speaker:

backup system would, and so it still ends up storing a lot more data.

 

Speaker:

And just being more costly in general.

 

Speaker:

I know, I remember another challenge with CDP systems is with backup.

 

Speaker:

I know we talk a lot about application consistency, right?

 

Speaker:

Making sure I have an application-consistent point in time that Oracle,

 

Speaker:

for instance, can quickly recover and I don't need to worry about media

 

Speaker:

recovery and all the other processes.

 

Speaker:

With CDP systems, a lot of them missed out.

 

Speaker:

Now, they've gotten better, but back then, none of them really supported

 

Speaker:

application integration in a proper way.

 

Speaker:

Yes, some would do VSS integration to allow you to do like poor man's

 

Speaker:

backup, but for the most part, CDP systems operated

 

Speaker:

at an infrastructure level.

 

Speaker:

And so, it didn't have that capability that, honestly, like, as a backup

 

Speaker:

person, you cared about the application more than the storage, right?

 

Speaker:

You needed to make sure I had an application consistent backup that I

 

Speaker:

knew was good that I could recover from.

 

Speaker:

Yeah, exactly.

 

Speaker:

One of the challenges is that you say, well, you give me

 

Speaker:

infinite recovery points, right?

 

Speaker:

I just want one good one, one point that I know is good. And

 

Speaker:

a lot of the CDP products started integrating more with the database.

 

Speaker:

So that while they could still give you the infinite point, they could

 

Speaker:

say, Hey, we also put the database in backup mode at these points in time

 

Speaker:

so that we know that that point in time is one that is truly consistent

 

Speaker:

that you could, uh, recover from.

 

Speaker:

You, you could also use the other points in time, but we're giving you this one

 

Speaker:

that we know for sure that it's good.

 

Speaker:

It's special.

 

Speaker:

so, yeah, it's, it's special, right?

 

Speaker:

It's still not a snapshot, but it's a point in time when we can say

 

Speaker:

that, uh, when we can say that we know we can recover to that point.

 

Speaker:

One other thing about backup, right?

 

Speaker:

I know we always talk about test your backups, test your

 

Speaker:

backups, test your backups.

 

Speaker:

CDP becomes difficult to test in most environments.

 

Speaker:

Unless you have a lot of additional space and storage, because you don't

 

Speaker:

necessarily want to stop the copy being updated on the target site.

 

Speaker:

So now the question becomes, how do I now spin up a separate copy with

 

Speaker:

that particular point in time that I'm interested in so I can test and verify,

 

Speaker:

is my Oracle database backup, right?

 

Speaker:

Is that a good point in time or not?

 

Speaker:

Exactly.

 

Speaker:

Yeah.

 

Speaker:

So it was like, it gave you, it, it gave you almost too much.

 

Speaker:

Right?

 

Speaker:

the thing that it gave you that nothing else could give you except

 

Speaker:

for replication was that RPO of zero.

 

Speaker:

But it did come with other

 

Speaker:

complications that you had to deal with.

 

Speaker:

It's like, I often say in IT, we never fix problems.

 

Speaker:

We just move them.

 

Speaker:

Right.

 

Speaker:

So, so we, we solved one problem.

 

Speaker:

We created, we created some others.

 

Speaker:

So the other thing I want to talk about is

 

Speaker:

how the data was stored on the target end.

 

Speaker:

There are two ways, as I understand it, that data was stored on the other end.

 

Speaker:

There were sort of two ways that the recovery system manifested itself.

 

Speaker:

One was that there was a volume that we were continuously updating so that if you

 

Speaker:

needed to do a recovery, that volume was already replicated to the point in time,

 

Speaker:

the most recent point in time that you wanted to restore to, and then it also

 

Speaker:

had a log and the ability to undo that,

 

Speaker:

Uh, that volume, undo the changes to that volume so that you could take this,

 

Speaker:

this LUN, right, bring it back in time.

 

Speaker:

That was the one thing that was, that was really cool.

 

Speaker:

The, the advantage to that method was that if what you wanted was

 

Speaker:

right now, you had it immediately.

 

Speaker:

If you wanted to go back a little bit earlier and the farther back you wanted

 

Speaker:

to go, the more work had to be done.

 

Speaker:

And so the longer the recovery took, but that, uh, was the

 

Speaker:

primary, I think that was the most common way CDP manifested itself,

 

Speaker:

Yeah.

 

Speaker:

And I think, like you mentioned, that's a great...

 

Speaker:

opportunity because most of the times you're probably recovering to the latest

 

Speaker:

or somewhere near the latest point in time, rather than, hey, I need to go back.

 

Speaker:

Let me restore all my data from three weeks ago and now replay all

 

Speaker:

my backups going forward, which leads to a much longer time to recover.
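
The first method, a continuously updated volume plus an undo log, can be sketched in Python. This is a toy model under assumptions: the class name `ReplicaVolume`, its method names, and the use of a dictionary as the volume are all hypothetical. It shows why "right now" is available immediately, while going further back means undoing more log entries.

```python
class ReplicaVolume:
    """Replica that is always current, with an undo log for rolling back."""

    def __init__(self):
        self.blocks = {}     # block_id -> current data
        self.undo_log = []   # (timestamp, block_id, previous data or None)

    def apply(self, ts, block, data):
        """Apply a replicated write, remembering what it overwrote."""
        self.undo_log.append((ts, block, self.blocks.get(block)))
        self.blocks[block] = data

    def rollback(self, target_ts):
        """Undo every write newer than target_ts; return how many were undone."""
        undone = 0
        while self.undo_log and self.undo_log[-1][0] > target_ts:
            _, block, previous = self.undo_log.pop()
            if previous is None:
                self.blocks.pop(block, None)   # block did not exist before
            else:
                self.blocks[block] = previous
            undone += 1
        return undone
```

The return value of `rollback` is the amount of work done, which grows with the distance between the latest point in time and the requested one.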

 

Speaker:

Exactly.

 

Speaker:

There was this other way where they didn't create the volume; there was no volume

 

Speaker:

that they were continuously updating.

 

Speaker:

They essentially had all of the bits necessary to create

 

Speaker:

the volume at any time.

 

Speaker:

And then, I, I, this feels very NetApp y, and, and, right, although, and by that

 

Speaker:

I don't mean this is the way NetApp did it, it's just, you know, the way with

 

Speaker:

NetApp is, is a given snapshot is really just a bunch of pointers to blocks at

 

Speaker:

a particular point in time, right, and so, when you restore a volume to

 

Speaker:

a particular point in time, all you're doing is moving all the pointers.

 

Speaker:

What they're doing is they have all of the bits and pieces that are necessary to

 

Speaker:

represent the volume at any point in time.

 

Speaker:

And then you,

 

Speaker:

Stitch it

 

Speaker:

you just had to create all the pointers, right?

 

Speaker:

There wasn't a restore so much as there was this,

 

Speaker:

I don't know what to call it.

 

Speaker:

It's unlike anything I've ever seen.

 

Speaker:

I really need a whiteboard, I think, to illustrate this method.

 

Speaker:

The real advantage of this was that

 

Speaker:

the recovery time was always the same regardless of whether or not

 

Speaker:

you wanted to go to the most recent point in time or three weeks ago.

 

Speaker:

Because at that point you're just manipulating pointers and metadata and

 

Speaker:

not actually copying and restoring data.
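
The second method, where no volume is kept up to date and an image is stitched together from pointers, can be sketched in Python as well. This is an illustration under assumptions: the class name `VersionStore`, its methods, and the per-block version lists are hypothetical, and writes are assumed to arrive in timestamp order. Building a view does one lookup per block and copies references rather than data, so the cost does not depend on how far back the requested point in time is.

```python
import bisect


class VersionStore:
    """Keeps every version of every block; views are built from pointers."""

    def __init__(self):
        # block_id -> (list of timestamps, list of data), in write order
        self.versions = {}

    def write(self, ts, block, data):
        stamps, datas = self.versions.setdefault(block, ([], []))
        stamps.append(ts)    # assumes writes arrive in timestamp order
        datas.append(data)

    def view(self, ts):
        """Return the volume image as of ts: block_id -> data reference."""
        image = {}
        for block, (stamps, datas) in self.versions.items():
            i = bisect.bisect_right(stamps, ts)   # versions at or before ts
            if i:
                image[block] = datas[i - 1]
        return image
```

Because each call to `view` returns an independent image, several points in time of the same volume can be held at once without disturbing the store.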

 

Speaker:

And I know that with some of those systems, we started

 

Speaker:

using this term copy data management when we started talking about some of

 

Speaker:

the systems because they could say, hey, here's this, here's this volume from

 

Speaker:

this point in time and from this point in time and from this point in time, and

 

Speaker:

you can have all three of them at the same time, which you could not do before.

 

Speaker:

That was the other feature

 

Speaker:

of the other method.

 

Speaker:

You could not have the same volume at multiple points in time.

 

Speaker:

This method allows you to have as many... no, you think you could?

 

Speaker:

There are ways with newer technologies to get that.

 

Speaker:

One method

 

Speaker:

some vendors used was to update that copy, take a snapshot

 

Speaker:

and present the snapshot out.

 

Speaker:

Right, so that's one method.

 

Speaker:

Now, not always the most optimal, but yeah,

 

Speaker:

right.

 

Speaker:

Yeah.

 

Speaker:

I was just thinking that like a single volume can't be presented at multiple

 

Speaker:

points in time, but you're right.

 

Speaker:

If you do a snapshot, then yes, you could do, you could do exactly that.

 

Speaker:

but it's more management

 

Speaker:

did it like that?

 

Speaker:

I said, I wonder what vendor was really good at doing snapshots.

 

Speaker:

So CDP, Continuous Data Protection, is the system that allows you to have an RPO

 

Speaker:

and an RTO of zero without the risk that you have with replication, where, where

 

Speaker:

if you have something bad happening to your primary data, uh, from a logical

 

Speaker:

basis, you drop a table, you do something stupid, you get a cyber attack.

 

Speaker:

It gives you that power that you had with replication, but it also gives you

 

Speaker:

the power to be able to go back in time.

 

Speaker:

So it gives you basically the best of both worlds.

 

Speaker:

It gives you an infinite number of recovery points, but an

 

Speaker:

infinite turns out might not.

 

Speaker:

I like it.

 

Speaker:

Anybody living space.

 

Speaker:

Yeah, that exactly.

 

Speaker:

See, you know where I was going.

 

Speaker:

yeah, exactly.

 

Speaker:

Uh, but it.

 

Speaker:

It turns out that infinite is, is not as amazing as it seems.

 

Speaker:

Infinite number of recovery points comes with its own challenges, but the biggest

 

Speaker:

challenge I think with CDP is just cost.

 

Speaker:

That very few people were comfortable with the cost of using CDP as their only data

 

Speaker:

protection method for a given set of data.

 

Speaker:

And so they would, what you would most commonly see is we're only going to

 

Speaker:

use it for our most critical apps.

 

Speaker:

Or we're going to use it, but we're also going to use a traditional backup.

 

Speaker:

Because that's what we're, I don't know, I don't know about you, but I,

 

Speaker:

I'm always on the lookout for something that can do, that can give me everything

 

Speaker:

that I want, give me that long term retention to be able to go back when

 

Speaker:

I realized that I did something stupid three months ago and also have an

 

Speaker:

RPO and an RTO of, of close to zero.

 

Speaker:

You want a unicorn.

 

Speaker:

I want a unicorn, but there are, there are ways, and we're going to talk about

 

Speaker:

some of those ways, to give you an RPO and an RTO way better than what we

 

Speaker:

traditionally had without perhaps the cost and the, the downsides and the

 

Speaker:

logistical challenges that CDP offered.

 

Speaker:

I think in the end, there are CDP products, and for certain

 

Speaker:

applications, for certain environments, it's like the way to do it, right?

 

Speaker:

It's just, I think what you're seeing is the complexities of this

 

Speaker:

and the costs associated with this.

 

Speaker:

This is why it's still a niche play.

 

Speaker:

And that's why there's only a handful of these products available out there.

 

Speaker:

What do you

 

Speaker:

I agree.

 

Speaker:

Yes.

 

Speaker:

All right.

 

Speaker:

Well, hopefully, uh, for those of you that have always wondered what CDP is, now

 

Speaker:

you know, and now you know why it didn't solve all problems in data protection.

 

Speaker:

But I would just say, if you want an RPO and an RTO of zero, and you don't want to

 

Speaker:

have the issue with replication, right?

 

Speaker:

Which means, right, we've already talked about the, if you don't want to have

 

Speaker:

the issues that replication causes, then, uh, CDP is really the only game in town.

 

Speaker:

So hopefully this honest assessment of CDP will allow the very small

 

Speaker:

percentage of you that need it to know, "That sounds exactly like what I need."

 

Speaker:

Uh, and with that, that's a wrap.

 

Speaker:

The Backup Wrap-Up is a production of BackupCentral.com, where you'll find my

 

Speaker:

blog and a list of services I can provide.

 

Speaker:

This is an independent podcast.

 

Speaker:

And any opinions that you hear are those of the speaker and not necessarily

 

Speaker:

any companies that they work for.

 

Speaker:

We'll see you next week on the Backup Wrap-Up.