Oct. 23, 2023

Is it a backup or just a copy?

In this episode of the Backup Wrap-Up, host W. Curtis Preston discusses the importance of distinguishing between a copy and a backup to ensure the protection of valuable data. He also explores key backup concepts such as multiplexing, incremental backups, block-level incremental backups, and source-side deduplication. The episode kicks off with a discussion on the recent MGM hack, highlighting the significant impact it had on the hotel chain and the potential for personal information leaks. Tune in to learn how to safeguard your data effectively and become a backup hero.

Articles mentioned in the episode:

https://www.reddit.com/r/vegas/comments/16hxwj0/explain_like_i_am_5_mgm_hacking/

https://www.reversinglabs.com/blog/what-we-know-about-blackcat-and-the-mgm-hack

Speaker: 00:00:00

There are dozens of things that people do to protect their data

Speaker: 00:00:03

from loss, but many of them are worthless when you actually need them.

Speaker: 00:00:07

In this episode, we'll learn what turns a copy into a backup so that

Speaker: 00:00:13

you can make sure that anything you think is a backup actually is one.

Speaker: 00:00:17

We'll also talk about some important backup concepts like

Speaker: 00:00:20

multiplexing, incremental backups, block-level incremental backups

Speaker: 00:00:25

and source-side deduplication.

Speaker: 00:00:27

Hi, I'm W.

Speaker: 00:00:29

Curtis Preston, AKA Mr. Backup.

Speaker: 00:00:31

And I've been specializing in backup and disaster recovery for over 30 years.

Speaker: 00:00:36

My podcast turns unappreciated backup admins into cyber recovery heroes.

Speaker: 00:00:41

This is the Backup Wrap-Up.

Speaker: 00:01:04

Hi, and welcome to the show.

Speaker: 00:01:05

and I have with me a guy who I think is super jelly of the new

Speaker: 00:01:10

toy that I put in yesterday.

Speaker: 00:01:12

Prasanna Malaiyandi.

Speaker: 00:01:14

How's it going, Prasanna?

Speaker: 00:01:15

I'm good Curtis, and yes, I am jealous of your toy, but I will have to start

Speaker: 00:01:21

sending you a bill for consulting fees, when you inevitably, or if

Speaker: 00:01:26

you inevitably run into issues.

Speaker: 00:01:29

Huh?

Speaker: 00:01:30

Yeah, because as you recall, I didn't, I purchased this like without even talking

Speaker: 00:01:35

to you, which is rather atypical of me.

Speaker: 00:01:38

because I, we talk about so much and, and basically what we're talking about

Speaker: 00:01:43

is a Firewalla, which I've had my eye on for a while and then after I realized,

Speaker: 00:01:49

so I've switched internet service providers and now they're telling me

Speaker: 00:01:53

that I'm hitting the bandwidth limit already and which is highly possible

Speaker: 00:01:56

given that I do this. I realized that I had no bandwidth monitoring tools.

Speaker: 00:02:01

I have this really nice, mesh router system.

Speaker: 00:02:04

The Wi-Fi mesh, but that's been put in access

Speaker: 00:02:11

point mode, which then of course offers no bandwidth monitoring.

Speaker: 00:02:15

And then I had this Cox router, which offered me nothing.

Speaker: 00:02:19

And so I replaced the Cox router with the Firewalla Purple SE, and man.

Speaker: 00:02:26

Super simple to put in there.

Speaker: 00:02:28

And now I have these like super, stats.

Speaker: 00:02:31

And I get these, I'm going to, at some point I'm going to have to disable

Speaker: 00:02:35

the notifications because it's like Curtis is playing games on his phone.

Speaker: 00:02:42

Curtis is watching YouTube videos on MacBook Pro A, right?

Speaker: 00:02:49

It's literally, it's like Curtis has downloaded 3.56

Speaker: 00:02:52

gigabytes of video on his, and I'm like, okay, this

Speaker: 00:02:57

is going to get old pretty

Speaker: 00:02:58

I have two questions for you.

Speaker: 00:03:01

Yeah,

Speaker: 00:03:01

The first question is, Did you figure out what was consuming

Speaker: 00:03:05

your data cap or data usage cap?

Speaker: 00:03:08

not yet.

Speaker: 00:03:09

Cause it's only, it hasn't even been 24 hours, but I do have a pretty good guess

Speaker: 00:03:14

and I think it's right in front of me.

Speaker: 00:03:19

we'll see what is weird.

Speaker: 00:03:21

And again, I didn't want to talk about this too much, but what is weird is I

Speaker: 00:03:24

get these, some of the notifications, it's: MacBook Pro uploaded 3.5

Speaker: 00:03:31

megabytes of data to LinkedIn

Speaker: 00:03:34

at 3:45 a.m.

Speaker: 00:03:36

And I'm like, what, like, why is my laptop uploading three and a half

Speaker: 00:03:42

megabytes of anything while it's just sitting here and I'm sleeping somewhere?

Speaker: 00:03:49

that's weird.

Speaker: 00:03:51

weird.

Speaker: 00:03:52

Yeah.

Speaker: 00:03:52

So anyway, so yeah, go ahead.

Speaker: 00:03:54

Yeah.

Speaker: 00:03:56

since you've decided to get rid of the Cox router, can you not just

Speaker: 00:04:00

use your Wi-Fi mesh as a router?

Speaker: 00:04:04

I could have, but then I would have to completely redo my

Speaker: 00:04:07

network architecture, which as you recall, was a really big thing.

Speaker: 00:04:12

That's true.

Speaker: 00:04:13

And I actually really liked the firewall features of this.

Speaker: 00:04:16

That's, that was what would, what really drew me to it.

Speaker: 00:04:18

And I'd been thinking about it and this was that final excuse to get it.

Speaker: 00:04:22

and, I'm really enjoying the security aspects

Speaker: 00:04:25

glad.

Speaker: 00:04:26

So there, for people, if Mr.

Speaker: 00:04:28

Backup can learn networking and firewalls, so can you.

Speaker: 00:04:34

Yeah.

Speaker: 00:04:34

It's certainly not my, my forte.

Speaker: 00:04:37

but I want to talk, it's time for the news of the week.

Speaker: 00:04:42

The big news, I think, of the entire IT world, everybody seems to be

Speaker: 00:04:47

talking about it, is this MGM hack.

Speaker: 00:04:51

I can't, can you imagine?

Speaker: 00:04:53

if you've been living in a hole, They shut down MGM and Caesars and all

Speaker: 00:04:58

of the hotels attached to MGM and Caesars, which is like half the strip.

Speaker: 00:05:03

And they shut down like card keys, slot machines,

Speaker: 00:05:08

ATMs.

Speaker: 00:05:09

everything.

Speaker: 00:05:10

ATMs,

Speaker: 00:05:12

So for, so for our listeners, Who may not know about this,

Speaker: 00:05:15

MGM is a hotel chain, right?

Speaker: 00:05:17

They have a bunch of things as well, right?

Speaker: 00:05:19

Like various hotels like Caesars and MGM, and they are in Las Vegas

Speaker: 00:05:24

and there are casinos, right?

Speaker: 00:05:26

So you can stay there, you can gamble there, right?

Speaker: 00:05:29

They make lots and lots and lots and lots of money.

Speaker: 00:05:32

but not in the last week or so,

Speaker: 00:05:35

And they got hit by a cyber attack, on the week of September

Speaker: 00:05:39

20th or so, I'm guessing.

Speaker: 00:05:41

Yeah, and I think that's the saddest part about this. And by the way, as of

Speaker: 00:05:49

today there's been half a dozen lawsuits attached because there's a threat of a

Speaker: 00:05:57

PII leak, personal information leak.

Speaker: 00:05:59

And so there's been all sorts of worries about that.

Speaker: 00:06:03

So there's, a half a dozen, what do you call it?

Speaker: 00:06:05

Class action or lawsuits that are attempting.

Speaker: 00:06:08

To achieve class action status that have been filed.

Speaker: 00:06:11

I think the saddest part here and the way I like and we'll put the

Speaker: 00:06:16

link to this particular article in the show description, the

Speaker: 00:06:21

heading here targeting layer eight.

Speaker: 00:06:24

I've heard of the seven layer networking model again With my extensive

Speaker: 00:06:28

networking experience, the OSI model.

Speaker: 00:06:31

What is Layer 8,

Speaker: 00:06:34

People?

Speaker: 00:06:35

it's people.

Speaker: 00:06:35

Yes.

Speaker: 00:06:36

So layer 8 is people, which is probably the weakest part

Speaker: 00:06:42

of the entire stack, I'd say.

Speaker: 00:06:45

Yeah.

Speaker: 00:06:45

You are the weakest link!

Speaker: 00:06:50

Yeah, so what, how did they get in?

Speaker: 00:06:51

So they basically targeted an employee, right?

Speaker: 00:06:55

Who had the right level of access and They basically were able to gain access into

Speaker: 00:07:03

their Okta environment as a super admin.

Speaker: 00:07:06

But how did they do that?

Speaker: 00:07:08

That's the

Speaker: 00:07:08

Oh, so how did they do that?

Speaker: 00:07:10

They basically tripped, tricked their IT help desk?

Speaker: 00:07:14

That is so bad, right?

Speaker: 00:07:17

they somehow got...

Speaker: 00:07:19

Access to a privileged account, right?

Speaker: 00:07:21

according to the powers that be that they stole a password or they

Speaker: 00:07:25

hacked Active Directory somehow.

Speaker: 00:07:27

So they were able to attempt to log in, but they were stopped by MFA

Speaker: 00:07:32

which is a good thing, Okta, but then

Speaker: 00:07:35

They were able to convince the help desk that they were the person in

Speaker: 00:07:38

question and get them to reset MFA.

Speaker: 00:07:41

Now here's a question.

Speaker: 00:07:42

Do you think that employee is still there at the company?

Speaker: 00:07:47

And

Speaker: 00:07:47

is one of those

Speaker: 00:07:48

you be blaming the person

Speaker: 00:07:51

so I'm going to fast forward like 30 years.

Speaker: 00:07:54

Okay.

Speaker: 00:07:55

so 20, what would that be?

Speaker: 00:07:58

2053.

Speaker: 00:08:00

There's a guy he's going to be called Mr.

Speaker: 00:08:02

MFA and he's going to have a podcast dedicated to security because like my

Speaker: 00:08:09

career started with a screw up of this.

Speaker: 00:08:12

not quite this magnitude, but my career started with this.

Speaker: 00:08:16

And so I, My personal opinion, I don't know if this person is, has been fired.

Speaker: 00:08:23

I think they should only be fired if they didn't follow the processes that had been,

Speaker: 00:08:27

established and they

Speaker: 00:08:28

set out for them.

Speaker: 00:08:29

Potentially they should be disciplined.

Speaker: 00:08:31

I don't know if firing, if.

Speaker: 00:08:32

If termination is the appropriate, they should be disciplined.

Speaker: 00:08:35

If they followed the procedures that had been laid out for

Speaker: 00:08:38

them, process, people, right?

Speaker: 00:08:41

Then technology.

Speaker: 00:08:43

If they had been, we just had a podcast about that.

Speaker: 00:08:46

If they followed the procedures.

Speaker: 00:08:48

have been given to them.

Speaker: 00:08:49

Then I think some massive leniency, then you update your procedures,

Speaker: 00:08:54

et cetera, et cetera, et cetera.

Speaker: 00:08:55

I think of a massive, outage that was caused at a major, software

Speaker: 00:09:02

vendor that I worked with.

Speaker: 00:09:03

I'm trying to be, I'm trying to be very cagey here, where the backup

Speaker: 00:09:09

operator followed his Procedure that they had two parts of the app

Speaker: 00:09:16

that had to be shut down in order to do a backup because they couldn't

Speaker: 00:09:19

synchronize the two backup systems.

Speaker: 00:09:22

And so every two weeks they would shut down these apps and,

Speaker: 00:09:27

and then do a backup offline.

Speaker: 00:09:30

And this person.

Speaker: 00:09:31

The backup operator just did what they were told to do and shut down these

Speaker: 00:09:36

apps at the most critical time of the year when the apps were needed, right?

Speaker: 00:09:42

The person was just doing their job.

Speaker: 00:09:44

That person should not be fired.

Speaker: 00:09:45

That person should be... you know, you change the procedure.

Speaker: 00:09:49

So I don't know what happened here.

Speaker: 00:09:50

Yeah.

Speaker: 00:09:50

You train.

Speaker: 00:09:51

Yeah.

Speaker: 00:09:52

I hope some leniency was there.

Speaker: 00:09:55

if the person was fired and, I'd love to have them on the podcast.

Speaker: 00:10:01

But anyway, what could we learn from this, from this news here?

Speaker: 00:10:04

Prasanna?

Speaker: 00:10:05

basically that, one is even if you have the greatest technologies in

Speaker: 00:10:11

place and the greatest processes in place, people will always exist

Speaker: 00:10:18

never underestimate the power of people to do dumb things.

Speaker: 00:10:22

I do think that perhaps what's in order here is an update to process.

Speaker: 00:10:27

And the process should be when, cause you have to be able to reset MFA.

Speaker: 00:10:32

When resetting MFA, it should require many more, bells and whistles

Speaker: 00:10:36

and levels of authentication.

Speaker: 00:10:38

And we need to identify, we need to identify that this person who

Speaker: 00:10:42

calls in that says that they're Steve, we need a way to identify

Speaker: 00:10:45

that Steve is actually Steve.

Speaker: 00:10:48

so you create a process around that, that really verifies that someone is who they say they are,

Speaker: 00:10:52

and especially,

Speaker: 00:10:53

you reset MFA.

Speaker: 00:10:54

and especially when it's someone of, with that level of privilege.

Speaker: 00:10:59

Especially, super especially, that's not a word, but yeah,

Speaker: 00:11:02

that, oh, I feel for these guys.

Speaker: 00:11:06

keep, abreast of, this story because it is going to get worse before it gets better.

Speaker: 00:11:13

And that's the news for this week.

Speaker: 00:11:18

So what I thought we would talk about this week and the backup to basic series is,

Speaker: 00:11:23

I've got it defined as, backup methods that support a traditional restore.

Speaker: 00:11:29

So basically the backup methods that I grew up with that are still,

Speaker: 00:11:33

Relevant.

Speaker: 00:11:34

Yeah.

Speaker: 00:11:34

in, yeah, right?

Speaker: 00:11:37

we like to live in a world where everybody's using the

Speaker: 00:11:39

latest and greatest, right?

Speaker: 00:11:42

And nobody's doing this old, full and incremental backups and stuff.

Speaker: 00:11:46

Nobody's doing that.

Speaker: 00:11:48

And that's just not right.

Speaker: 00:11:50

So we need to talk about these, these methods and see what we can get out there.

Speaker: 00:11:55

the first thing, I just have to, again, I'm, I'm, we're doing this based

Speaker: 00:12:00

on, my book, Modern Data Protection, There's a cover for those of you

Speaker: 00:12:04

watching via video, all, all three listeners that are watching via video.

Speaker: 00:12:10

I think it's 10.

Speaker: 00:12:12

There's maybe 10.

Speaker: 00:12:13

the number's actually gone up since we've been putting them on YouTube.

Speaker: 00:12:15

Oh, there you go.

Speaker: 00:12:18

the, I've got this thing in here.

Speaker: 00:12:21

so this is from chapter nine and talking about backup and

Speaker: 00:12:24

recovery software methods.

Speaker: 00:12:25

And the first thing I had in there was, is everything backup?

Speaker: 00:12:28

So there was a time when backup was well defined.

Speaker: 00:12:31

Backup was copy something to tape and then put that tape in a box,

Speaker: 00:12:35

right?

Speaker: 00:12:35

It was so simple back then.

Speaker: 00:12:37

Yeah, it was so simple back then.

Speaker: 00:12:39

Yes.

Speaker: 00:12:40

so I, as quote, Mr.

Speaker: 00:12:44

Backup, I see backup a lot broader than I think a lot of people do.

Speaker: 00:12:49

A lot of people, when they say backup, they go, Oh, this isn't backup.

Speaker: 00:12:51

This is, to me, backup is anything really that protects the data, the

Speaker: 00:12:56

way backup protects data, right?

Speaker: 00:12:57

And so I'm defining backup rather broadly as anything that is a copy of data

Speaker: 00:13:02

stored separately from the original.

Speaker: 00:13:04

that can be used to restore the original if it is damaged.

Speaker: 00:13:08

There's a lot of things that qualify for backup as backup under that

Speaker: 00:13:12

so let me just give you some examples and see if you think they qualify.

Speaker: 00:13:17

Okay.

Speaker: 00:13:17

So take a copy on tape,

Speaker: 00:13:20

Yes.

Speaker: 00:13:21

a copy in AWS S3.

Speaker: 00:13:23

A copy of the data that's in S3, which is separate from

Speaker: 00:13:28

The

Speaker: 00:13:28

your not.

Speaker: 00:13:30

Yeah.

Speaker: 00:13:30

yes.

Speaker: 00:13:31

okay?

Speaker: 00:13:31

a copy replicated from one storage system to another storage

Speaker: 00:13:36

system from the same vendor.

Speaker: 00:13:37

as long as...

Speaker: 00:13:39

there's a caveat here, because you used the word replication.

Speaker: 00:13:44

I need the ability, is it replicated in such a way that if

Speaker: 00:13:49

I damage production, so

Speaker: 00:13:51

that doesn't qualify as being stored separately.

Speaker: 00:13:54

Replicated with separate retention of the copies on the destination.

Speaker: 00:13:59

Okay.

Speaker: 00:14:00

Yes, I would call that a

Speaker: 00:14:01

Okay, snapshots on a production system, on a production storage

Speaker: 00:14:05

array that does not include AWS S3,

Speaker: 00:14:10

thank you for, yeah, so snapshots on the same array.

Speaker: 00:14:15

no, End of story, not a backup until it's copied somewhere

Speaker: 00:14:20

Okay, and then doing what you were recently doing when

Speaker: 00:14:23

editing the podcast, right?

Speaker: 00:14:25

Downloading a copy from the cloud onto your local system, copying it

Speaker: 00:14:29

to a different directory, and then copying it to yet a third directory.

Speaker: 00:14:34

On your local system.

Speaker: 00:14:35

Is that local system considered backups, each of those copies?

Speaker: 00:14:39

again, we're storing the data in a separate place that

Speaker: 00:14:42

has a separate risk profile.

Speaker: 00:14:45

Etc.

Speaker: 00:14:46

yes,

Speaker: 00:14:46

As long as the copy, the original

Speaker: 00:14:48

copy was in the, cloud.

Speaker: 00:14:49

it's also about, the purpose of why I'm doing it, right?

Speaker: 00:14:52

If the purpose of downloading that is to serve as possibly a backup, right?

Speaker: 00:14:59

Because there's a lot of times that we download data That

Speaker: 00:15:02

is not for backup purposes.

Speaker: 00:15:04

Now, it could accidentally become a backup if it's the only

Speaker: 00:15:07

copy that you have available.

Speaker: 00:15:08

But, just because I copy doesn't necessarily make it a backup.

Speaker: 00:15:12

It might be an archive.

Speaker: 00:15:13

And then the last example.

Speaker: 00:15:14

taking pictures on your iPhone and using iCloud to sync

Speaker: 00:15:19

your copies to iCloud photos.

Speaker: 00:15:23

Not a backup.

Speaker: 00:15:26

Because,

Speaker: 00:15:26

is that?

Speaker: 00:15:27

for two reasons.

Speaker: 00:15:29

One, which is really the primary.

Speaker: 00:15:30

And that is specifically in terms of Apple iCloud.

Speaker: 00:15:36

But the biggest thing is that it's synchronized.

Speaker: 00:15:39

that's the key.

Speaker: 00:15:40

That's, you're, you asked earlier, you delete a picture in your phone or some

Speaker: 00:15:46

app, delete, some like ransomware deletes a bunch of pictures in your phone.

Speaker: 00:15:50

It synchronizes that deletion up in the cloud and they go bye-bye, right?

Speaker: 00:15:54

It is a synchronized copy, not a backup.

Speaker: 00:15:57

it is stored separately, but if you delete it here and it gets deleted

Speaker: 00:16:02

there, that's not a backup, right?
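
The point about synchronization can be sketched in a few lines. This is a toy model only: plain dictionaries stand in for the phone, the synced cloud copy, and a backup store, and the function names `sync` and `back_up` are hypothetical, not any vendor's API.

```python
# Toy model: dicts stand in for a phone, a synced cloud copy, and a backup store.
# All names are hypothetical; this is not how iCloud or any product is implemented.

def sync(source: dict, mirror: dict) -> None:
    """Make the mirror match the source exactly, including deletions."""
    mirror.clear()
    mirror.update(source)

def back_up(source: dict, history: list) -> None:
    """Append a point-in-time copy; earlier copies are retained."""
    history.append(dict(source))

phone = {"beach.jpg": b"...", "dog.jpg": b"..."}
cloud_mirror: dict = {}
backup_history: list = []

sync(phone, cloud_mirror)
back_up(phone, backup_history)

# Ransomware (or a slip of the thumb) deletes a photo on the phone.
del phone["dog.jpg"]
sync(phone, cloud_mirror)        # the deletion is faithfully replicated
back_up(phone, backup_history)   # a new version is added; the old one stays

print("dog.jpg" in cloud_mirror)        # False
print("dog.jpg" in backup_history[0])   # True
```

The synced copy is stored separately, yet it cannot restore the photo; only the copy with its own retention can.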

Speaker: 00:16:04

Just like we were talking, before.

Speaker: 00:16:06

And that's one really important reason, possibly the most important reason.

Speaker: 00:16:12

But the other is that there's a feature in iPhone that...

Speaker: 00:16:17

It says we can store low res copies on the phone and the high res copies in

Speaker: 00:16:21

the cloud, which means that not only is it a synchronized copy, the only true

Speaker: 00:16:26

copy of your photo is in the cloud.

Speaker: 00:16:28

It's only one copy, which means you need to be backing up iCloud.

Speaker: 00:16:32

and by extension also, Google photos if you're an Android

Speaker: 00:16:36

person, so yeah, not a backup,

Speaker: 00:16:38

okay, no,

Speaker: 00:16:39

which we had a whole podcast episode about that.

Speaker: 00:16:41

How to properly back up your iCloud account.

Speaker: 00:16:44

yeah,

Speaker: 00:16:45

were good examples.

Speaker: 00:16:46

I think those are a lot of things, like you said, right?

Speaker: 00:16:48

It's not always easy to say, is it a backup or not?

Speaker: 00:16:51

Unless you dive into the next level of questions and ask, okay, is it really a

Speaker: 00:16:55

yeah,

Speaker: 00:16:56

Does it meet these

Speaker: 00:16:57

I think.

Speaker: 00:16:57

or not?

Speaker: 00:16:58

I think you did a good job of, the different categories, like that

Speaker: 00:17:02

thing of, if it's fully synchronized, whether synchronous or asynchronous,

Speaker: 00:17:06

if it's fully synchronized and if I delete the production and

Speaker: 00:17:09

it deletes the data, the copy,

Speaker: 00:17:12

that, That's not a backup.

Speaker: 00:17:13

right?

Speaker: 00:17:14

unless that copy has the ability to undo that.

Speaker: 00:17:18

If it does, then, I would change my answer, right?

Speaker: 00:17:20

And so like a NetApp synchronized filer, I would consider that

Speaker: 00:17:25

other copy, that would be backup,

Speaker: 00:17:28

other things that are not a backup, one that you didn't mention would be,

Speaker: 00:17:34

the recycle bin in your Microsoft 365.

Speaker: 00:17:37

That is not a backup, right?

Speaker: 00:17:39

It's not stored separately.

Speaker: 00:17:40

it's just, records in a database that have been flagged as deleted.

Speaker: 00:17:44

They haven't gone anywhere.

Speaker: 00:17:46

They're sitting right next to the production data.

Speaker: 00:17:48

So yeah,

Speaker: 00:17:49

Okay.

Speaker: 00:17:50

And then the other one is,

Speaker: 00:17:55

So in your opinion, does backup require you to always be able to go

Speaker: 00:18:03

back to a point in time that could plausibly have existed in the system?

Speaker: 00:18:12

And the reason I'm asking this is if I look at, I know email archiving comes

Speaker: 00:18:16

up a lot and sometimes people are like, oh, that's the same as backup.

Speaker: 00:18:20

But with email archive, you're just getting all the data that's there,

Speaker: 00:18:22

whether or not your mailbox actually looked like that, your inbox looked

Speaker: 00:18:26

like that or not at any point in time.

Speaker: 00:18:29

Yeah.

Speaker: 00:18:30

So backup.

Speaker: 00:18:34

requires restore, right?

Speaker: 00:18:37

For it to be a backup, you need to be able to restore it to the way it

Speaker: 00:18:41

looked at some point in time, right?

Speaker: 00:18:45

yeah, that's a really good question, Prasanna.

Speaker: 00:18:48

it's one thing to say a file, but, if you cannot, if you cannot bring

Speaker: 00:18:54

the thing that's been damaged back to its You know, back to before it

Speaker: 00:19:01

was damaged and that it comes back to the same way as it was before it was

Speaker: 00:19:06

damaged, then you don't have a backup.

Speaker: 00:19:09

You have a copy of the data, right?

Speaker: 00:19:12

And an email archive is a perfect example of that.

Speaker: 00:19:15

You have a copy of the data, but it was stored for a different purpose.

Speaker: 00:19:18

It was stored for archive, which means it wasn't designed to be put back into the,

Speaker: 00:19:25

the state it was in,

Speaker: 00:19:27

yeah, the state that it was in,

Speaker: 00:19:28

right?

Speaker: 00:19:28

so you might be able to restore all the email, but you won't be able to

Speaker: 00:19:31

restore folders and things like that.

Speaker: 00:19:33

A good backup should bring the thing back to the way it was before it was damaged,

Speaker: 00:19:38

however it let's go back to a time when tape drive started getting, so here,

Speaker: 00:19:43

we're going to talk about a feature that is now for many people, passe, right?

Speaker: 00:19:49

it's not really necessary because they no longer use tape as their primary target

Speaker: 00:19:54

or their initial target of backups.

Speaker: 00:19:57

and that is this concept of multiplexing.

Speaker: 00:19:59

And it goes back to, there was a time when we

Speaker: 00:20:03

Way back in the days.

Speaker: 00:20:05

right back in the day.

Speaker: 00:20:07

So multiplexing, do you want to define multiplexing or explain it?

Speaker: 00:20:11

Yeah, multiplexing.

Speaker: 00:20:12

Yeah, I, let me attempt to, I know I wasn't aware of this before we started

Speaker: 00:20:16

doing the podcast and you explained everything about tape and I know we've

Speaker: 00:20:20

had a bunch of folks, tape experts on the podcast as well, but multiplexing is...

Speaker: 00:20:26

to solve an issue where tape requires you to write at a certain speed.

Speaker: 00:20:34

If you don't, it's bad.

Speaker: 00:20:36

And tapes got faster and faster, but the problem was pumping data into the tape

Speaker: 00:20:39

device itself wasn't going as quickly as the tape speeds were increasing.

Speaker: 00:20:45

And so in order to solve that, what they decided to do was say, okay, Let's have

Speaker: 00:20:50

multiple clients feed data into the tape device at the same time, and we will

Speaker: 00:20:55

multiplex or basically write all those streams into the tape drive at the same

Speaker: 00:20:58

time, keeping the tape device happy.

Speaker: 00:21:01

While still being able to do all the backups.

Speaker: 00:21:04

Yeah, another word for it would be interleaving.

Speaker: 00:21:06

You did great.

Speaker: 00:21:07

basically putting all, chopping them up into pieces and then

Speaker: 00:21:10

putting together into one, turning a bunch of streams into one stream.
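
As a rough illustration of interleaving, here is a minimal sketch that assumes each client's backup is already chopped into blocks and takes one block from each client in turn. The client names and the `multiplex` function are made up for the example; real backup products use their own on-tape formats.

```python
from itertools import zip_longest

def multiplex(streams: dict[str, list[bytes]]) -> list[tuple[str, bytes]]:
    """Interleave blocks from several client streams into one tape stream.

    Each block is tagged with its client name so it can be found on restore.
    """
    tape = []
    names = list(streams)
    # Take one block from every client per pass; shorter streams simply run out.
    for row in zip_longest(*streams.values()):
        for name, block in zip(names, row):
            if block is not None:
                tape.append((name, block))
    return tape

# Four hypothetical clients, matching a multiplexing setting of four.
clients = {
    "web01": [b"w0", b"w1", b"w2"],
    "db01": [b"d0", b"d1"],
    "mail01": [b"m0", b"m1", b"m2"],
    "file01": [b"f0"],
}
tape = multiplex(clients)
print([name for name, _ in tape])
```

The tape drive sees one continuous stream, which is what keeps it writing at speed even though no single client can feed it fast enough.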

Speaker: 00:21:14

And when we first started, we used multiplexing settings of four

Speaker: 00:21:20

Which means four different

Speaker: 00:21:21

turn and.

Speaker: 00:21:22

Yeah, four different clients being combined into a stream to

Speaker: 00:21:25

make a tape drive happy, but tape drives got faster and faster.

Speaker: 00:21:29

The clients didn't get faster.

Speaker: 00:21:31

And so by the time I left, by the time I used my last tape drive in

Speaker: 00:21:36

production, we were up to 36, right?

Speaker: 00:21:39

We were up to 36 streams together to, to make an individual tape drive happy.

Speaker: 00:21:45

And the reason,

Speaker: 00:21:47

I was gonna ask why.

Speaker: 00:21:48

Yeah.

Speaker: 00:21:49

Why were clients not fast enough

Speaker: 00:21:51

yeah.

Speaker: 00:21:52

So the reason that this was bad is that, what, why is the only reason we back up,

Speaker: 00:22:00

to restore

Speaker: 00:22:01

right?

Speaker: 00:22:02

So when you

Speaker: 00:22:02

go to

Speaker: 00:22:03

do a restore,

Speaker: 00:22:04

Yeah.

Speaker: 00:22:05

yeah.

Speaker: 00:22:05

When you go to do a restore, you have to read all 36 streams

Speaker: 00:22:10

and throw 35 of them away.

Speaker: 00:22:13

So your tape drive, the speed of your restore is going to be 1/35th.

Speaker: 00:22:20

Of what it could potentially be if it hadn't been multiplexed,

Speaker: 00:22:25

But if you're never doing restore tests, it doesn't really matter.

Speaker: 00:22:27

Until you actually need to restore the data.
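
The restore penalty can be sketched with the same kind of toy tape. This assumes 36 clients with equally sized backups interleaved round-robin, which is an idealization; the `restore` function and client names are hypothetical. Under that assumption the useful fraction works out to 1 in 36, in the same ballpark as the figure quoted above.

```python
def restore(tape, wanted):
    """Read the whole multiplexed tape, keeping only one client's blocks."""
    kept, blocks_read = [], 0
    for client, block in tape:
        blocks_read += 1          # the drive has to pass over every block
        if client == wanted:
            kept.append(block)
    return kept, blocks_read

# Hypothetical tape: 36 clients, 100 blocks each, interleaved round-robin.
n_clients, blocks_each = 36, 100
tape = [(f"client{c:02d}", f"c{c}-b{b}".encode())
        for b in range(blocks_each) for c in range(n_clients)]

data, blocks_read = restore(tape, "client07")
print(len(data), blocks_read, len(data) / blocks_read)
```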

Speaker: 00:22:31

Yeah, if you're... you're killing me, you're killing me. Yeah, so it was one

Speaker: 00:22:36

of these things where it was a cut your nose off to spite your face, right?

Speaker: 00:22:44

So we felt that it was... but it was a necessary evil.

Speaker: 00:22:50

We, you could only restore if you've got backups done and we could only get

Speaker: 00:22:54

backups done reliably if we were using multiplexing, but we knew that it was

Speaker: 00:22:59

creating this problem and ultimately this was the undoing of tape from

Speaker: 00:23:03

a backup and recovery perspective.

Speaker: 00:23:05

We switched to destaging and

Speaker: 00:23:08

these other things to undo this necessary evil.

Speaker: 00:23:11

But, it, it was a mess.

Speaker: 00:23:13

But that's what multiplexing is.

Speaker: 00:23:14

So if you've heard about multiplexing, you don't need to do multiplexing

Speaker: 00:23:18

if you're backing up to disk.

Speaker: 00:23:19

Because disk can write at whatever speed you tell it to write at.

Speaker: 00:23:23

And it can write a bunch of things at the same time.

Speaker: 00:23:26

And you can give it 36 streams and it can write them all at the same time in

Speaker: 00:23:30

separate places of the disk in such a way that when you go to do a restore,

Speaker: 00:23:33

you don't, you're not, you don't have to read all of them to read one of them.

Speaker: 00:23:39

What?

Speaker: 00:23:41

That was my yes.

Speaker: 00:23:44

disk is fast enough, but

Speaker: 00:23:48

Yeah.

Speaker: 00:23:49

Well, it's not,

Speaker: 00:23:50

a disk

Speaker: 00:23:50

drive has a certain

Speaker: 00:23:51

number of IOPS it could handle.

Speaker: 00:23:53

And therefore, as long as your system is big enough.

Speaker: 00:23:57

To handle all of them in parallel.

Speaker: 00:23:58

yes.

Speaker: 00:23:59

they're, disk drives are not, Unlimited bandwidth, unlimited IO,

Speaker: 00:24:04

et cetera, et cetera, et cetera.

Speaker: 00:24:05

Yes.

Speaker: 00:24:06

but the point of the way that it lays the data, you don't have to lay the,

Speaker: 00:24:10

you can lay the data however you want and then read it however you want.

Speaker: 00:24:13

there are, again, there are limits to everything depending on how

Speaker: 00:24:18

much you fragment the data and all that kind of stuff, right?

Speaker: 00:24:20

But it's still way better than tape from that perspective.

Speaker: 00:24:24

All right, next one's a whole lot easier.

Speaker: 00:24:27

What comes next?

Speaker: 00:24:28

What's the first type of, what's

Speaker: 00:24:29

let you tackle

Speaker: 00:24:30

what, no, I'll let you tackle this, Curtis.

Speaker: 00:24:32

So what's the, what is it?

Speaker: 00:24:34

The first type of backup that everyone should cut their teeth on.

Speaker: 00:24:40

what a full backup?

Speaker: 00:24:41

Is that

Speaker: 00:24:42

Yeah.

Speaker: 00:24:43

what you're saying?

Speaker: 00:24:44

Yeah.

Speaker: 00:24:45

so basically we're just going to talk about this concept of

Speaker: 00:24:47

full and incremental backups.

Speaker: 00:24:49

And probably everybody knows this, but this is a backup to basic series.

Speaker: 00:24:55

So a full backup backs up everything, an incremental backup

Speaker: 00:24:59

backs up things that have changed.

Speaker: 00:25:02

And the, there are different types of incremental backups, right?

Speaker: 00:25:07

And different people have different names for these different types, right?

Speaker: 00:25:13

terms you've probably heard, incremental, differential, cumulative incremental.

Speaker: 00:25:18

For a lot of people, cumulative incremental and

Speaker: 00:25:21

differential are the same thing.

Speaker: 00:25:23

For people that got stuck in Windows land, not necessarily. So what's the

Speaker: 00:25:29

difference between an incremental and these other two things?

Speaker: 00:25:32

A cumulative incremental.

Speaker: 00:25:34

So an incremental is basically, Typically, Sunday you do a full backup, right?

Speaker: 00:25:41

Monday you need to do another backup.

Speaker: 00:25:43

Now, you don't want to do necessarily the entire full backup again,

Speaker: 00:25:47

because maybe that's too much data, you don't have enough time, etc.

Speaker: 00:25:50

So you'll do an incremental, which is basically whatever has

Speaker: 00:25:54

changed since the last full.

Speaker: 00:25:56

So since Sunday.

Speaker: 00:25:57

Sorry, since the last time you did a backup, I should say.

Speaker: 00:25:59

exactly, whatever's changed since the last time you did a

Speaker: 00:26:02

Yeah, so in that case, it was Sunday, so then Monday you get the incrementals,

Speaker: 00:26:06

now Tuesday you're going to do backup, and so you do another incremental, which

Speaker: 00:26:09

is whatever has changed since Monday,

Speaker: 00:26:12

Exactly,

Speaker: 00:26:13

and we just keep doing that, right?

Speaker: 00:26:15

Yeah, and

Speaker: 00:26:17

then if it's, yeah, if it's Sunday, right?

Speaker: 00:26:20

And now it's Saturday, how many tapes do I need to do a restore?

Speaker: 00:26:25

do you need...

Speaker: 00:26:26

The previous Sunday, plus the Monday, plus the Tuesday, plus

Speaker: 00:26:29

the Wednesday, Thursday, Friday.

Speaker: 00:26:32

You basically need to replay

Speaker: 00:26:33

by the way, by the way, I really, I really channeled the old Curtis there.

Speaker: 00:26:37

I did it without even meaning to, I said tapes, right?

Speaker: 00:26:40

Cause

Speaker: 00:26:40

that was the problem back then.

Speaker: 00:26:42

We literally had to grab for seven tapes, right?

Speaker: 00:26:46

Nowadays, we don't have to grab for seven tapes, but,

Speaker: 00:26:48

but you still have to do all those restores though, right?

Speaker: 00:26:50

So even in the case of, if a file existed Sunday, and then was deleted Monday,

Speaker: 00:26:56

and then came back on Tuesday, you would still end up having to do all of those

Speaker: 00:27:01

data, like basically you're replaying like a log, all the data that would

Speaker: 00:27:05

have existed on each of those days.

Speaker: 00:27:08

right.

Speaker: 00:27:08

The real problem is a file that was changed every single day.

Speaker: 00:27:13

You would actually restore that file seven times.

Speaker: 00:27:16

It's a lot of wasted effort.

Speaker: 00:27:17

That's just the idea of an incremental, or regular incremental.

Speaker: 00:27:21

Then we have a differential or a cumulative incremental.

Speaker: 00:27:25

And the difference between that is that it's going to, it's going to do

Speaker: 00:27:27

the thing that you said earlier, which is it's going to back up everything

Speaker: 00:27:30

that's changed since the full.

Speaker: 00:27:32

And so what some people do is that they've stopped, they stopped doing

Speaker: 00:27:36

incrementals and they switched to differentials or cumulative incrementals

Speaker: 00:27:41

every day, and that way at the end of the week, I would need at most two tapes.
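
The tape counts in this exchange can be checked with a small sketch. It assumes a Sunday full followed by one backup per day, and the function name `restore_chain` and the labels "full", "incr", and "diff" are made up for illustration; they are not taken from any backup product.

```python
from typing import List, Tuple

# Each backup is (day, kind). kind is one of:
#   "full" - everything
#   "incr" - whatever changed since the last backup of any kind
#   "diff" - whatever changed since the last full (cumulative incremental)
Backup = Tuple[str, str]


def restore_chain(backups: List[Backup]) -> List[str]:
    """Return the days whose backups must be read, oldest first, to restore
    to the point in time of the newest backup in the list."""
    needed: List[str] = []
    skip_to_full = False
    # Walk backwards from the newest backup until we reach a full.
    for day, kind in reversed(backups):
        if kind == "full":
            needed.append(day)
            break
        if skip_to_full:
            # A differential already covers this backup's changes.
            continue
        needed.append(day)
        if kind == "diff":
            skip_to_full = True
    return list(reversed(needed))


days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
incr_week = [("Sun", "full")] + [(d, "incr") for d in days[1:]]
diff_week = [("Sun", "full")] + [(d, "diff") for d in days[1:]]

print(restore_chain(incr_week))  # every day of the week
print(restore_chain(diff_week))  # only the full and the newest differential
```

With daily incrementals the chain is the full plus every incremental after it; with daily differentials it is only the full plus the latest differential, which is where the "at most two tapes" figure comes from.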

Speaker: 00:27:46

Right. Now, this whole thing has pretty much gone away in the world of

Speaker: 00:27:53

disk based backups, right?

Speaker: 00:27:55

Because the whole reason that we did backups this way, is

Speaker: 00:27:59

that, first off, let me back up.

Speaker: 00:28:01

We used to do weekly fulls followed by daily incrementals.

Speaker: 00:28:04

Then we switched for, because when we went to automated tape libraries,

Speaker: 00:28:10

the whole process of managing the different tapes wasn't as big

Speaker: 00:28:14

of a deal.

Speaker: 00:28:15

So we went to monthly fulls followed by daily incrementals or maybe

Speaker: 00:28:18

a weekly cumulative, right?

Speaker: 00:28:21

So you'd still need a maximum of seven tapes to do a restore. But when

Speaker: 00:28:25

we switched to disk, this whole thing just became kind of silly and moot and

Speaker: 00:28:30

whatever, and you could back up however you wanted to back up. And dedupe,

Speaker: 00:28:34

which we're going to talk about in a minute, dedupe really changed the game.

Speaker: 00:28:39

And, because it didn't matter whether you backed up full or incremental or whatever,

Speaker: 00:28:43

you still stored the same amount of data.

Speaker: 00:28:46

before we jump though, one thing that I think people might also hear in addition

Speaker: 00:28:53

to fulls, incrementals, differentials, and cumulative incrementals is also levels.

Speaker: 00:29:00

So maybe you could talk about levels.

Speaker: 00:29:01

I know sometimes it's specific to like Oracle.

Speaker: 00:29:04

And some databases, but maybe it might

Speaker: 00:29:06

no, that's a good point.

Speaker: 00:29:06

Yeah, thanks.

Speaker: 00:29:08

so the concept of a backup level, literally, this goes

Speaker: 00:29:13

back to the days of dump, right?

Speaker: 00:29:16

which was the command to backup Unix file systems.

Speaker: 00:29:20

A level zero was a full. And if you wanted to do what we call

Speaker: 00:29:26

incremental backups, you would do a zero

Speaker: 00:29:29

followed by a one, followed by a two, followed by a three, followed by a four.

Speaker: 00:29:34

And, it got interesting because if you then lowered the number.

Speaker: 00:29:40

It would behave like a,

Speaker: 00:29:41

cumulative incremental, right?

Speaker: 00:29:43

so like you could do a zero and then you do a one.

Speaker: 00:29:49

If you then did another one, if you kept doing ones, you would get a differential.

Speaker: 00:29:53

You would get a cumulative incremental every day.

Speaker: 00:29:56

If you did a 0, a 1, and then a 2, and then a 1 again, it's just, it

Speaker: 00:30:02

basically, it always pointed back to the number that was the most recent

Speaker: 00:30:08

number that was lower than itself, and so it got complicated, and so

Speaker: 00:30:12

there were actually some people that

Speaker: 00:30:14

Is it they prefer

Speaker: 00:30:15

called Towers of

Speaker: 00:30:16

Hanoi, Yeah, which is based on the game, and I've got it in the book,

Speaker: 00:30:22

the Towers of Hanoi progressive thing, but I can't, it's like 0, 3, 2, 4, so

Speaker: 00:30:30

basically every backup, without doing cumulative incrementals, every backup,

Speaker: 00:30:34

every file that was changed would end up being on two tapes, which was just

Speaker: 00:30:39

an interesting way to, To minimize tape, again, this is all because we're doing

Speaker: 00:30:43

tapes, but nobody has tapes anymore.

Speaker: 00:30:44

So nobody cares.

Speaker: 00:30:45

But that's what levels were.

Speaker: 00:30:46

It was all the way up to nine.

Speaker: 00:30:48

and they still have this concept in, in things like Oracle Backup.
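
The level rule described above (each backup points back to the most recent backup with a lower level) can be written down in a few lines. This is a minimal sketch of that rule only, not of the dump command itself; the function name `reference_backup` is hypothetical.

```python
from typing import List, Optional


def reference_backup(levels: List[int]) -> List[Optional[int]]:
    """For each backup in the sequence, return the index of the earlier
    backup it is taken relative to: the most recent one with a strictly
    lower level. A backup with no lower-level predecessor (for example a
    level 0) has no reference, so its entry is None."""
    refs: List[Optional[int]] = []
    for i, level in enumerate(levels):
        ref = None
        # Search backwards for the newest backup with a lower level.
        for j in range(i - 1, -1, -1):
            if levels[j] < level:
                ref = j
                break
        refs.append(ref)
    return refs


print(reference_backup([0, 1, 2, 3, 4]))  # each one points at the previous backup
print(reference_backup([0, 1, 1, 1]))     # every level 1 points back at the full
print(reference_backup([0, 1, 2, 1]))     # the second level 1 skips back to the full
```

Rising levels behave like plain incrementals, repeating the same level behaves like a cumulative incremental, and dropping back to a lower level skips over the higher-level backups in between.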

Speaker: 00:30:53

So the next thing to talk about is this concept called file

Speaker: 00:30:56

level incremental forever.

Speaker: 00:30:58

And the company that really put this out there was IBM with their product TSM.

Speaker: 00:31:06

And back in the day,

Speaker: 00:31:07

has been renamed,

Speaker: 00:31:08

idea is you

Speaker: 00:31:08

do one full, what's that?

Speaker: 00:31:10

Hasn't it been renamed?

Speaker: 00:31:12

It has, but I'm just saying they came out with it when they

Speaker: 00:31:15

came out, it was called TSM.

Speaker: 00:31:17

It's now like IBM spectrum protect, but, the idea was you do one full and then

Speaker: 00:31:23

everything is an incremental forever.

Speaker: 00:31:25

we never again do a full and this really saved a lot of

Speaker: 00:31:30

bandwidth and saved a lot of tape.

Speaker: 00:31:32

It came with a mess and that was over time and again, tape over time,

Speaker: 00:31:41

you could end up needing hundreds of tapes to restore a single file system.

Speaker: 00:31:48

you would need just one file from this tape and one file from that tape.

Speaker: 00:31:51

And since the hardest part of a tape is like, it was like

Speaker: 00:31:54

two and a half minutes just to get a tape in and, get it loaded and seek

Speaker: 00:31:58

to the average point in a tape.

Speaker: 00:32:01

So I was not a fan of doing backups this way when we were talking about tape.

Speaker: 00:32:09

Was there a reason?

Speaker: 00:32:11

what was the use case at the time for that?

Speaker: 00:32:13

it was about saving tape, saving

Speaker: 00:32:15

storage.

Speaker: 00:32:16

It was about saving bandwidth.

Speaker: 00:32:17

the idea, there's nothing wrong with the idea of incremental forever.

Speaker: 00:32:20

It's just that their implementation.

Speaker: 00:32:23

Back in the day when it was all tape, even when they had disk staging.

Speaker: 00:32:27

So they would stage to disk.

Speaker: 00:32:29

So they wouldn't multiplex, by the way, they wouldn't multiplex.

Speaker: 00:32:31

They would stage to disk and then they would do the backups to tape.

Speaker: 00:32:36

And this only applied to file system backups.

Speaker: 00:32:39

It didn't apply to database backups.

Speaker: 00:32:41

And, but literally you would need hundreds and hundreds of tapes

Speaker: 00:32:46

to restore a single file system.

Speaker: 00:32:48

And it just, I was never a fan of doing backups that way.

Speaker: 00:32:52

As long as we were backing up to tape. And they had collocation

Speaker: 00:32:57

and this thing called reclamation, because when you're doing

Speaker: 00:33:01

backups that way, you end up with a lot of tapes that have files on them that have

Speaker: 00:33:07

expired that are no longer needed, but you have other files on there that are needed.

Speaker: 00:33:12

And so you'd have to copy forward.

Speaker: 00:33:15

Yeah.

Speaker: 00:33:15

so that you could reclaim that whole tape and then reuse it.

Speaker: 00:33:18

And

Speaker: 00:33:19

That sounds like a

Speaker: 00:33:20

management nightmare.

Speaker: 00:33:21

An interesting engineering problem, but...

Speaker: 00:33:24

yeah, I was never a fan of doing backups that way.

Speaker: 00:33:28

and I'm even less of a fan now that we don't have to worry about tape.

Speaker: 00:33:33

Now we can just do incremental forever and just do it without all that

Speaker: 00:33:35

collocation and reclamation stuff.

Speaker: 00:33:37

Cause on disk, to reclaim, you just delete a file, right?

Speaker: 00:33:41

On tape, you delete a file in the middle of a tape.

Speaker: 00:33:43

You have to reclaim the tape.

Speaker: 00:33:45

so that's file level incremental forever.
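
The copy-forward idea behind reclamation can be sketched as follows. This is an illustration under simple assumptions (every file version is the same size, and a tape is reclaimed when half or less of it is still live); the function name `reclaim`, the threshold, and the "RECLAIM-1" label are all invented here and do not describe how any specific product implements it.

```python
from typing import Dict, List, Set, Tuple


def reclaim(
    tapes: Dict[str, List[str]],
    expired: Set[str],
    threshold: float = 0.5,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Copy still-needed files off mostly-expired tapes so those tapes can
    be reused. Returns (tapes after reclamation, labels of freed tapes)."""
    kept: Dict[str, List[str]] = {}
    copied_forward: List[str] = []
    freed: List[str] = []
    for label, files in tapes.items():
        live = [f for f in files if f not in expired]
        if files and len(live) / len(files) <= threshold:
            # Mostly expired: move the live files elsewhere, free the tape.
            copied_forward.extend(live)
            freed.append(label)
        else:
            # Expired files still occupy space here; tape cannot delete
            # a file from the middle of the media.
            kept[label] = files
    if copied_forward:
        kept["RECLAIM-1"] = copied_forward
    return kept, freed


tapes = {"T1": ["a1", "b1", "c1", "d1"], "T2": ["a2", "b2"]}
after, freed = reclaim(tapes, expired={"a1", "b1", "c1"})
print(after)
print(freed)
```

On disk the equivalent operation is just deleting the expired files, which is why the extra bookkeeping disappears once tape is out of the picture.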

Speaker: 00:33:49

And then, with the advent of backing up to disk, Which finally

Speaker: 00:33:54

happened, I don't know, 20 years ago.

Speaker: 00:33:58

It's so funny.

Speaker: 00:33:59

We, we say the advent of something that happened 20 years ago.

Speaker: 00:34:02

When we finally started doing it, and once everybody finally went to, and by

Speaker: 00:34:07

the way, everybody still is not backing up to disk, there's still

Speaker: 00:34:10

a small contingent of people that back up to tape, so those people will really

Speaker: 00:34:13

enjoy the first half of this episode.

Speaker: 00:34:16

Now we have this concept of block level incremental forever.

Speaker: 00:34:19

Would you like to explain that?

Speaker: 00:34:21

Yeah, with block level incremental, I guess where I think of block level

Speaker: 00:34:31

incremental, I know there's various places you can think about it, is

Speaker: 00:34:33

when it applies to virtual machines and other sort of larger objects.

Speaker: 00:34:39

where it doesn't make sense to back up an entire VM, doing full or incremental

Speaker: 00:34:45

backups the way you would have done file level backups, right?

Speaker: 00:34:50

Why would I

Speaker: 00:34:50

Now, what, why would that be?

Speaker: 00:34:52

because I have a file which represents a disk, the entire file

Speaker: 00:34:57

doesn't change every time, right?

Speaker: 00:34:59

Parts of the file

Speaker: 00:35:01

it's, so we're talking about a VMDK file or VDK, for, for,

Speaker: 00:35:06

Hyper V, VDDK, that can't say VDDK.

Speaker: 00:35:10

I think it's VDDK.

Speaker: 00:35:11

Yeah,

Speaker: 00:35:12

I think you're right.

Speaker: 00:35:13

so you're saying if anything changes on there,

Speaker: 00:35:16

you're backing up the entire whole

Speaker: 00:35:17

do an incremental, exactly.

Speaker: 00:35:19

You're going

Speaker: 00:35:19

it's the entire file changed, right?

Speaker: 00:35:21

So you're backing up the entire thing, but that doesn't make sense when

Speaker: 00:35:23

you have files which are say 10, 50, 100, 200 gigabytes and you're backing

Speaker: 00:35:29

that up every single time. And so with block level incrementals, what they

Speaker: 00:35:34

basically have done is say, okay, what blocks have changed in this VMDK?

Speaker: 00:35:41

Let me just back those up, right?

Speaker: 00:35:43

Oracle also for databases, they do something similar, right?

Speaker: 00:35:47

Where it's, hey, let me only back up the blocks within an Oracle data

Speaker: 00:35:52

file that have changed rather than backing up the entire Oracle database.

Speaker: 00:35:57

And how does the backup product know which blocks have changed?

Speaker: 00:36:01

Usually you have to rely on that vendor to tell you.

Speaker: 00:36:05

So in the case of Oracle, right?

Speaker: 00:36:08

You're usually integrating with Oracle RMAN via SBT or some other

Speaker: 00:36:12

mechanism where Oracle knows, okay, I keep track of the database blocks.

Speaker: 00:36:16

I know which ones are new.

Speaker: 00:36:18

Here is a list of blocks that you need to care about.

Speaker: 00:36:20

Same thing with VMware, when you have their, what is their SDK called?

Speaker: 00:36:27

VADP.

Speaker: 00:36:28

Yeah.

Speaker: 00:36:29

they've changed

Speaker: 00:36:30

the name.

Speaker: 00:36:31

Yeah.

Speaker: 00:36:31

They've changed the name, but basically they're, they have an API to talk to,

Speaker: 00:36:36

and they maintain a bitmap, right?

Speaker: 00:36:39

And then they just give you, here's a map of the bits that you need to go get.

Speaker: 00:36:44

These are the bits that have changed.

Speaker: 00:36:46

They maintain that.

Speaker: 00:36:47

And then the, there's an API for asking for those blocks,

Speaker: 00:36:52

now this is great for disk based systems because if you think about these are

Speaker: 00:36:56

all random spots in a file and so you can dump it out. Now, it's up to you to figure

Speaker: 00:37:02

out like how you want to do this and I know we'll talk a little bit later

Speaker: 00:37:06

about deduplicated storage, but in the case of Oracle, typically you would just

Speaker: 00:37:10

dump it out as incremental blocks, and just dump it into a file, and now you

Speaker: 00:37:14

have all those blocks captured together.

Speaker: 00:37:16

In the case of VMware, they started doing that.

Speaker: 00:37:19

A lot of backup vendors would just dump it out as raw blocks, which makes sense.

Speaker: 00:37:25

but then, there are other optimizations you can do to do smarter things with

Speaker: 00:37:30

it, because with incremental block based backups, you still have to

Speaker: 00:37:34

restore from multiple files in order to stitch together the final actual image.

Speaker: 00:37:41

Yeah.

Speaker: 00:37:41

And you still have that problem.

Speaker: 00:37:44

That we talked about earlier where you may restore an individual block multiple

Speaker: 00:37:48

times if it changes multiple times, right?
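
A toy version of block level incrementals shows both the efficiency and the restore behavior just described. It assumes a fixed-size disk image and a tiny block size, and it computes the changed-block bitmap by comparing two images; in a real system the hypervisor or database maintains that bitmap and hands it to the backup product. All names here are hypothetical.

```python
from typing import Dict, List

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


def split_blocks(image: bytes) -> List[bytes]:
    """Cut an image into fixed-size blocks."""
    return [image[i:i + BLOCK_SIZE] for i in range(0, len(image), BLOCK_SIZE)]


def changed_bitmap(old: bytes, new: bytes) -> List[bool]:
    """Stand-in for platform change tracking: True where a block differs.
    Assumes both images are the same size."""
    return [a != b for a, b in zip(split_blocks(old), split_blocks(new))]


def incremental(new: bytes, bitmap: List[bool]) -> Dict[int, bytes]:
    """Back up only the blocks the bitmap marks as changed."""
    blocks = split_blocks(new)
    return {i: blocks[i] for i, changed in enumerate(bitmap) if changed}


def restore(full: bytes, incrementals: List[Dict[int, bytes]]) -> bytes:
    """Stitch the image together: start from the full, then replay each
    incremental oldest first. A block that changed on several days is
    written several times."""
    blocks = split_blocks(full)
    for inc in incrementals:
        for index, data in inc.items():
            blocks[index] = data
    return b"".join(blocks)


sunday = b"AAAABBBBCCCC"
monday = b"AAAAXXXXCCCC"
tuesday = b"AAAAYYYYCCCZ"

inc_mon = incremental(monday, changed_bitmap(sunday, monday))
inc_tue = incremental(tuesday, changed_bitmap(monday, tuesday))
print(inc_mon, inc_tue)
print(restore(sunday, [inc_mon, inc_tue]) == tuesday)
```

Block 1 is captured on both Monday and Tuesday, so the restore writes it twice, which is the same replay cost the file level incremental discussion raised earlier.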

Speaker: 00:37:51

the advantage is it's incredibly efficient.

Speaker: 00:37:55

And the, like when we talk about backing up VMs, I agree with you.

Speaker: 00:38:00

That's where this really shines.

Speaker: 00:38:02

Because back in the day, if we backed up VMs, And we just pretended they were,

Speaker: 00:38:08

physical machines and we were running full and incremental backups on them.

Speaker: 00:38:12

We were beating the crap out of these VMs.

Speaker: 00:38:13

So this is much more IO friendly, to the VMs, right?

Speaker: 00:38:19

So it's much friendlier on the VMs.

Speaker: 00:38:21

That's why we want to talk to the VMware API and get just

Speaker: 00:38:25

the blocks that have changed.

Speaker: 00:38:27

And it doesn't really come with any major downside compared to

Speaker: 00:38:32

the alternative, because we're storing the data on disk.

Speaker: 00:38:35

Can I

Speaker: 00:38:36

ask one

Speaker: 00:38:37

yeah,

Speaker: 00:38:37

sure.

Speaker: 00:38:38

So we've talked about using block level incrementals for VMware, for databases.

Speaker: 00:38:45

Is there a reason it hasn't really caught on for files?

Speaker: 00:38:50

Because if I take a file and kind of split it up into blocks, right?

Speaker: 00:38:56

Could I get the same benefit?

Speaker: 00:38:58

Or is there a reason that it makes a lot more sense for

Speaker: 00:39:00

like VMs or virtual machines?

Speaker: 00:39:04

the benefit will be relative to the size of the file, right?

Speaker: 00:39:08

The bigger the file, the bigger the benefit that you're going to get.

Speaker: 00:39:12

And I would say that the reason it hasn't caught on is because of the next

Speaker: 00:39:17

thing we're going to discuss, right?

Speaker: 00:39:19

That solved that problem.

Speaker: 00:39:21

But yeah, I think about files like PST files or maybe a big Access

Speaker: 00:39:27

database or backing up like MySQL.

Speaker: 00:39:29

That's not a file.

Speaker: 00:39:30

I mean, it is a file, but it's, it's actually a database.

Speaker: 00:39:32

Right.

Speaker: 00:39:33

I'd say the reason they didn't put a lot of effort is deduplication, which,

Speaker: 00:39:40

why don't we just talk about that now?

Speaker: 00:39:42

I know we've covered dedupe, just really quickly for those that don't

Speaker: 00:39:45

understand what dedupe is, the idea is that we're going to identify duplicate

Speaker: 00:39:50

segments of the data, and duplicate means that we've seen this data before.

Speaker: 00:39:58

we've done a full backup or we've done an incremental backup and we've

Speaker: 00:40:02

seen this part of the data before.

Speaker: 00:40:05

And for it to be truly considered dedupe, you've gotta look at,

Speaker: 00:40:09

it's gotta be subfile, right?

Speaker: 00:40:10

It's gotta be part of, like we were talking about, the VMDK

Speaker: 00:40:14

or the VDDK or a PST file.

Speaker: 00:40:17

We've gotta be looking inside the file, slicing that up into chunks,

Speaker: 00:40:21

and then deciding this chunk.

Speaker: 00:40:22

We've seen it before, this chunk, we have not. And so there are two

Speaker: 00:40:26

different places that dedupe happens.

Speaker: 00:40:28

One is at the target, which is, like a box, like a Data Domain

Speaker: 00:40:33

or a Quantum box or ExaGrid.

Speaker: 00:40:35

these boxes are target dedupe.

Speaker: 00:40:39

And then there's this thing called source dedupe, which

Speaker: 00:40:42

really took off from a company that was called Avamar.

Speaker: 00:40:46

That company got sold to EMC, which I know you spent a little

Speaker: 00:40:49

time with, back in the day.

Speaker: 00:40:51

And the previous employer of both of us did source side deduplication.

Speaker: 00:40:56

Yeah, so target side is great because you could take it and

Speaker: 00:41:01

plug it in place anywhere, right?

Speaker: 00:41:03

Because as long as it supports whatever the protocol your client is using, right?

Speaker: 00:41:09

You could just ingest the data and you get all the benefits of deduplication.

Speaker: 00:41:12

So Data Domain was

Speaker: 00:41:15

very popular initially for virtual tape libraries, right?

Speaker: 00:41:19

So you had tapes, right?

Speaker: 00:41:21

People are constantly doing fulls and incremental backups.

Speaker: 00:41:23

That's perfect to deduplicate.

Speaker: 00:41:25

you plug in a Data Domain, it emulates the tape interface.

Speaker: 00:41:29

And now you just, your clients still continue writing to there and then all

Speaker: 00:41:32

your data gets deduplicated, right?

Speaker: 00:41:34

And so it doesn't matter if it's NFS or if it's SMB or if it's tape, right?

Speaker: 00:41:39

It just works.

Speaker: 00:41:41

yeah, it's like that Firewalla box that I

Speaker: 00:41:43

bought, right?

Speaker: 00:41:44

It just, it just, it goes in and then it just works, right?

Speaker: 00:41:47

You didn't have to change anything.

Speaker: 00:41:49

With source dedupe, the idea is that, there's three parts

Speaker: 00:41:52

of the deduplication process.

Speaker: 00:41:54

There's the slicing and dicing, right?

Speaker: 00:41:57

There's the creation of a hash.

Speaker: 00:41:58

You run the chunk of data through some sort of cryptographic algorithm, like SHA,

Speaker: 00:42:04

something, and then that gives you a value

Speaker: 00:42:08

and then that value, you have to look up that value in some

Speaker: 00:42:11

sort of hash table, right?

Speaker: 00:42:13

with target deduplication, all three of those actions happen on the

Speaker: 00:42:17

target, which is why it works so well.

Speaker: 00:42:19

You just send the backups the way you're used to sending them,

Speaker: 00:42:21

and then it does the magic.

Speaker: 00:42:23

It slices and dices, it hashes, and it does the lookup, and it figures out which

Speaker: 00:42:26

chunks of data are new based on that hash.
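
The three steps named here (slice, hash, look up) fit in a short sketch. It assumes fixed-size chunks and SHA-256 for simplicity; real products often use variable-size chunking and their own hash choices. The class name `DedupeStore` and its methods are hypothetical.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 8  # fixed-size chunks keep the example simple


class DedupeStore:
    """Toy target-side dedupe: all three steps happen inside the store."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}  # hash table: digest -> chunk

    def ingest(self, data: bytes) -> List[str]:
        """Slice, hash, and look up each chunk, storing only unseen ones.
        Returns the recipe (ordered digests) needed to rebuild the data."""
        recipe: List[str] = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]                  # 1. slice
            digest = hashlib.sha256(chunk).hexdigest()      # 2. hash
            if digest not in self.chunks:                   # 3. look up
                self.chunks[digest] = chunk
            recipe.append(digest)
        return recipe

    def rebuild(self, recipe: List[str]) -> bytes:
        """Reassemble a backup from its recipe."""
        return b"".join(self.chunks[d] for d in recipe)


store = DedupeStore()
first = store.ingest(b"aaaaaaaabbbbbbbbcccccccc")
second = store.ingest(b"aaaaaaaabbbbbbbbdddddddd")
print(len(store.chunks))  # unique chunks stored across both backups
print(store.rebuild(second))
```

The client sends both backups in full; the store keeps only the chunks it has not seen before, so the second backup adds a single new chunk.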

Speaker: 00:42:30

Source side, the first two happen on the source, right?

Speaker: 00:42:34

We slice up the data before we back it up.

Speaker: 00:42:36

We slice up the data. We create a hash of the data, and then we ask

Speaker: 00:42:41

some magic person in the cloud, has this hash been seen before?

Speaker: 00:42:46

And the decision is made on the other end.

Speaker: 00:42:50

Yes, we've seen this, or we haven't seen this, and then we

Speaker: 00:42:53

send or don't send the data.
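
The source-side exchange can be sketched as a client that slices and hashes locally, asks the other end which hashes are new, and then ships only those chunks. The `BackupService` class stands in for the remote side and is purely hypothetical, as are the method names; fixed-size chunks and SHA-256 are assumptions made to keep the example short.

```python
import hashlib
from typing import Dict, List, Set

CHUNK_SIZE = 8  # fixed-size chunks keep the example simple


class BackupService:
    """Hypothetical remote side: owns the hash table and the stored chunks."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def unseen(self, digests: List[str]) -> Set[str]:
        """Answer the question 'has this hash been seen before?'"""
        return {d for d in digests if d not in self.chunks}

    def upload(self, digest: str, chunk: bytes) -> None:
        self.chunks[digest] = chunk


def backup(data: bytes, service: BackupService) -> int:
    """Client side: slice and hash locally, then send only new chunks.
    Returns the number of data bytes actually sent."""
    local: Dict[str, bytes] = {}
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        local[hashlib.sha256(chunk).hexdigest()] = chunk
    sent = 0
    # Only the hashes cross the wire for chunks the service already has.
    for digest in service.unseen(list(local)):
        service.upload(digest, local[digest])
        sent += len(local[digest])
    return sent


service = BackupService()
print(backup(b"aaaaaaaabbbbbbbbcccccccc", service))  # first backup sends everything
print(backup(b"aaaaaaaabbbbbbbbdddddddd", service))  # later backup sends only the new chunk
```

The saving is in what leaves the client: after the first backup, unchanged chunks cost only a hash lookup rather than a transfer.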

Speaker: 00:42:58

To me, source dedupe is much more efficient than target dedupe.

Speaker: 00:43:03

The difficulty is that it is a much, it's a little bit of a baby and

Speaker: 00:43:08

the bathwater situation, right?

Speaker: 00:43:10

Because in order to get it,

Speaker: 00:43:11

you've got to do a forklift upgrade.

Speaker: 00:43:13

You've got to stop using, let's say, again, this is things have changed, but

Speaker: 00:43:19

back in the day, you had to stop using NetBackup and start using Avamar, right?

Speaker: 00:43:24

Stop using NetWorker or TSM and switch to Druva, right?

Speaker: 00:43:28

You had to change your backup product to get this done.

Speaker: 00:43:32

Things change a little bit over time, right?

Speaker: 00:43:34

A lot of these products now support source dedupe.

Speaker: 00:43:38

But that was the main downside or still is the main downside.

Speaker: 00:43:41

If you want source dedupe, you've got to change your backup product,

Speaker: 00:43:45

uh, or you've got to change how you use your backup product,

Speaker: 00:43:49

assuming it starts supporting

Speaker: 00:43:50

yeah, and I would say at this point, probably a good chunk of products

Speaker: 00:43:55

either have their own source side deduplication mechanism or they

Speaker: 00:44:00

work with deduplicated targets which allow for source side deduplication.

Speaker: 00:44:04

for instance, integrating with ExaGrid or Data Domain from like TSM, Veeam,

Speaker: 00:44:11

Exactly.

Speaker: 00:44:12

Yeah, there are some that criticize it saying that, the slicing and

Speaker: 00:44:18

dicing and the creation of the hash puts a load on the client.

Speaker: 00:44:21

I have always argued that if done properly, that load created by the

Speaker: 00:44:26

slicing and dicing and hashing is offset by the significant reduction

Speaker: 00:44:31

of the load of transporting or not transporting 99 percent of the data,

Speaker: 00:44:36

right?

Speaker: 00:44:37

Yeah.

Speaker: 00:44:38

Other critiques of it have been that the restore speed wasn't great because of

Speaker: 00:44:43

how the data was stored on the other end.

Speaker: 00:44:45

And I would argue that's an implementation problem.

Speaker: 00:44:48

it's not a problem with the concept.

Speaker: 00:44:49

It's a problem with the implementation of the

Speaker: 00:44:51

And then the other thing to also mention about source side deduplication is

Speaker: 00:44:55

typically these are also using proprietary protocols, so you don't end up with a

Speaker: 00:44:58

lot of security issues you have around, say, having a target dedupe appliance

Speaker: 00:45:02

with NFS or SMB open to the world.

Speaker: 00:45:07

Yep.

Speaker: 00:45:07

Yep.

Speaker: 00:45:07

Agreed.

Speaker: 00:45:08

Yes.

Speaker: 00:45:08

Agreed that there is a security advantage to having the data sliced

Speaker: 00:45:13

and diced way before and then encrypted before you send it to the other

Speaker: 00:45:17

system instead of doing it over an unsecured protocol like NFS or SMB.

Speaker: 00:45:21

Exactly.

Speaker: 00:45:22

All right.

Speaker: 00:45:23

this episode, I think, got a little longer than we had intended for it to

Speaker: 00:45:26

get, but we covered a lot.

Speaker: 00:45:29

We covered a lot in this episode.

Speaker: 00:45:31

so basically, we learned about

Speaker: 00:45:33

what is and is not a backup.

Speaker: 00:45:35

We learned about, multiplexing, full and incremental backups,

Speaker: 00:45:38

file level incremental backups, and source side deduplication.

Speaker: 00:45:43

Uh, it's a big episode.

Speaker: 00:45:45

what do you think?

Speaker: 00:45:46

Yeah, no, that covers a lot of what everyone talks about whenever

Speaker: 00:45:51

they refer to backup and restore. You gotta know these backup

Speaker: 00:45:54

technologies in order to be able to restore and protect your company.

Speaker: 00:45:57

These are things that you need to know.

Speaker: 00:46:00

All right.

Speaker: 00:46:00

And with that, I once again want to thank our listeners.

Speaker: 00:46:05

you are why we do this. And Prasanna,

Speaker: 00:46:07

once again,

Speaker: 00:46:08

grateful for your insights and questions as well.

Speaker: 00:46:11

Thank you, sir.

Speaker: 00:46:12

Thank you, sir.

Speaker: 00:46:14

Keeping me honest.

Speaker: 00:46:15

And, remember this show, the backup wrap up is an independent podcast and

Speaker: 00:46:20

the opinions that you hear are ours.

Speaker: 00:46:22

Not anyone else's, and also this is a production of BackupCentral.com

Speaker: 00:46:26

and, uh, produced and edited by yours truly.

Speaker: 00:46:30

And I just want to say, that's a wrap.
