March 20, 2023

What is deduplication and how does it work? (Backup to Basics series)

In our latest episode of the Backup to Basics series, we talk about what I think is the most important invention in my career: deduplication. Without dedupe, much of what we do in backup and recovery, and disaster recovery, would simply not be possible. Without dedupe there really is no disk backup market; there is no cloud backup market. I'd be out of a job! What is dedupe, anyway, and how does it work? What are the different kinds of dedupe and does that matter? You should learn a lot about this important topic.

Transcript
Speaker:

On this episode of Restore it All, we talk about what I think is the biggest advancement in backup and recovery technology during my career. And that's deduplication.

Speaker:

I hope you enjoy the episode.

W. Curtis Preston:

Hi, and welcome to Backup Central's Restore it All podcast.

W. Curtis Preston:

I'm your host, W. Curtis Preston, aka Mr. Backup.

W. Curtis Preston:

And I have with me my network rearchitect, er, engineer.

Prasanna Malaiyandi:

Hey, Curtis, whatever I could do to keep you safe, you know?

W. Curtis Preston:

You know what's really funny is, I consider myself a pretty tech-savvy guy, and when we were talking today about, you know, how I've replaced a bunch of gear and I'm swapping out some stuff and moving some cables around, you were yelling at me.

W. Curtis Preston:

You were like, you can't do that.

W. Curtis Preston:

You can't put the switch on the thing.

W. Curtis Preston:

And I was like, yeah, I can, like, what are you talking about?

W. Curtis Preston:

And it, and it took me like a couple of seconds and I was like, oh, wait.

W. Curtis Preston:

You're right.

W. Curtis Preston:

I can't, that's not, I can't do that.

W. Curtis Preston:

I can't put the switch, I can't put the router that's gonna be my firewall on the same switch as my home LAN.

Prasanna Malaiyandi:

Yeah,

W. Curtis Preston:

I dunno what I was thinking.

W. Curtis Preston:

Yeah.

Prasanna Malaiyandi:

Just another topic that I know just a little bit about.

W. Curtis Preston:

I feel a little ashamed about that.

W. Curtis Preston:

But I'm glad I talked to you about my, you know, as, as is

W. Curtis Preston:

the case with many subjects.

W. Curtis Preston:

I'm glad I talked to you about, you know, what I'm up to.

W. Curtis Preston:

Um,

Prasanna Malaiyandi:

Glad I could help.

W. Curtis Preston:

I have successfully purchased and configured, for the video watchers,

W. Curtis Preston:

Let's see if it makes it into the camera before the cable runs.

W. Curtis Preston:

There it is, the ASUS AX6600, which is a mesh router.

W. Curtis Preston:

And I gotta say it's much better than what I had before,

W. Curtis Preston:

and it's able, I've got two.

W. Curtis Preston:

It's supposed to provide 5,500 square feet, but of course that's, that doesn't

W. Curtis Preston:

include drywall and two by fours, right?

Prasanna Malaiyandi:

it's crazy how much signal degrades going through drywall.

Prasanna Malaiyandi:

And the other thing people don't realize is five gigahertz,

Prasanna Malaiyandi:

like degrades like no tomorrow

W. Curtis Preston:

Right.

W. Curtis Preston:

Remind me, remind me why five gigahertz is better again.

Prasanna Malaiyandi:

It's faster because it can handle more bandwidth, and also

Prasanna Malaiyandi:

the channel is wider, so you can have more things talking at the same time.

Prasanna Malaiyandi:

It's just as your frequency goes up, the distance goes

Prasanna Malaiyandi:

down for the same power levels,

W. Curtis Preston:

So is this like DC versus AC?

Prasanna Malaiyandi:

Not quite DC versus AC.

Prasanna Malaiyandi:

It's more about, you need to pump as many things as possible into it, because with high frequency, right, it's more per cycle, right, than 2.4, which is less airtime, if you will.

W. Curtis Preston:

Right.

Prasanna Malaiyandi:

And so every sort of peak, you can send

Prasanna Malaiyandi:

more out with the five gigahertz because you're doing it more often.

W. Curtis Preston:

right.

Prasanna Malaiyandi:

And so it works a lot better.

Prasanna Malaiyandi:

It's just the distance isn't as great.

Prasanna Malaiyandi:

Now, I will tell people, so this is one of my, I'm gonna

Prasanna Malaiyandi:

get up on my soapbox now, right?

Prasanna Malaiyandi:

One of my rare soapbox events and tell people, a lot of times people

Prasanna Malaiyandi:

think they need more wifi access points in their house to get coverage.

Prasanna Malaiyandi:

And to those people, I will say, plan out your network carefully.

Prasanna Malaiyandi:

Put your devices where they matter.

Prasanna Malaiyandi:

And also don't put too many devices and don't crank up the power all the way

Prasanna Malaiyandi:

to high, because I know Curtis, you and I were talking about this when you're

Prasanna Malaiyandi:

looking at mesh, and it was like, imagine that your router can overpower your

Prasanna Malaiyandi:

phone, your laptop, your iPad, so it's screaming at the top of its lungs and your

Prasanna Malaiyandi:

phone can barely even scream back at it.

Prasanna Malaiyandi:

And so that's actually worse for your network and for airtime than

Prasanna Malaiyandi:

actually sort of balancing out power.

W. Curtis Preston:

I just don't know if, like, the stuff you're talking about, like, is that configuration option even on consumer-class routers?

Prasanna Malaiyandi:

you'll have sort of the low, medium, high power

Prasanna Malaiyandi:

levels, uh, but it takes time to fine-tune and tweak these, right?

Prasanna Malaiyandi:

You have to walk around with a wifi analyzer on your phone, right?

Prasanna Malaiyandi:

So Apple with their, uh, iPhones, right?

Prasanna Malaiyandi:

They ship, what is it?

Prasanna Malaiyandi:

AirPort Utility, which has a wifi scan option, which will show you all the wifi networks and sort of the signal

Prasanna Malaiyandi:

strength, and you basically have to walk around your house with that and

Prasanna Malaiyandi:

be like, okay, where is it strong?

Prasanna Malaiyandi:

Where is it weak?

Prasanna Malaiyandi:

Right, to figure out the placement.

Prasanna Malaiyandi:

That's the ideal way, because what you want is you want coverage in the right

Prasanna Malaiyandi:

places, because what you see is in a lot of high density housing areas, or

Prasanna Malaiyandi:

even homes next to each other is most people end up with crummy wifi because

Prasanna Malaiyandi:

their power is turned up so high, it bleeds into everyone else's area

Prasanna Malaiyandi:

such that everyone has a crappy time.

Prasanna Malaiyandi:

because then you get interference and then everyone sort of slows down and then it

W. Curtis Preston:

Right.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

I got a lot of wifi.

W. Curtis Preston:

I got a lot of networks.

W. Curtis Preston:

Um, you know, um, yeah,

Prasanna Malaiyandi:

And for, and for the last bit, last bit of my

Prasanna Malaiyandi:

soapbox is please, please, please do not use 40 megahertz channel widths

Prasanna Malaiyandi:

on your 2.4 gigahertz channels.

Prasanna Malaiyandi:

You do not need to use 40 megahertz and ruin everyone else's connectivity.

Prasanna Malaiyandi:

Please only use 20 megahertz channel widths for 2.4 gigahertz.
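To put rough numbers on the 40 MHz point, here is a simplified sketch. It assumes US channels 1 through 11, channel n centered at 2407 + 5n MHz, and usable spectrum from 2401 to 2473 MHz (the outer edges of channels 1 and 11 at a 22 MHz width). It ignores spectral masks and regulatory detail, so treat it as an illustration rather than a planning tool.

```python
# Simplified model of the US 2.4 GHz band (channels 1-11); an illustration,
# not an RF planning tool. Assumption: usable spectrum is 2401-2473 MHz.
BAND_LOW_MHZ = 2401
BAND_HIGH_MHZ = 2473
CHANNELS = range(1, 12)


def center_mhz(channel: int) -> int:
    """Center frequency of 2.4 GHz channel n (2412 MHz for channel 1)."""
    return 2407 + 5 * channel


def non_overlapping(width_mhz: int) -> list[int]:
    """Greedily pick channels whose width_mhz-wide blocks fit inside the band
    without overlapping a previously picked block."""
    picked: list[int] = []
    next_free = BAND_LOW_MHZ  # lowest frequency not already occupied
    for ch in CHANNELS:
        low = center_mhz(ch) - width_mhz / 2
        high = center_mhz(ch) + width_mhz / 2
        if low >= next_free and high <= BAND_HIGH_MHZ:
            picked.append(ch)
            next_free = high
    return picked


for width in (22, 40):
    print(f"{width} MHz wide: channels {non_overlapping(width)}")
```

With 22 MHz-wide channels the greedy pick lands on the familiar 1/6/11 plan; with 40 MHz only one block fits (centered on channel 3, i.e. channels 1 and 5 bonded), so every neighbor using 40 MHz ends up overlapping someone.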

W. Curtis Preston:

Uh, I'll see what I can do but I, but I have this

W. Curtis Preston:

new, you know, and again, I am not a wireless, I feel like a wireless noob,

W. Curtis Preston:

but I have this new fancy right where it automatically selects the right.

W. Curtis Preston:

Um, that's

W. Curtis Preston:

pretty cool.

Prasanna Malaiyandi:

Access point to go to.

Prasanna Malaiyandi:

Yeah.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Well, not just that, but also

W. Curtis Preston:

2.4 versus five.

W. Curtis Preston:

Yeah.

Prasanna Malaiyandi:

So actually all of this is part of the wifi standard,

Prasanna Malaiyandi:

so figuring out which access point, that's part of the 802.11r standard.

Prasanna Malaiyandi:

And I think that the band steering is also part of the standard as well.

W. Curtis Preston:

Yeah.

Prasanna Malaiyandi:

a lot of folks are implementing now.

Prasanna Malaiyandi:

Some devices don't do well with band steering.

Prasanna Malaiyandi:

It basically looks at sort of the difference between the five gigahertz

Prasanna Malaiyandi:

and the 2.4 gigahertz and says, okay, which one should I pick?

Prasanna Malaiyandi:

And most devices, if it's seven decibels difference or more, then it'll pick,

Prasanna Malaiyandi:

uh, the higher the faster speed.

Prasanna Malaiyandi:

And so that's kind of how it tricks your devices into picking the right band.

W. Curtis Preston:

Interesting.

W. Curtis Preston:

Yeah, it's kind of cool.

W. Curtis Preston:

Um, all I know is that I finally have a mesh that covers the two.

W. Curtis Preston:

Cuz my problem is that I have things in the garage, things embedded inside

W. Curtis Preston:

walls in the garage that need wifi, not just, not just inside walls.

W. Curtis Preston:

I have a device that's inside a wall, inside an electrical

W. Curtis Preston:

cabinet, inside a wall.

W. Curtis Preston:

Right?

W. Curtis Preston:

I have a Sense, uh, device, and that's deep inside my

W. Curtis Preston:

electrical, my circuit breaker box.

W. Curtis Preston:

Um, and this reached it.

W. Curtis Preston:

No problem.

W. Curtis Preston:

It didn't, it it had, it had like two bars.

W. Curtis Preston:

Right.

W. Curtis Preston:

So clearly, and, and the thing is, it's only, it's like 20 feet from.

Prasanna Malaiyandi:

yep.

W. Curtis Preston:

Right.

W. Curtis Preston:

But it's, you know, a couple of drywall walls and some

W. Curtis Preston:

two by fours and some metal.

W. Curtis Preston:

Uh, but it worked.

W. Curtis Preston:

That's the important part is that it worked.

W. Curtis Preston:

Um, yeah, so I think I might be in, I think I might

W. Curtis Preston:

be in wifi heaven for a while.

W. Curtis Preston:

Um, and you too can be there for the low, low price of $350. That's

W. Curtis Preston:

a two, that's a two node system.

W. Curtis Preston:

Um, and it's supposed, like, yeah, but I'm pretty happy.

W. Curtis Preston:

But, uh, that's not what we're talking about today.

Prasanna Malaiyandi:

really.

Prasanna Malaiyandi:

We can talk about wifi all day if you want.

W. Curtis Preston:

yeah.

W. Curtis Preston:

Well, you could talk about wifi all day.

W. Curtis Preston:

I feel really stupid when you're talking about wifi, because I'm

W. Curtis Preston:

like, this is not my bailiwick.

W. Curtis Preston:

That's a cool word, by the way, bailiwick.

W. Curtis Preston:

So I thought we'd talk about backups instead because that's, that's my world.

W. Curtis Preston:

And I feel comfortable knowing them.

W. Curtis Preston:

Most people don't know crap about this space, uh, because they, they, you know,

W. Curtis Preston:

they get the job as a junior person and then next thing you know, they become a,

W. Curtis Preston:

a real sysadmin or a network admin or a, you know, or a security admin or a DBA.

Prasanna Malaiyandi:

Yeah, well, except our listeners who are all

Prasanna Malaiyandi:

awesome and probably experts in the backup field and know all about this.

W. Curtis Preston:

Well, certainly Daniel.

Prasanna Malaiyandi:

Hi Daniel.

W. Curtis Preston:

Hi Daniel.

W. Curtis Preston:

The backup anorak.

W. Curtis Preston:

Um, I wonder, you know, he's never, he's never, he better still be

W. Curtis Preston:

listening to the show since we call out to him every once in a while.

W. Curtis Preston:

Him and Stuart, although Stuart's retired.

W. Curtis Preston:

I don't think Stuart's listening to our show.

W. Curtis Preston:

I only tell 'em when we talk about 'em.

W. Curtis Preston:

But, um, so we're continuing in our Backup to Basics series.

W. Curtis Preston:

It's been a couple of weeks, uh, as the kids say it's been a minute,

W. Curtis Preston:

uh, since such a, I remember the first time I heard that thing, I was

W. Curtis Preston:

like, what are you talking about, a minute?

W. Curtis Preston:

Anyway. But yeah, it's been a minute since we've done an episode of our

W. Curtis Preston:

Backup to Basics series, but I am looking down at the book and of course,

W. Curtis Preston:

uh, for those of you that don't know, basically we're doing a podcast version

W. Curtis Preston:

of my book, Modern Data Protection.

W. Curtis Preston:

Let me make sure it gets in camera here. It's from O'Reilly.

W. Curtis Preston:

Uh, you can purchase the, uh, the print version from, uh, your favorite bookseller. Um, perhaps it's one based in the Amazon, perhaps not. Um, and,

W. Curtis Preston:

uh, but if you would like an ebook version of it, you can get your

W. Curtis Preston:

own by going to druva.com/ebook.

W. Curtis Preston:

That's D-R-U-V-A dot com slash ebook.

W. Curtis Preston:

They will, of course, ask for your contact information and then email the crap

W. Curtis Preston:

out of you until you tell 'em to stop.

W. Curtis Preston:

But that is, that is the price that you pay.

W. Curtis Preston:

Um, let's talk

W. Curtis Preston:

about, oh yeah.

W. Curtis Preston:

And while we're at it, uh, I'll throw out the disclaimer, uh, that

W. Curtis Preston:

this is an independent podcast and, um, uh, I work for Druva,

W. Curtis Preston:

Prasanna works for Zoom and, um,

W. Curtis Preston:

But the opinions that you hear are ours. Um, and, et cetera.

W. Curtis Preston:

Please rate us, uh, by going to your, you know, most of you are on iTunes.

W. Curtis Preston:

Just scroll down to the bottom there, give us five or six stars and a comment.

W. Curtis Preston:

We love comments.

W. Curtis Preston:

And, uh, if you'd like to join the conversation, just contact me, wcurtispreston at Gmail, or WCPreston on Twitter.

Prasanna Malaiyandi:

What about LinkedIn?

W. Curtis Preston:

Oh yeah, LinkedIn.

W. Curtis Preston:

Uh, it's linkedin.com, what is it? Slash in slash mrbackup.

W. Curtis Preston:

Um, and by the way, my Twitter account already has multifactor authentication configured, not using SMS, as should yours, especially now that they're disabling that. So weird the way they did that. What's funny is I support the decision.

W. Curtis Preston:

That's just the way

W. Curtis Preston:

they

Prasanna Malaiyandi:

way it came out.

Prasanna Malaiyandi:

Yeah.

W. Curtis Preston:

Oh, Elon.

W. Curtis Preston:

Okay.

W. Curtis Preston:

So in our Backup to Basics series, we're continuing on, and today we are talking

W. Curtis Preston:

about using disk and deduplication.

W. Curtis Preston:

You know, I, I, um, couple weeks ago, I hit 30 years in the backup industry,

W. Curtis Preston:

and I got interviewed by Chris Mellor

Prasanna Malaiyandi:

of The Register and Blocks and Files.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

It's for his Blocks and Files blog, and one of the questions was what I thought was the most,

W. Curtis Preston:

um, important development in the backup industry since I joined.

W. Curtis Preston:

And to me, hands down, it's not even, it's not, there's not even a close second, and

W. Curtis Preston:

that is the invention of deduplication

Prasanna Malaiyandi:

Yep.

W. Curtis Preston:

and because.

W. Curtis Preston:

I, I can't think of another technology in the backup space that has changed backup

W. Curtis Preston:

architecture more than deduplication, and I can think of many other things

W. Curtis Preston:

that we do that are only possible because deduplication is underneath them,

Prasanna Malaiyandi:

Oh yeah, definitely.

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

I don't think we would be able to get, especially with the data growth

Prasanna Malaiyandi:

and the size of these applications.

W. Curtis Preston:

Is data growing?

W. Curtis Preston:

Is

Prasanna Malaiyandi:

No, not at all.

Prasanna Malaiyandi:

Right.

Prasanna Malaiyandi:

I don't think it would be possible to do, like I know Curtis, you've talked

Prasanna Malaiyandi:

about previous, like in your early days, right, about trying to do a backup.

Prasanna Malaiyandi:

and being like, oh my God, how am I gonna do this full backup in a weekend?

W. Curtis Preston:

Yeah.

Prasanna Malaiyandi:

And just with the fact, and I know we'll go and talk

Prasanna Malaiyandi:

more about deduplication, but yeah, just being able to now do that

Prasanna Malaiyandi:

in a cost effective way, using new ways of actually doing the backups as well,

Prasanna Malaiyandi:

which is enabled with deduplication.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

So it, it's, it's like disk.

W. Curtis Preston:

You could argue that using disk in backups is the bigger, uh, advancement.

W. Curtis Preston:

But first off, not really an advancement.

W. Curtis Preston:

It's just instead of tape, we're gonna use disk,

W. Curtis Preston:

but

Prasanna Malaiyandi:

was there to start with anyway.

Prasanna Malaiyandi:

It was just sort of, the cost was so high, and especially given the type of workload

Prasanna Malaiyandi:

you see with deduplication where, or with backups where you're doing periodic

Prasanna Malaiyandi:

fulls or other things like that, and keeping them for long periods of time.

Prasanna Malaiyandi:

Are you going to spend, what, 40x or 30x on storage for your backup

Prasanna Malaiyandi:

system versus your production?

Prasanna Malaiyandi:

Right.

Prasanna Malaiyandi:

That's a hard sell.

W. Curtis Preston:

just, yeah, cuz that's a problem.

W. Curtis Preston:

So one of the, one of the things, uh, that I remember from back in the

W. Curtis Preston:

day, like I, I don't remember really thinking about this lately, but back

W. Curtis Preston:

in the day, I would say that for every gigabyte of primary storage, you

W. Curtis Preston:

had 20 gigabytes of backup storage.

W. Curtis Preston:

And so if you're gonna do that with disk, even, you know, even once, many years ago.

W. Curtis Preston:

Wow.

W. Curtis Preston:

At this point, it's like 20 years ago. But, but even once they came out with

W. Curtis Preston:

this idea of, uh, SATA disk instead

W. Curtis Preston:

of

Prasanna Malaiyandi:

nearline

Prasanna Malaiyandi:

storage.

W. Curtis Preston:

Right.

W. Curtis Preston:

Um, that, that helped bring the cost down significantly.

W. Curtis Preston:

But not as much as deduplication.

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

Because even with those price differences, right?

Prasanna Malaiyandi:

Maybe it was half the price or a third of the price, but once you add

Prasanna Malaiyandi:

in that 20x that you talked about, right, Curtis, then that adds up.

Prasanna Malaiyandi:

And it's not only just the storage cost, it's also you have

Prasanna Malaiyandi:

to account for the power, the cooling, the floor space, right?

Prasanna Malaiyandi:

All the things that go into that system.
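A back-of-the-envelope sketch of why the backup-to-primary ratio gets so large, and how much dedupe can claw back. The schedule (a weekly full plus six daily incrementals), the retention, and the daily change rate are all hypothetical inputs. The "ideal dedupe" function assumes every repeated full costs nothing beyond the first copy and that each day's changed data is new and unique; real ratios depend heavily on the data and the product.

```python
# Hypothetical schedule: one weekly full plus six daily incrementals,
# retained for `weeks` weeks. All inputs are illustrative assumptions.


def written_tb(primary_tb: float, weeks: int, daily_change: float) -> float:
    """Logical data the backup software writes over the retention period."""
    per_week = primary_tb + 6 * daily_change * primary_tb  # 1 full + 6 incrementals
    return weeks * per_week


def ideal_deduped_tb(primary_tb: float, weeks: int, daily_change: float) -> float:
    """Idealized dedupe: the first full is stored, then each later daily backup
    (incremental or full) adds only that day's changed data."""
    backups = 7 * weeks
    return primary_tb * (1 + (backups - 1) * daily_change)


primary, weeks, change = 10.0, 13, 0.02  # 10 TB, a quarter's retention, 2%/day
written = written_tb(primary, weeks, change)
stored = ideal_deduped_tb(primary, weeks, change)
print(f"written: {written:.1f} TB ({written / primary:.1f}x primary)")
print(f"stored after ideal dedupe: {stored:.1f} TB ({written / stored:.1f}:1)")
```

With these inputs the sketch reports roughly 14.6x written versus primary and about 5.2:1 after idealized dedupe. Longer retention pushes the first number toward the 20x mentioned above, and lower change rates push the second one higher.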

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Um, it's funny, just sort of an afterthought: that post that Chris Mellor did about the 30 years.

W. Curtis Preston:

The one group that jumped on the article and just started retweeting all kinds of

W. Curtis Preston:

parts of, or pieces of, the article was the tape group, because I said,

W. Curtis Preston:

I said really good things about tape.

W. Curtis Preston:

And, and the thing is that, um, you know, I, I, you know, I, I

W. Curtis Preston:

believe in all of those things, but.

W. Curtis Preston:

You know, all of the advancements that I've seen in backup in the last 20 plus

W. Curtis Preston:

years have been disk and deduplication.

W. Curtis Preston:

Right.

W. Curtis Preston:

Um, so let's talk about, so what, so not everybody really

W. Curtis Preston:

understands what deduplication is.

W. Curtis Preston:

Some people used to describe it like, well, it's like compression, uh, the way

W. Curtis Preston:

I remember it's like macro compression.

W. Curtis Preston:

Um, it's like compression over time.

W. Curtis Preston:

What do you think of that?

Prasanna Malaiyandi:

uh, I don't quite like that, so, so, right.

W. Curtis Preston:

There may be some old blog posts where I might have said that phrase, but go ahead.

Prasanna Malaiyandi:

So in my mind, right, deduplication is finding two identical segments and tossing one away, keeping only one copy,

Prasanna Malaiyandi:

but still keeping a reference to that so you can, so you still know you have two

Prasanna Malaiyandi:

virtual copies, but one physical copy,

W. Curtis Preston:

Mm-hmm.

Prasanna Malaiyandi:

right?

Prasanna Malaiyandi:

At a high level, that's what I, and now

W. Curtis Preston:

you?

Prasanna Malaiyandi:

And compression is taking an object, a singular object,

Prasanna Malaiyandi:

and squeezing it into a smaller space.

W. Curtis Preston:

Right.

W. Curtis Preston:

But do you understand how compression works?

W. Curtis Preston:

'Cause I sure as hell don't.

Prasanna Malaiyandi:

yeah, so typically like you would run it

Prasanna Malaiyandi:

through different types of algorithms like LZ compression and all the rest

Prasanna Malaiyandi:

in order to look for patterns and throw away bits and compress it down.

Prasanna Malaiyandi:

Now, the difference I would say between dedupe and compression,

Prasanna Malaiyandi:

because they do sound the same,

W. Curtis Preston:

Yeah.

Prasanna Malaiyandi:

right?

Prasanna Malaiyandi:

I would say one of the differences is, with deduplication, it's more like a file-system-level compression, if you want to

Prasanna Malaiyandi:

think of it that way, because it's not just I'm taking this block.

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

It's not just I'm taking this and I'm squeezing it down such

Prasanna Malaiyandi:

that it could be, I just need to look at this and figure it out.

Prasanna Malaiyandi:

Right.

Prasanna Malaiyandi:

It's a lot more complex than that.

W. Curtis Preston:

It is definitely a lot more complex than compression.

W. Curtis Preston:

Right.

W. Curtis Preston:

Um, I, I, I've just, I've, I've just honestly never really dug into the code

W. Curtis Preston:

of how traditional compression works.
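For anyone else who hasn't dug into it, here is a small sketch of the difference being discussed. zlib implements DEFLATE, an LZ77-family compressor that replaces repeated byte sequences with back-references, but only within a sliding window of up to 32 KB of the stream it is compressing. Compressing each night's backup as its own object therefore can't use the identical copy from the night before, while a fingerprint-keyed store can. The data is synthetic, and whole-object fingerprints are used only to keep the sketch short.

```python
import hashlib
import os
import zlib

# Synthetic "backup": 4 KB of random (incompressible) bytes repeated four
# times, so the redundancy lives *inside* a single object.
block = os.urandom(4096)
monday = block * 4
tuesday = bytes(monday)  # the next night's backup, unchanged

# Compression squeezes each object on its own: DEFLATE finds the repeats
# inside Monday's backup, but compressing Tuesday separately cannot refer
# back to Monday, so the second night costs as much as the first.
compress_only = len(zlib.compress(monday)) + len(zlib.compress(tuesday))

# Dedupe keys content by fingerprint across objects, so the unchanged copy
# is stored once. (Real systems fingerprint sub-file chunks, not whole objects.)
store: dict[str, bytes] = {}
for backup in (monday, tuesday):
    store.setdefault(hashlib.sha256(backup).hexdigest(), backup)
dedupe_then_compress = sum(len(zlib.compress(data)) for data in store.values())

print(f"raw:                  {len(monday) + len(tuesday)} bytes")
print(f"compression only:     {compress_only} bytes")
print(f"dedupe + compression: {dedupe_then_compress} bytes")
```

The two techniques are complementary rather than competing: many products compress the unique data that survives dedupe.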

W. Curtis Preston:

So the idea is that I'm looking for duplicate segments of data across many

W. Curtis Preston:

places, both from different sources as well as different time periods, right?

W. Curtis Preston:

I'm, I'm comparing this chunk of data that's coming in right now in tonight's backup.

W. Curtis Preston:

I'm comparing it literally with every chunk of data that I've ever received

W. Curtis Preston:

from anywhere else.

Prasanna Malaiyandi:

I would say that's

W. Curtis Preston:

builds their deduplication that way.

W. Curtis Preston:

So,

Prasanna Malaiyandi:

where

Prasanna Malaiyandi:

it's

W. Curtis Preston:

there are, yeah, go ahead.

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

So it all comes down to, sort of, what is your deduplication domain, which is another

Prasanna Malaiyandi:

term that some people talk about, right?

Prasanna Malaiyandi:

Which is, is it limited to a system?

Prasanna Malaiyandi:

Is it limited to a cluster, which might be formed from multiple systems,

Prasanna Malaiyandi:

or is it limited to sort of a single backup stream coming in?

Prasanna Malaiyandi:

So

Prasanna Malaiyandi:

there.

W. Curtis Preston:

that the question is what is your data domain?

W. Curtis Preston:

Uh,

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

D Domain.

W. Curtis Preston:

So let's back up.

W. Curtis Preston:

So a, as I understand it, right, so basically we're taking the data

W. Curtis Preston:

that's, that's coming in or that's going to come in, we're slicing

W. Curtis Preston:

it up into, I like the term, chunks, right?

W. Curtis Preston:

We run those chunks through a cryptographic hashing algorithm.

W. Curtis Preston:

SHA-1, SHA-256, whatever it, whatever you're using.

W. Curtis Preston:

On the other side of that, we get an alphanumeric value; in the case of SHA-1,

W. Curtis Preston:

it's a 160-bit alphanumeric value.

W. Curtis Preston:

so basically you, you, depending on the algorithm you use, you get a, um, you get

W. Curtis Preston:

an alpha numeric value at the end, and the size of that val, of that value is going

W. Curtis Preston:

to be based on which algorithm you use.

W. Curtis Preston:

In the case of SHA-1, it's 160 bits, right?

W. Curtis Preston:

And.

W. Curtis Preston:

You can then take the 160 bits.

W. Curtis Preston:

You can't reverse engineer it.

W. Curtis Preston:

You can't take the 160 bits and turn it into the chunk, but you can use that, that

W. Curtis Preston:

value to uniquely identify that chunk.

W. Curtis Preston:

And so if you have another chunk of data, regardless of where it came from,

W. Curtis Preston:

if its 160-bit value, again, that's SHA-1, other algorithms are different,

W. Curtis Preston:

if its fingerprint is the same, you can say that this chunk is identical

W. Curtis Preston:

to that other chunk that had the same fingerprint, and you can then

W. Curtis Preston:

discard the duplicate chunk, right?
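A minimal sketch of the fingerprinting step described above, using Python's standard `hashlib`. It shows that SHA-1 yields 160 bits and SHA-256 yields 256 bits, that the "alphanumeric value" is typically the hexadecimal form of those bits, and that identical chunks produce identical fingerprints while a one-byte change produces a different one. The chunk contents and the `fingerprint` helper are illustrative, not taken from any product.

```python
import hashlib

chunk = b"a chunk from tonight's backup"  # illustrative chunk contents


def fingerprint(data: bytes, algorithm: str = "sha1") -> str:
    """Hex digest used as the chunk's fingerprint (illustrative helper)."""
    return hashlib.new(algorithm, data).hexdigest()


for name in ("sha1", "sha256"):
    bits = hashlib.new(name).digest_size * 8
    print(f"{name}: {bits}-bit fingerprint -> {fingerprint(chunk, name)}")

# Same bytes from anywhere give the same fingerprint; any change gives a new one.
print(fingerprint(chunk) == fingerprint(bytes(chunk)))  # True: identical content
print(fingerprint(chunk) == fingerprint(chunk + b"!"))  # False: one extra byte
```

One design note: practical SHA-1 collisions have been demonstrated (the 2017 SHAttered attack), which is one reason newer designs tend to favor SHA-256 or stronger hashes for fingerprints.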

W. Curtis Preston:

the,

W. Curtis Preston:

the

Prasanna Malaiyandi:

Yeah, you can, you can discard the actual data,

Prasanna Malaiyandi:

but you should still keep track of it somewhere in a file system,

Prasanna Malaiyandi:

just because you need, still need

W. Curtis Preston:

Yeah.

W. Curtis Preston:

You're gonna keep track.

W. Curtis Preston:

Oh, we found another one of these,

W. Curtis Preston:

right?

Prasanna Malaiyandi:

And so usually that lookup is in a deduplication

Prasanna Malaiyandi:

index, is what they call them.

Prasanna Malaiyandi:

Usually a dedupe index, which keeps a list of, Hey, here are

Prasanna Malaiyandi:

all the fingerprints that I have.
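The dedupe index can be sketched as a toy in-memory store. Everything here is a hypothetical simplification (a Python dict as the index, fixed 4 KB chunks, SHA-256 fingerprints). Each chunk is stored once under its fingerprint, and each backup keeps a "recipe" of fingerprints that references those chunks. That is how two virtual copies can share one physical copy.

```python
import hashlib
import os

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


class DedupeStore:
    """Toy chunk store: a dedupe index mapping fingerprint -> chunk, plus a
    per-backup recipe (list of fingerprints) that references those chunks."""

    def __init__(self) -> None:
        self.index: dict[str, bytes] = {}        # one physical copy per fingerprint
        self.refcount: dict[str, int] = {}       # references to each chunk
        self.recipes: dict[str, list[str]] = {}  # backup id -> fingerprints

    def write(self, backup_id: str, data: bytes) -> None:
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.index:  # never seen before: keep the bytes
                self.index[fp] = chunk
            # Seen or not, record a reference so the backup can be rebuilt.
            self.refcount[fp] = self.refcount.get(fp, 0) + 1
            recipe.append(fp)
        self.recipes[backup_id] = recipe

    def read(self, backup_id: str) -> bytes:
        """Rebuild a backup by following its recipe through the index."""
        return b"".join(self.index[fp] for fp in self.recipes[backup_id])

    def physical_bytes(self) -> int:
        return sum(len(chunk) for chunk in self.index.values())


store = DedupeStore()
data = os.urandom(4 * CHUNK_SIZE)  # stand-in for a 16 KB file
store.write("monday", data)
store.write("tuesday", data)  # same content backed up again
logical = sum(len(store.read(b)) for b in store.recipes)
print(f"logical: {logical} bytes, physical: {store.physical_bytes()} bytes")
```

A real system would also need to decide when a chunk is no longer referenced after backups expire (reference counts are one way to track that); that garbage-collection step is not shown.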

W. Curtis Preston:

Right.

W. Curtis Preston:

As we, we were alluding to before, one of the things that determines

W. Curtis Preston:

sort of the effectiveness of dedupe is the dedupe domain, right?

W. Curtis Preston:

So I've seen it file system level, meaning it only looks for

W. Curtis Preston:

duplicate data within each volume.

W. Curtis Preston:

I've seen it host level, I've seen it backup level, meaning

W. Curtis Preston:

literally backup configuration wise.

W. Curtis Preston:

right?

W. Curtis Preston:

So if I, if I have a Windows server and I'm backing up the host and I'm

W. Curtis Preston:

backing up SQL Server, I only look for duplicates within SQL Server

W. Curtis Preston:

backups right against each other.

W. Curtis Preston:

Uh, then we have, um, if we're backing up several systems to a box, right?

W. Curtis Preston:

Maybe that the dedupe domain is only within that box.

W. Curtis Preston:

It's only looking for duplicates between all of that.

W. Curtis Preston:

And then there's what I would call truly global dedupe, which is, we're looking

W. Curtis Preston:

for duplicates from everything that's coming in, uh, from multiple sources.

W. Curtis Preston:

Right?

Prasanna Malaiyandi:

Mm-hmm.

W. Curtis Preston:

There is a point of decreasing marginal returns, right?

W. Curtis Preston:

You can argue, and certainly if you're a company that only does dedupe within, like earlier, where we only looked for dupes within SQL Server backups.

W. Curtis Preston:

You could make an argument that, well, there's not a lot of duplicate data

W. Curtis Preston:

between SQL Server and Windows, right?

W. Curtis Preston:

so even though we're not comparing the two, there's not, there's not gonna

W. Curtis Preston:

be a lot of duplicate data there, and there's not gonna be a lot of duplicate data between the SQL Server database on this host and the SQL Server database on that host.

W. Curtis Preston:

So that's another argument that some

Prasanna Malaiyandi:

but, but I think a lot of that was because of

Prasanna Malaiyandi:

architectural limitations of the products themselves, rather than because that is really what you wanted to do.

Prasanna Malaiyandi:

Right?

Prasanna Malaiyandi:

Because

Prasanna Malaiyandi:

that's more of a management issue.

W. Curtis Preston:

They didn't, it was like, well, if we're gonna do it that way, it's gonna be much harder to design a product to do it that way.

W. Curtis Preston:

And we don't think, we don't think that there's going to

W. Curtis Preston:

be that much more benefit, um,

Prasanna Malaiyandi:

But on the other hand, if you look

Prasanna Malaiyandi:

at things like VMware, right?

Prasanna Malaiyandi:

If I have a bunch of VMs, right, and they all came

Prasanna Malaiyandi:

from a single golden image, right?

Prasanna Malaiyandi:

There's a good chance that as you're backing it up, 80, 90% of that stuff

Prasanna Malaiyandi:

is all gonna be deduplicated, right?

W. Curtis Preston:

Absolutely.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

There's also a lot of duplicate data even within like a large filer, right?

W. Curtis Preston:

There's gonna be lots of duplicate data there, right?

W. Curtis Preston:

So if you're only doing it volume to volume or backup configuration to

W. Curtis Preston:

backup configuration, you, there's a lot of duplicate data that I

W. Curtis Preston:

think you would, you would miss.
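The effect of the dedupe domain, including the golden-image case, can be shown with a small simulation. The sketch assumes three VMs cloned from one golden image, each with a little unique data, fixed 4 KB chunks, and one fingerprint index per domain. `domain_of` is a hypothetical hook that decides which index a backup uses. Scoping the domain per VM stores the shared image three times, while a single global domain stores it once.

```python
import hashlib
import os
from collections.abc import Callable

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


def chunk_fingerprints(data: bytes) -> list[str]:
    return [hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
            for i in range(0, len(data), CHUNK_SIZE)]


def chunks_stored(backups: dict[str, bytes], domain_of: Callable[[str], str]) -> int:
    """Unique chunks kept when each dedupe domain has its own index."""
    indexes: dict[str, set[str]] = {}
    for name, data in backups.items():
        indexes.setdefault(domain_of(name), set()).update(chunk_fingerprints(data))
    return sum(len(fps) for fps in indexes.values())


# Three VMs cloned from one 256 KB golden image, plus 8 KB of unique data each.
golden = os.urandom(64 * CHUNK_SIZE)
backups = {f"vm{n}": golden + os.urandom(2 * CHUNK_SIZE) for n in range(3)}

per_vm = chunks_stored(backups, domain_of=lambda name: name)      # index per VM
shared = chunks_stored(backups, domain_of=lambda name: "global")  # one index
print(f"per-VM domains: {per_vm} chunks; global domain: {shared} chunks")
```

With these inputs the per-VM domains keep 198 chunks and the global domain keeps 70; the gap grows with every additional clone.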

W. Curtis Preston:

Um,

Prasanna Malaiyandi:

I know you talked about the domains, but I think another

Prasanna Malaiyandi:

thing to also mention is, some products do different types of chunking, if you will.

Prasanna Malaiyandi:

Some do it at the file level, others do it at sort of a smaller level, right?

Prasanna Malaiyandi:

And some do sort of fixed segment where each one is sort of a fixed length.

Prasanna Malaiyandi:

Others do sort of variable segments where they try to figure out what is optimal,

Prasanna Malaiyandi:

because depending on how you're doing your fingerprinting, right, you want to

Prasanna Malaiyandi:

find the most number of matches, right?

Prasanna Malaiyandi:

So you can save on storage.
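To see why variable-size segments help, here is a sketch comparing whole-file hashing (closer to single-instance storage), fixed 4 KB chunks, and content-defined chunking after a single byte is inserted at the front of a file. The content-defined chunker uses a simple gear-style rolling hash with a hypothetical ~4 KB average and 1 KB/16 KB minimum and maximum sizes; it is not any particular product's algorithm. Its cut points depend mostly on the bytes just before each candidate cut rather than on absolute offsets, so its boundaries fall back into step after the edit, while every fixed-size chunk shifts and stops matching.

```python
import hashlib
import random

_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # random value per byte
MASK64 = (1 << 64) - 1
BOUNDARY_MASK = (1 << 12) - 1     # ~1 in 4096 positions is a candidate cut
MIN_SIZE, MAX_SIZE = 1024, 16384  # hypothetical chunk-size limits


def fixed_chunks(data: bytes, size: int = 4096) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data: bytes) -> list[bytes]:
    """Content-defined chunking with a gear-style rolling hash: cut where the
    low bits of the hash are all zero, so cut points follow the content."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        if (size >= MIN_SIZE and (h & BOUNDARY_MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def dedupe_hits(old: list[bytes], new: list[bytes]) -> int:
    """How many of the new chunks already exist among the old ones."""
    seen = {hashlib.sha256(c).digest() for c in old}
    return sum(hashlib.sha256(c).digest() in seen for c in new)


original = random.Random(1).randbytes(64 * 1024)  # synthetic 64 KB file
edited = b"!" + original                          # one byte inserted at the front

chunkers = {
    "whole file": lambda d: [d],
    "fixed 4 KB": fixed_chunks,
    "content-defined": cdc_chunks,
}
for name, chunker in chunkers.items():
    after = chunker(edited)
    print(f"{name}: {dedupe_hits(chunker(original), after)} of {len(after)} chunks reused")
```

Production chunkers add refinements (larger hash windows, size normalization, faster hashing), but this re-synchronizing behavior is the property that lets variable-size segments find more matches than fixed ones.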

W. Curtis Preston:

right.

W. Curtis Preston:

I,

Prasanna Malaiyandi:

another thing that also comes up.

W. Curtis Preston:

I would argue that file level dedupe isn't really

W. Curtis Preston:

dedupe, it's more a single instance.

W. Curtis Preston:

Right.

W. Curtis Preston:

Um, that's like single instance storage of a file, you

W. Curtis Preston:

know?

W. Curtis Preston:

Okay.

W. Curtis Preston:

It, it's, yeah.
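To make the single-instance versus sub-file distinction concrete, here is a minimal Python sketch of what gets stored when one byte in a 1 MiB file changes. It assumes SHA-256 fingerprints and fixed 4 KiB sub-file chunks, both illustrative choices.

```python
import hashlib
import random

CHUNK = 4096  # assumed sub-file chunk size


def file_level_new_bytes(old: bytes, new: bytes) -> int:
    """Single-instance storage: the whole file is one unit, stored again if any byte differs."""
    return 0 if hashlib.sha256(old).digest() == hashlib.sha256(new).digest() else len(new)


def subfile_new_bytes(old: bytes, new: bytes) -> int:
    """Sub-file dedupe with fixed chunks: only chunks not already stored are written."""
    seen = {hashlib.sha256(old[i:i + CHUNK]).digest() for i in range(0, len(old), CHUNK)}
    return sum(
        len(new[i:i + CHUNK])
        for i in range(0, len(new), CHUNK)
        if hashlib.sha256(new[i:i + CHUNK]).digest() not in seen
    )


old = random.Random(7).randbytes(1 << 20)  # hypothetical 1 MiB file
new = bytearray(old)
new[500_000] ^= 0xFF                       # flip one byte in the middle
new = bytes(new)

print(file_level_new_bytes(old, new))  # 1048576
print(subfile_new_bytes(old, new))     # 4096
```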

W. Curtis Preston:

But so I, I'm always thinking subfile, uh, when I think about what I think

W. Curtis Preston:

of actual dedupe. There is another very big way that

W. Curtis Preston:

we divide up the dedupe industry, and that is source versus target.

Prasanna Malaiyandi:

Yep.

W. Curtis Preston:

Um, the, um, the first dedupe product I ever saw,

W. Curtis Preston:

which was, uh, no, was not, that was not the first, no, the first one I saw

W. Curtis Preston:

the product at the time was called Undoo.

W. Curtis Preston:

Have we talked

W. Curtis Preston:

about this?

Prasanna Malaiyandi:

Mm.

W. Curtis Preston:

Undoo, with two O's.

W. Curtis Preston:

It was really funny that the name of a dedupe vendor

W. Curtis Preston:

had duplicate data in its company name.

W. Curtis Preston:

It was Undoo, with two O's.

W. Curtis Preston:

You know this product, you just don't know that that's what it used to be called.

Prasanna Malaiyandi:

What is it?

Prasanna Malaiyandi:

What

W. Curtis Preston:

give you a, I'll give, I'll give you a hint.

W. Curtis Preston:

It.

W. Curtis Preston:

The name comes from the fact that it would be a sea of availability.

W. Curtis Preston:

I'm gonna, I'm gonna put the, the Jeopardy theme in here.

Prasanna Malaiyandi:

What, a sea of availability?

W. Curtis Preston:

That's what the name, that's where the name for the company

W. Curtis Preston:

comes from, or if I want to put it in the right order, an availability sea.

Prasanna Malaiyandi:

I don't know what this is.

W. Curtis Preston:

Avamar

Prasanna Malaiyandi:

Oh, oh, that makes sense.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

So that's, that's where the name Avamar came from.

W. Curtis Preston:

So the, the first

Prasanna Malaiyandi:

I should know that

W. Curtis Preston:

you shouldn't know

Prasanna Malaiyandi:

Since it was, uh, part of my former employer.

Prasanna Malaiyandi:

Yes.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Well, I mean, you know, I, I have a bit of an inside track because they

W. Curtis Preston:

were right up the road from me, right?

W. Curtis Preston:

They were up there.

W. Curtis Preston:

They were up in Irvine.

W. Curtis Preston:

Um, and that was, uh, the first dedupe product.

W. Curtis Preston:

They were source dedupe. So what's the difference between source

W. Curtis Preston:

dedupe and target dedupe, Prasanna?

Prasanna Malaiyandi:

So the biggest one is, so let's first

Prasanna Malaiyandi:

talk about target dedupe, right?

Prasanna Malaiyandi:

So with target dedupe, data comes into the system and then a deduplication

Prasanna Malaiyandi:

algorithm runs and tosses away the duplicate data.

Prasanna Malaiyandi:

It can support any type of client as long as the client supports

Prasanna Malaiyandi:

whatever protocol the box exposes.

Prasanna Malaiyandi:

So if it's NFS or SMB, right?

Prasanna Malaiyandi:

Whatever can write to it, the data gets deduped.
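A target-side appliance can be sketched as a box that accepts whatever stream it is handed and only dedupes after the bytes have arrived. The class below is hypothetical: it stands in for whatever NFS, SMB, or VTL interface a real box exposes, and it assumes fixed 4 KiB chunks with SHA-256 fingerprints.

```python
import hashlib
import random


class TargetDedupeBox:
    """Hypothetical target-side dedupe appliance: clients send full streams,
    and the box fingerprints and discards duplicates after the data arrives."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks = {}          # fingerprint -> chunk
        self.bytes_received = 0   # everything the clients pushed over the network

    def write(self, stream: bytes) -> None:
        self.bytes_received += len(stream)
        for i in range(0, len(stream), self.chunk_size):
            chunk = stream[i:i + self.chunk_size]
            self.chunks.setdefault(hashlib.sha256(chunk).digest(), chunk)

    @property
    def bytes_stored(self) -> int:
        return sum(len(c) for c in self.chunks.values())


box = TargetDedupeBox()
full = random.Random(3).randbytes(64 * 4096)  # a 256 KiB "full backup"
box.write(full)                               # Monday's full
box.write(full)                               # Tuesday's full, nothing changed
print(box.bytes_received, box.bytes_stored)   # 524288 262144
```

Both fulls cross the network in their entirety, but the box keeps only one copy, which is the trade-off being described here.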

W. Curtis Preston:

Hang on, hang on.

W. Curtis Preston:

Before you go on to that.

W. Curtis Preston:

I don't disagree with what you said.

W. Curtis Preston:

I just, I think there could be a little bit more clarification.

W. Curtis Preston:

It's a box that I send whatever I want to.

Prasanna Malaiyandi:

Yep.

W. Curtis Preston:

Typically, the thing about target dedupe was that,

W. Curtis Preston:

um, that it was, you didn't have to do a lot of re-engineering of

W. Curtis Preston:

your backup system.

Prasanna Malaiyandi:

it's like a VTL system, right?

Prasanna Malaiyandi:

That came.

W. Curtis Preston:

plug in a box.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

And basically you stopped using tape and you

W. Curtis Preston:

sent your backups to this box.

W. Curtis Preston:

Maybe the box might even be pretending to be a tape library,

W. Curtis Preston:

the virtual tape library.

W. Curtis Preston:

Right.

W. Curtis Preston:

Um, and then it did all the dedupe magic over there.

W. Curtis Preston:

Um,

Prasanna Malaiyandi:

Which was great because you can

Prasanna Malaiyandi:

just plug in your box and go.

Prasanna Malaiyandi:

Now the other side is called source-side dedupe. Instead of sending all the

Prasanna Malaiyandi:

data and tossing it away, why don't we do something smart and actually figure

Prasanna Malaiyandi:

out the duplicates on the client itself, on the source, right? Dedupe on the

Prasanna Malaiyandi:

source, and only send the unique data.

Prasanna Malaiyandi:

And this has the advantage of

Prasanna Malaiyandi:

actually not sending the data over the wire, which is actually

Prasanna Malaiyandi:

a huge benefit that people don't understand always, right?

Prasanna Malaiyandi:

Not sending the data can actually make it a lot faster, even though

Prasanna Malaiyandi:

you think, oh, I'm now putting additional load on my server itself.

Prasanna Malaiyandi:

But it ends up being better than trying to send all the data and just tossing

Prasanna Malaiyandi:

it away like target-side dedupe does.
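Source-side dedupe moves the fingerprinting onto the client, which asks the backup server which chunks it is missing and ships only those. The Python sketch below assumes a hypothetical `DedupeServer` with a global fingerprint index, fixed 4 KiB chunks, and SHA-256. A real product would also record each backup's ordered list of fingerprints so it could be restored; that part is left out here.

```python
import hashlib
import random

CHUNK = 4096  # assumed fixed chunk size


class DedupeServer:
    """Hypothetical backup server holding a global fingerprint index."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk

    def missing(self, fingerprints):
        """Tell the client which fingerprints it still needs to upload."""
        return [fp for fp in fingerprints if fp not in self.chunks]

    def put(self, fp, chunk):
        self.chunks[fp] = chunk


def source_side_backup(data: bytes, server: DedupeServer) -> int:
    """Fingerprint on the client and upload only chunks the server lacks.

    Returns the chunk payload bytes sent; fingerprint traffic is not counted."""
    pieces = {}
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        pieces[hashlib.sha256(chunk).digest()] = chunk  # also dedupes within this backup
    sent = 0
    for fp in server.missing(list(pieces)):
        server.put(fp, pieces[fp])
        sent += len(pieces[fp])
    return sent


server = DedupeServer()
rng = random.Random(5)
monday = rng.randbytes(100 * CHUNK)
tuesday = monday[:98 * CHUNK] + rng.randbytes(2 * CHUNK)  # last 2% of chunks changed
print(source_side_backup(monday, server))   # 409600
print(source_side_backup(tuesday, server))  # 8192
```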

W. Curtis Preston:

I would say it theoretically should be better

W. Curtis Preston:

right?

W. Curtis Preston:

Because you, I'm just saying I've seen some crappy source dedupe systems, right?

Prasanna Malaiyandi:

Okay.

Prasanna Malaiyandi:

Sorry.

Prasanna Malaiyandi:

I've seen some, I've seen some good ones, or the ones that I've

Prasanna Malaiyandi:

interacted with have been good.

Prasanna Malaiyandi:

And so I've seen the performance numbers around

W. Curtis Preston:

Yeah.

W. Curtis Preston:

I, I do think it, it makes more sense to me.

W. Curtis Preston:

It always made more sense to me.

W. Curtis Preston:

The only reason why we had target dedupe was because to do source dedupe, you

W. Curtis Preston:

have to redesign the backup product.

W. Curtis Preston:

Right?

W. Curtis Preston:

It took a long time to get, to get, uh, basically you have to

W. Curtis Preston:

stop using NetBackup, NetWorker, or TSM, whatever it was back in the

W. Curtis Preston:

day, and you had to replace it.

W. Curtis Preston:

Like in this case with Avamar, Avamar was a source dedupe product.

W. Curtis Preston:

You had to do what we call a forklift upgrade.

W. Curtis Preston:

You had to throw out the baby with the bathwater, whatever phrase, whatever.

W. Curtis Preston:

You know,

W. Curtis Preston:

uh, analogy you want to use there.

W. Curtis Preston:

That was the main problem as I saw it with source dedupe.

W. Curtis Preston:

Right.

W. Curtis Preston:

Is that, is that you, you had to change your backup product to get it,

Prasanna Malaiyandi:

and that was in the beginning, right?

Prasanna Malaiyandi:

At the very early

W. Curtis Preston:

Well, well, you.

W. Curtis Preston:

You, well, yeah.

W. Curtis Preston:

Now you just had to, had to upgrade your backup product, right?

W. Curtis Preston:

Because many modern backup technologies now support source dedupe, although

W. Curtis Preston:

even some newer backup technologies don't, I don't know if, I dunno if that

W. Curtis Preston:

came out in English; there were some double negatives in there.

W. Curtis Preston:

Some newer, very new backup technologies

W. Curtis Preston:

don't do source dedupe.

Prasanna Malaiyandi:

Which seems bonkers.

W. Curtis Preston:

Which does seem bonkers.

W. Curtis Preston:

Um, I, you know, and, um, I'm talking about the likes of

W. Curtis Preston:

Rubrik and Cohesity, right?

W. Curtis Preston:

These are new, these are, you know, next-gen backup products that were designed

W. Curtis Preston:

in the last, less than the last 10 years.

W. Curtis Preston:

Right.

W. Curtis Preston:

And it's based on an appliance model,

W. Curtis Preston:

and they do all the dedupe inside that box, is my understanding, right?

Prasanna Malaiyandi:

And I just wanna challenge that, Curtis, because I thought

Prasanna Malaiyandi:

in some cases they do do source-side deduplication, but I think because they've

Prasanna Malaiyandi:

tried to be open and act as a target device, in those cases, you can't, like,

Prasanna Malaiyandi:

you don't really have another option.

W. Curtis Preston:

Yeah, I, I don't, well, again, I'm not,

Prasanna Malaiyandi:

I, but I don't know

W. Curtis Preston:

work at, I work at Druva, not at Rubrik, uh, or, or Cohesity.

W. Curtis Preston:

But it is my understanding that they do target-side dedupe, which is, and,

W. Curtis Preston:

and one of the challenges of target-side dedupe is you need an appliance

W. Curtis Preston:

at each location.

W. Curtis Preston:

Now I know that they can do virtual appliances, right?

W. Curtis Preston:

So they have a, they have a VM level appliance.

W. Curtis Preston:

Uh, but you need a box or something pretending to be a box at each location,

W. Curtis Preston:

because if you're not eliminating the duplicates before you send it

W. Curtis Preston:

to the box, um, then you need, you need something that's on-prem, right?

Prasanna Malaiyandi:

Because you definitely don't wanna

Prasanna Malaiyandi:

send that all over the WAN.

W. Curtis Preston:

No, no, that's the, to me, that's the biggest advantage

W. Curtis Preston:

of a source dedupe system is that it's ultimately scalable, right?

W. Curtis Preston:

That you, that assuming, assuming it doesn't slow things down, assuming,

W. Curtis Preston:

assuming all these things, assuming that the product actually works, um, that

W. Curtis Preston:

you, um, you could back up a laptop.

W. Curtis Preston:

Right?

W. Curtis Preston:

You can back up a mobile phone and the, the duplicate data will be eliminated

W. Curtis Preston:

before it's sent over the WAN, which is what you need to do if you're

W. Curtis Preston:

backing up something over the internet.

Prasanna Malaiyandi:

Mm-hmm.

W. Curtis Preston:

Right.

W. Curtis Preston:

Um, and, um, so the, the downside that some, you know, again, you,

W. Curtis Preston:

you, you talked about it already, is that it does put additional

W. Curtis Preston:

compute requirement on the client.

W. Curtis Preston:

The argument is that it's offset by the,

W. Curtis Preston:

um, the savings of the network bandwidth.
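Whether the extra client CPU pays for itself depends on link speed and hashing speed. This back-of-the-envelope Python model uses made-up inputs (1,000 GB on the client, 2% unique data, hashing at 0.5 GB/s) and pessimistically assumes hashing and sending do not overlap.

```python
def target_side_hours(data_gb: float, link_gbps: float) -> float:
    """Every byte crosses the link; dedupe happens later on the box."""
    return data_gb * 8 / link_gbps / 3600


def source_side_hours(data_gb: float, unique_fraction: float,
                      link_gbps: float, hash_gb_per_s: float) -> float:
    """The client fingerprints everything, then sends only the unique part.

    Hashing and sending are treated as sequential, a pessimistic assumption."""
    hash_seconds = data_gb / hash_gb_per_s
    send_seconds = data_gb * unique_fraction * 8 / link_gbps
    return (hash_seconds + send_seconds) / 3600


# Hypothetical inputs: 1,000 GB client, 2% unique, client hashes at 0.5 GB/s.
for link in (1, 10):
    print(f"{link:>2} Gb/s  target {target_side_hours(1000, link):.2f} h  "
          f"source {source_side_hours(1000, 0.02, link, 0.5):.2f} h")
```

On the 1 Gb/s link the source-side approach finishes in well under half the time, while on a 10 Gb/s LAN with that slow hasher it loses, which is one way a design that should theoretically be better can disappoint in practice.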

W. Curtis Preston:

Right.

W. Curtis Preston:

Um,

Prasanna Malaiyandi:

There is also one more downside,

W. Curtis Preston:

okay.

Prasanna Malaiyandi:

which is that

Prasanna Malaiyandi:

not all applications can do source-side deduplication.

Prasanna Malaiyandi:

So if you do have an application which only supports writing to like

Prasanna Malaiyandi:

an NFS mount point or an SMB mount point, or something that doesn't

Prasanna Malaiyandi:

allow the integration of this source-side deduplication logic,

Prasanna Malaiyandi:

then you are going to need target-side dedupe.

W. Curtis Preston:

Yep.

W. Curtis Preston:

Uh, agreed.

W. Curtis Preston:

Um, and an example of that would be like, um, uh, Oracle, right?

Prasanna Malaiyandi:

Yep.

Prasanna Malaiyandi:

Incremental merge.

W. Curtis Preston:

yeah.

W. Curtis Preston:

Um, although I would think that you should be able, I don't know, we could, we

Prasanna Malaiyandi:

No, you can't.

Prasanna Malaiyandi:

You can't.

Prasanna Malaiyandi:

You can't.

W. Curtis Preston:

You can't take the Oracle stream and slice it and dice it.

W. Curtis Preston:

I don't know.

Prasanna Malaiyandi:

Did you what?

Prasanna Malaiyandi:

Sorry?

Prasanna Malaiyandi:

You could, um, there are companies out there which give, which provide

Prasanna Malaiyandi:

a virtual file system interface

Prasanna Malaiyandi:

that lives

W. Curtis Preston:

So you fake it.

W. Curtis Preston:

You fake it out.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Okay.

W. Curtis Preston:

All right.

W. Curtis Preston:

And then I've got something called hybrid dedupe and this, this was

W. Curtis Preston:

invented by your former employer.

Prasanna Malaiyandi:

I don't even know what a hybrid dedupe is.

W. Curtis Preston:

It's, it's, it's target dedupe pretending to be source dedupe.

W. Curtis Preston:

It's

Prasanna Malaiyandi:

D.

Prasanna Malaiyandi:

Oh, see, here's my, okay, so here's my problem is I think Boost

W. Curtis Preston:

Uhhuh.

Prasanna Malaiyandi:

is source-side

Prasanna Malaiyandi:

deduplication. I don't know if I would call it hybrid, because it is

Prasanna Malaiyandi:

very similar to what Avamar did.

Prasanna Malaiyandi:

Right?

Prasanna Malaiyandi:

It's moving the deduplication logic to the client

Prasanna Malaiyandi:

such that you could do all of the computation.

Prasanna Malaiyandi:

The same thing that we have talked about with source-side deduplication.

W. Curtis Preston:

I, I'll tell you why I put it in a different category.

W. Curtis Preston:

To me, hybrid dedupe is redoing the backup software.

W. Curtis Preston:

I'm sorry, source dedupe, true source deduplication.

W. Curtis Preston:

It's done at the backup software level,

Prasanna Malaiyandi:

Okay, then.

Prasanna Malaiyandi:

I

W. Curtis Preston:

With, with hybrid dedupe, I'm still dumbly sending

W. Curtis Preston:

everything to this source dedupe thing that's gonna redo it, right?

W. Curtis Preston:

Um, it doesn't matter in the end, you get, you get roughly the same benefits, right?

W. Curtis Preston:

Um, that's what, uh,

Prasanna Malaiyandi:

Okay.

Prasanna Malaiyandi:

So with hybrid, yeah.

Prasanna Malaiyandi:

You get the benefits of source without having to upgrade and, or sorry,

Prasanna Malaiyandi:

throw away your backup software.
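The hybrid idea, as described here, leaves the backup software alone: it keeps calling a plain write, while a client-side library does the fingerprinting before anything leaves the host. The sketch below is a hypothetical illustration of that pattern, not the API of Boost or any other product; `ClientPlugin`, `DedupeTarget`, and the buffering scheme are assumptions.

```python
import hashlib
import random


class DedupeTarget:
    """Hypothetical target appliance holding a fingerprint index."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk

    def has(self, fp):
        return fp in self.chunks

    def put(self, fp, chunk):
        self.chunks[fp] = chunk


class ClientPlugin:
    """Hypothetical client-side library sitting under unchanged backup software.

    The software still just calls write() and close(); the plugin cuts the
    stream into fixed chunks, fingerprints them, and ships only unseen ones."""

    def __init__(self, target, chunk_size=4096):
        self.target = target
        self.size = chunk_size
        self.buf = b""          # bytes waiting to fill a whole chunk
        self.bytes_sent = 0

    def write(self, data: bytes) -> None:
        self.buf += data
        while len(self.buf) >= self.size:
            self._ship(self.buf[:self.size])
            self.buf = self.buf[self.size:]

    def close(self) -> None:
        if self.buf:            # flush a final partial chunk
            self._ship(self.buf)
            self.buf = b""

    def _ship(self, chunk: bytes) -> None:
        fp = hashlib.sha256(chunk).digest()
        if not self.target.has(fp):
            self.target.put(fp, chunk)
            self.bytes_sent += len(chunk)


def run_backup(target, data, write_size=1000):
    """Simulate the backup software writing odd-sized buffers through the plugin."""
    plugin = ClientPlugin(target)
    for i in range(0, len(data), write_size):
        plugin.write(data[i:i + write_size])
    plugin.close()
    return plugin.bytes_sent


target = DedupeTarget()
data = random.Random(9).randbytes(40 * 4096)
print(run_backup(target, data))  # 163840: first full, every chunk is new
print(run_backup(target, data))  # 0: second full, nothing new leaves the client
```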

W. Curtis Preston:

Right, right, right.

W. Curtis Preston:

Um, so I, I, um, we spent most of this time talking about dedupe. Um,

W. Curtis Preston:

there are a bunch of different ways to use disk in your backup system.

W. Curtis Preston:

Some of which don't really require dedupe, right?

W. Curtis Preston:

We used to do what we call disk caching, where you just had enough

W. Curtis Preston:

disk for last night's backup.

W. Curtis Preston:

You would back up to disk and then you would copy that to tape, and then

W. Curtis Preston:

you would hand that to a man in a van.

W. Curtis Preston:

Uh, then we got a bunch of different things.

W. Curtis Preston:

I've got D to D to T, D to D to D, D to C, and D to D to C.

W. Curtis Preston:

Did I do all that?

W. Curtis Preston:

So disk to disk to tape, disk to disk to disk, direct to cloud, and

W. Curtis Preston:

disk to disk to cloud, right?

W. Curtis Preston:

So these are all ways that people use disk in current backup systems.

W. Curtis Preston:

Um, to me, D to D to C, or disk to disk to cloud, is really disk to disk

W. Curtis Preston:

to disk; it's just that the last

W. Curtis Preston:

disk is being run by the cloud, right?

W. Curtis Preston:

And I will say that dedupe, by the way, I will say that without dedupe,

W. Curtis Preston:

the whole thing of using the cloud, the way we use the cloud, just wouldn't work.

W. Curtis Preston:

I mean, you can't send full backups to the cloud.

W. Curtis Preston:

I mean, you could, with unlimited bandwidth.

Prasanna Malaiyandi:

well, and yeah, with unlimited bandwidth

Prasanna Malaiyandi:

it would just be expensive.

Prasanna Malaiyandi:

Right.

Prasanna Malaiyandi:

Just going back to the conversation we had earlier about the wan, right?

Prasanna Malaiyandi:

You don't wanna send full copies out over the WAN.

W. Curtis Preston:

right.

Prasanna Malaiyandi:

Um, because that gets expensive and very slow.
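The bandwidth problem is easy to put in numbers. This Python sketch assumes decimal terabytes, an ideal 1 Gb/s link with no protocol overhead, a hypothetical 100 TB environment, and about 2% unique data per day after dedupe.

```python
def transfer_hours(terabytes: float, link_gbps: float) -> float:
    """Hours to push decimal terabytes over an ideal link with no protocol overhead."""
    bits = terabytes * 1e12 * 8
    return bits / (link_gbps * 1e9) / 3600


full = transfer_hours(100, 1)   # a hypothetical 100 TB full over 1 Gb/s
daily = transfer_hours(2, 1)    # only ~2% unique data after dedupe
print(f"full: {full / 24:.1f} days, deduped daily: {daily:.1f} hours")
```

Under these assumptions a single full takes about 9.3 days to send, while the deduplicated daily change takes about 4.4 hours.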

Prasanna Malaiyandi:

Um, the other one I was going to comment on was, uh, oh, I know we've

Prasanna Malaiyandi:

been talking about disk, but I think it's also important to acknowledge

Prasanna Malaiyandi:

that now it's no longer spinning disk.

Prasanna Malaiyandi:

It could also be flash.

Prasanna Malaiyandi:

Right.

Prasanna Malaiyandi:

We've seen

W. Curtis Preston:

yeah,

W. Curtis Preston:

but that's a whole other thing

Prasanna Malaiyandi:

I, I, I know, but I'm just saying that when it

Prasanna Malaiyandi:

comes to deduplication and backup storage or protection storage, right?

Prasanna Malaiyandi:

This, it could be flash, it could be disk, it could be object storage, right?

Prasanna Malaiyandi:

So I think it's important to differentiate that, like what we're

Prasanna Malaiyandi:

talking about with deduplication, when we mentioned disk, right?

Prasanna Malaiyandi:

The media layer itself.

Prasanna Malaiyandi:

Yeah, the media layer.

Prasanna Malaiyandi:

Yes.

Prasanna Malaiyandi:

The media layer is not tape.

W. Curtis Preston:

Right, right.

W. Curtis Preston:

Hang on one second.

W. Curtis Preston:

Um, I need to, didn't realize I had a, I had a, um, meeting

Prasanna Malaiyandi:

Meeting at?

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Four.

W. Curtis Preston:

Well, 4:15, which is an odd, um, all right.

W. Curtis Preston:

It's a, it's a pre-meeting with a podcast thing.

W. Curtis Preston:

It's, um, anyway, um, so, uh, yeah, so, okay, you know, I hate the idea of flash

Prasanna Malaiyandi:

I know, I know, I know.

Prasanna Malaiyandi:

I'm, I, I'm just saying that people will bring it up.

Prasanna Malaiyandi:

So I just wanna clarify that when we talk about disk, we're

Prasanna Malaiyandi:

just talking about not tape.

W. Curtis Preston:

The only place.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Correct.

W. Curtis Preston:

The only place where I think maybe flash has a place in the backup

W. Curtis Preston:

system is, and you know, you know, the folks over at Pierre and

W. Curtis Preston:

Neil, they're all mad at me now.

W. Curtis Preston:

Right.

W. Curtis Preston:

But, uh, the only place that I, where I think flash has a place in the backup

W. Curtis Preston:

system is with like live recovery.

W. Curtis Preston:

If you're gonna do, if you're gonna do instant recovery and you're actually gonna

W. Curtis Preston:

run VMs off of your backups, that better be some really nice performing disk.

W. Curtis Preston:

But the thing is, it doesn't need to be your whole system.

W. Curtis Preston:

It just needs to be like the most

Prasanna Malaiyandi:

A part, part of, and it needs to, you don't need

Prasanna Malaiyandi:

your entire system to be flash,

W. Curtis Preston:

Yeah.

Prasanna Malaiyandi:

You just

Prasanna Malaiyandi:

need enough to be able to support that use case.

W. Curtis Preston:

I, I just think that where flash does really,

W. Curtis Preston:

really well is in random access, right?

W. Curtis Preston:

Backup isn't a random access application.

W. Curtis Preston:

Backup is a streaming application.

W. Curtis Preston:

Even if what we're talking about is large dedupe chunks.

W. Curtis Preston:

I don't

W. Curtis Preston:

know.

W. Curtis Preston:

I, I,

Prasanna Malaiyandi:

I,

Prasanna Malaiyandi:

I,

W. Curtis Preston:

say, let's just say the jury is out for me.

W. Curtis Preston:

I, I am from Missouri.

W. Curtis Preston:

Missouri.

W. Curtis Preston:

Is that, is that the show me state?

W. Curtis Preston:

That's the show me state.

W. Curtis Preston:

Right?

Prasanna Malaiyandi:

yeah.

W. Curtis Preston:

So I'll tell you what, I'll tell you what.

W. Curtis Preston:

If there's anybody that's listening to this that just got pissed off,

Prasanna Malaiyandi:

what's his

Prasanna Malaiyandi:

name?

Prasanna Malaiyandi:

I'll come back

W. Curtis Preston:

to, I welcome you to, come on and tell me why I'm wrong.

W. Curtis Preston:

I, I just,

Prasanna Malaiyandi:

I, I, I, I know who will come back on, you

W. Curtis Preston:

who, who,

W. Curtis Preston:

will come back on,

Prasanna Malaiyandi:

what's his name?

Prasanna Malaiyandi:

VAST Data guy.

W. Curtis Preston:

uh oh.

W. Curtis Preston:

Oh, are they flash?

Prasanna Malaiyandi:

Yeah,

W. Curtis Preston:

Mark?

W. Curtis Preston:

Um, no, sorry, Howard.

W. Curtis Preston:

Uh, Howard.

W. Curtis Preston:

Yeah.

Prasanna Malaiyandi:

VAST.

Prasanna Malaiyandi:

Pure flash.

Prasanna Malaiyandi:

Yeah.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Um, all right.

W. Curtis Preston:

All right.

W. Curtis Preston:

Well, yeah, Howard, uh, you wanna tell, you wanna tell me why I'm wrong?

W. Curtis Preston:

Um, I'm more than happy to have you back.

W. Curtis Preston:

We can duke it out.

W. Curtis Preston:

We can duke it.

W. Curtis Preston:

It wouldn't be the first time

W. Curtis Preston:

Howard and I have, have disagreed on something.

W. Curtis Preston:

I don't know.

W. Curtis Preston:

It's just, it's just there are so many areas, there are so many

W. Curtis Preston:

other places where I would wanna spend money in the backup system.

Prasanna Malaiyandi:

Yep.

W. Curtis Preston:

Um, and, um,

Prasanna Malaiyandi:

It comes down to what the cost is.

Prasanna Malaiyandi:

Right.

Prasanna Malaiyandi:

If you could get flash down to a low enough point,

W. Curtis Preston:

which is the point of VAST Data, right?

W. Curtis Preston:

Their architecture allows using flash in a, um, you know, a significant way,

Prasanna Malaiyandi:

That's, that's why I brought

W. Curtis Preston:

uh, close to cost.

W. Curtis Preston:

Okay.

W. Curtis Preston:

All right.

W. Curtis Preston:

Okay.

W. Curtis Preston:

All right.

W. Curtis Preston:

All right.

W. Curtis Preston:

All right.

W. Curtis Preston:

Um, and then I got this whole other thing.

W. Curtis Preston:

I'm not gonna go into that other thing.

W. Curtis Preston:

Um, but yeah, so dedupe makes disk and, and cloud-based products both technologically

W. Curtis Preston:

feasible as well as economically feasible.

W. Curtis Preston:

Right.

W. Curtis Preston:

Um,

Prasanna Malaiyandi:

is.

W. Curtis Preston:

hmm.

Prasanna Malaiyandi:

Is there something that a person shopping for

Prasanna Malaiyandi:

a dedupe system should be asking?

Prasanna Malaiyandi:

Like what are the important things that they should be

Prasanna Malaiyandi:

asking in order to determine

W. Curtis Preston:

yeah, that's a, that's a great question.

W. Curtis Preston:

I think the, the question would be things about what's the restore performance?

W. Curtis Preston:

Because in the end, that's the only thing that matters.

W. Curtis Preston:

I remember.

W. Curtis Preston:

A product.

W. Curtis Preston:

Now, this product is still on the market, but I believe, I believe

W. Curtis Preston:

they have addressed this, this issue.

W. Curtis Preston:

I remember a dedupe product.

W. Curtis Preston:

It was a target dedupe product that had, uh, I remember that had 400 megabytes

W. Curtis Preston:

a second of throughput into the appliance.

Prasanna Malaiyandi:

And like 10 megabits out

W. Curtis Preston:

It was 40, it was 40, it was 40, uh, megabytes out.

W. Curtis Preston:

It had a 90%, what we call dedupe tax.

W. Curtis Preston:

Right.

W. Curtis Preston:

That the, because the problem with dedupe, depending on how

W. Curtis Preston:

you store it, is that you've got everything you need all over the

Prasanna Malaiyandi:

All over the place.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

And so this was just a really, really, really bad design.
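The two figures in this story give the dedupe tax directly: 40 MB/s out against 400 MB/s in is a 90% loss. The second function is a toy fragmentation model, written for illustration rather than taken from any product, showing why restores slow down as the chunks of the newest backup end up scattered across many containers. The container layout, 2% daily change, and 1,000-chunk image are assumptions.

```python
import random


def dedupe_tax(ingest_mb_s: float, restore_mb_s: float) -> float:
    """Fraction of ingest throughput lost when reading the data back."""
    return 1 - restore_mb_s / ingest_mb_s


def container_switches(days: int, chunks: int = 1000,
                       change_rate: float = 0.02, seed: int = 0) -> int:
    """Toy fragmentation model, not any product's on-disk layout.

    Day 0 writes every chunk of the image into container 0. Each later day,
    a random 2% of chunks change and their new versions land in that day's
    container. Returns how often a front-to-back restore of the newest image
    jumps to a different container, a rough stand-in for disk seeks."""
    rng = random.Random(seed)
    location = [0] * chunks  # chunk index -> container holding its newest version
    for day in range(1, days + 1):
        for idx in rng.sample(range(chunks), int(chunks * change_rate)):
            location[idx] = day
    return sum(1 for a, b in zip(location, location[1:]) if a != b)


print(f"{dedupe_tax(400, 40):.0%}")  # 90%
print(container_switches(0))         # 0: a single full restores as one sequential stream
print(container_switches(90))        # hundreds of jumps after 90 days of changes
```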

W. Curtis Preston:

And um, uh, I believe that they addressed it and, um, because that

W. Curtis Preston:

product is still on the market today.

W. Curtis Preston:

But version one of that product was, uh, terrible.

W. Curtis Preston:

Um, so yeah, it's about restore performance, right?

W. Curtis Preston:

So one thing, oh, I'm.

W. Curtis Preston:

Uh, dedupe ratio is crap.

W. Curtis Preston:

Don't look at dedupe ratio.

W. Curtis Preston:

Dedupe ratio is a made-up number.

W. Curtis Preston:

Um, I will, um, I'll, I'll go back to, I'll pick on Avamar.

W. Curtis Preston:

Avamar.

W. Curtis Preston:

Back in the day, they used to say they had a 400-to-one dedupe ratio.

W. Curtis Preston:

Do you remember this?

W. Curtis Preston:

Because

W. Curtis Preston:

they basically considered every backup as a full backup.

W. Curtis Preston:

They're like, the way we store backups, which is the same way Druva stores

W. Curtis Preston:

backups, the way we store backups.

W. Curtis Preston:

It's like, even though they're incremental, it's like they're a full.

W. Curtis Preston:

Right?

W. Curtis Preston:

Because they behave like a full during a restore.

W. Curtis Preston:

And so they considered every backup a full.

W. Curtis Preston:

And so they said, well then therefore the dedupe ratio is 400 to one.

W. Curtis Preston:

Well, that was always complete nonsense.

W. Curtis Preston:

Um, the other would be, I remember, uh, again, I'm gonna pick on people equally.

W. Curtis Preston:

I remember sales reps of a certain large target

W. Curtis Preston:

dedupe company where you might've worked, where they would tell customers to go

W. Curtis Preston:

and do full backups more frequently because it made their dedupe ratio better.

W. Curtis Preston:

Which is just, again, nonsense.
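The same stored data can be made to produce wildly different dedupe ratios depending on what gets counted as backed up. This Python sketch assumes a 100 TB full, 2 TB of new data per day, 90 daily backups, and repeated fulls that dedupe completely; all of those figures are hypothetical.

```python
import math

FULL_TB = 100    # size of one full backup (assumed)
CHANGE_TB = 2    # new unique data per day, i.e. 2% (assumed)
DAYS = 90        # one backup per day

# What actually lands on disk is the same in every case below: one full's worth of
# unique data plus each later day's changes (assuming repeated fulls dedupe completely).
STORED_TB = FULL_TB + CHANGE_TB * (DAYS - 1)


def logical_tb(fulls: int, incrementals: int) -> int:
    """What the backup software says it 'backed up': the ratio's numerator."""
    return fulls * FULL_TB + incrementals * CHANGE_TB


weekly_fulls = math.ceil(DAYS / 7)
policies = {
    "incremental forever": (1, DAYS - 1),
    "weekly fulls": (weekly_fulls, DAYS - weekly_fulls),
    "every backup counted as a full": (DAYS, 0),
}
for name, (fulls, incs) in policies.items():
    print(f"{name}: {logical_tb(fulls, incs) / STORED_TB:.1f}:1 with {STORED_TB} TB stored")
```

The back-end footprint is 278 TB in every case while the reported ratio runs from 1.0:1 to 32.4:1; only the numerator changes, which is why the ratio says little on its own.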

W. Curtis Preston:

What matters, in my opinion, what matters is how big is a full backup versus

W. Curtis Preston:

how big are all the backups, right?

W. Curtis Preston:

So if I have.

W. Curtis Preston:

If I, let me, let me explain what I'm saying.

W. Curtis Preston:

If I have a hundred terabytes, if, if one full backup of my environment is a hundred

W. Curtis Preston:

terabytes and then after three months how big is, or whatever number you want.

W. Curtis Preston:

Uh, but it's just three months seems like a, a nice, long, um, what do you call it?

W. Curtis Preston:

Uh, POC thing,

W. Curtis Preston:

right?

W. Curtis Preston:

Um, after a hundred, after, you know, three months, how

W. Curtis Preston:

How much stuff is stored over there?

W. Curtis Preston:

That's what I'm saying.

W. Curtis Preston:

Don't, dedupe ratios is nonsense. That didn't come out in English.

W. Curtis Preston:

Dedupe ratios are nonsense, but if I can fit a hundred terabytes right, if I have

W. Curtis Preston:

a hundred terabyte environment and then a series of incremental backups, and then

W. Curtis Preston:

over there, my question is how big is.

W. Curtis Preston:

How much data did I write to disk?

W. Curtis Preston:

And let's say it's, it's, it's 200 terabytes after 90 days.

W. Curtis Preston:

And then compare that with another product that writes a hundred terabytes.

W. Curtis Preston:

You backed up the same data, but you used half as much storage on the back end.
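This comparison reduces to back-end capacity per front-end terabyte over the same test window. A minimal Python sketch using the example figures from this passage (a 100 TB environment, 200 TB versus 100 TB written after 90 days); the product names are placeholders:

```python
def backend_ratio(stored_tb: float, frontend_tb: float) -> float:
    """Back-end TB consumed per front-end TB protected over the same test window."""
    return stored_tb / frontend_tb


# Example figures from the passage: the same 100 TB environment, 90 days of backups.
poc = {"product A": 200, "product B": 100}
for name, stored in poc.items():
    print(f"{name}: {backend_ratio(stored, 100):.1f} TB stored per TB protected")
print(f"B needs {poc['product B'] / poc['product A']:.0%} of A's storage")
```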

W. Curtis Preston:

That's what I'm saying.

W. Curtis Preston:

The the, the problem is, and the, the other reason, and again,

W. Curtis Preston:

I'm a little extra sensitive to this cuz I work for Druva.

W. Curtis Preston:

People ask us what's our, what's our dedupe ratio?

W. Curtis Preston:

We're like, well the thing is we're like the opposite of Avamar.

W. Curtis Preston:

Well, we're actually similar to Avamar in that we're source-side dedupe,

W. Curtis Preston:

but we don't use that funny math.

W. Curtis Preston:

So we could say 400 to one, but that's nonsense.

W. Curtis Preston:

So you know, we say, well, we.

W. Curtis Preston:

Because, because we also do incremental forever backups.

W. Curtis Preston:

That's, that's the problem.

W. Curtis Preston:

Right.

W. Curtis Preston:

So, um, but I know that on average, if we have a hundred terabyte

W. Curtis Preston:

customer, we store, you know, roughly a year's worth of backups in less

W. Curtis Preston:

than a hundred terabytes of disk.

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

And I think it's important there to also account for the incrementals, like

Prasanna Malaiyandi:

how I look at these like numbers.

Prasanna Malaiyandi:

I totally get what you said, Curtis, like you should just do an apples-to-apples comparison.

Prasanna Malaiyandi:

But if you don't have that ability, you should also look to say, okay,

Prasanna Malaiyandi:

I have a hundred terabyte full.

Prasanna Malaiyandi:

And then say, my daily change rate is 2%.

Prasanna Malaiyandi:

right?

Prasanna Malaiyandi:

So if I do 2% for a month, right?

Prasanna Malaiyandi:

That's, what is that, 2 times 30?

Prasanna Malaiyandi:

60 more terabytes, right?

Prasanna Malaiyandi:

So it should be 160 terabytes worth of data that I sent over, right?

Prasanna Malaiyandi:

For 160 terabytes worth of data, how much should I actually store?

Prasanna Malaiyandi:

Right?

Prasanna Malaiyandi:

Which will give you similar things to what you're saying, right?
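Prasanna's estimate can be written out directly. The sketch below assumes one full plus a steady daily change rate; the 80 TB "measured" value is a placeholder for whatever the product under test actually reports, not a real result.

```python
def expected_ingest_tb(full_tb: float, daily_change_pct: float, days: int) -> float:
    """One full plus `days` incrementals at a steady daily change rate."""
    return full_tb + full_tb * daily_change_pct / 100 * days


def stored_fraction(measured_stored_tb: float, ingest_tb: float) -> float:
    """How much of what was sent actually ended up on the back end."""
    return measured_stored_tb / ingest_tb


ingest = expected_ingest_tb(100, 2, 30)
print(ingest)                                # 160.0
# The measured value comes from the vendor's own reporting; 80 TB is a placeholder.
print(f"{stored_fraction(80, ingest):.0%}")  # 50%
```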

Prasanna Malaiyandi:

But because what you're saying is if you had the two products, then

Prasanna Malaiyandi:

you could do a direct comparison.

Prasanna Malaiyandi:

But I'm saying if you don't have the two products, then

Prasanna Malaiyandi:

here's another way you could

W. Curtis Preston:

Well, I, well, I would argue that there's no way

W. Curtis Preston:

to compare them if you don't have two products, if you, if you're not, if

W. Curtis Preston:

you're not doing a true comparison.

W. Curtis Preston:

Right.

Prasanna Malaiyandi:

A

W. Curtis Preston:

it's just, it's just that dedupe math is funny, right?

W. Curtis Preston:

So different products charge differently, right?

W. Curtis Preston:

You look at, um, like when you look at Metallic, which competes

W. Curtis Preston:

with Druva, they have a frontend price and we have a backend price.

W. Curtis Preston:

They have, they actually have the front end price, and then you also

W. Curtis Preston:

need to pay for the backend storage.

W. Curtis Preston:

Right?

W. Curtis Preston:

So you're paying, so how do you, how do you compare that?

W. Curtis Preston:

Um, it's, it's just, it's difficult

Prasanna Malaiyandi:

hard.

Prasanna Malaiyandi:

Yeah.

W. Curtis Preston:

it's hard.

W. Curtis Preston:

Uh, but all I'm saying is dedupe ratio is crap and doesn't mean anything.

W. Curtis Preston:

Um, but what does matter is how much data are you storing on that

W. Curtis Preston:

backend because you will be paying for that one way or the other.

W. Curtis Preston:

All right.

W. Curtis Preston:

I don't know if we made this, if we, if this is clear as mud or what, but, uh, I

W. Curtis Preston:

hope that was helpful and, uh, maybe we, maybe we ticked off Howard and Howard's

W. Curtis Preston:

gonna come on next week's episode.

W. Curtis Preston:

I dunno.

Prasanna Malaiyandi:

Come join us,

Prasanna Malaiyandi:

Howard,

W. Curtis Preston:

Thanks for, thanks for, uh, thanks for helping

W. Curtis Preston:

me with my network as well, so,

Prasanna Malaiyandi:

anytime, Curtis.

Prasanna Malaiyandi:

Just remember I am not tech support.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

All right.

W. Curtis Preston:

Well, uh, and thanks to the listeners and remember to subscribe