Artificial intelligence in backup isn't just marketing hype - it's changing how we protect our data. In this episode, W. Curtis Preston and Prasanna Malaiyandi break down the practical applications of AI in backup systems, from intelligent scheduling to ransomware detection.
Learn how artificial intelligence helps with capacity planning, especially with deduplication systems where predicting storage needs gets tricky. We discuss AI's role in asset discovery, anomaly detection, and even creating better disaster recovery plans. Plus, find out why backing up AI models themselves might become your next big challenge. This no-nonsense look at AI in backup cuts through the confusion and focuses on what really matters - making your backups better.
You've found The Backup Wrap-Up, your go-to podcast for all things
backup, recovery, and cyber recovery.
In this episode, we take a look at the use of artificial intelligence in backup.
Can AI make your backup environment actually better?
Prasanna Malaiyandi and I discuss AI and how it can help from
possibly everything from scheduling backups to detecting ransomware.
We talk about using it for deduplication, for capacity planning,
and even helping you to write better disaster recovery plans.
It's time to talk about AI and backups.
Hope you enjoy it.
By the way, if you don't know who I am, I'm W. Curtis Preston, AKA Mr.
Backup, and I've been passionate about backup and recovery for over 30 years,
ever since I had to tell my boss that we had no backups of that really
important database that we had just lost.
I don't want that to happen to you, and that's why I do this podcast.
On this podcast, we turn unappreciated backup admins into Cyber Recovery Heroes.
This is The Backup Wrap-Up.
Welcome to the show.
Hi, I'm W. Curtis Preston, AKA Mr. Backup, and I have with me a guy who apparently
doesn't know how to hold a coffee cup.
Prasanna Malaiyandi, how's it going?
Prasanna
I am good, Curtis.
So I think we need to clarify a
are you defending yourself?
Are you gonna try to defend your weirdness?
I think we have to talk about multiple things.
First.
In India, they don't typically use like a mug.
They use like a stainless steel cup, right?
So if
you're drinking hot beverages, you can only hold it from like the very,
you saw it when we went to the Indian
restaurant in San Diego,
yeah, yeah.
you have to hold it from the very top, otherwise you'll burn your hand.
Right.
And then most mugs, it just feels weird.
Like I got, I got chunky fingers, like sausages, right?
And so like putting it inside the mug, like the handle part of the mug.
I feel like, especially if it's like a curve, not like a straight, I feel like
there's not enough stability there.
That's fascinating.
So for people watching the video, who, by the
way, we do publish a video on YouTube if you want to see our
glorious faces and our expressions.
But yeah, so when I hold a mug, I don't hold it like this through the
handle.
I basically grab it either from the top or I hold it like on the side.
And then of course, the Pinky's kind
The pinky,
the bottom.
But what's weird though is the pinky supporting the bottom thing.
I know you've complained to me many times, but that's also how I hold my phone.
you end up covering your microphone.
always hold the phone and then my pinky's kind of on the bottom, and so it always blocks the microphone.
So Curtis is always
like, were you underwater?
Did you swallow your phone?
What's going on?
So regarding your defense from, you know, how they hold, do things in India.
What part of India were you born in?
Uh, just remind
Yeah, I was, uh, born in not India, but,
but at home.
Right.
But yeah,
you were, you were raised by people born in
India, and so you were, you were taught,
yeah.
And so actually I prefer, so even drinking water.
I don't drink from a glass cup.
I drink from a stainless steel cup.
Right.
Which is, if you haven't spent any time around, you know,
Indians, you wouldn't know that.
It's just that you use a lot, you use stainless steel for cups, for plates,
right.
As Curtis knows, when I'm loading the dishwasher,
he's like, what is that racket?
what is happening over there?
Because everything's so noisy.
They last longer and you don't have to worry about them breaking.
That's, you know, I can't, I can't complain.
Yeah.
Uh, but yeah, I don't get the whole knot, you know?
Here I am with four fingers in my mug.
I'm just saying.
Okay, so now what if that mug was smaller and the handle was curved,
Well, then that's like a, that's like a girly mug and then,
then you use two fingers like
this.
feel like it gives you enough stability?
And yet I've never dropped a mug.
I'm
just saying.
It's not from dropping the mug.
It's from like when you, yeah.
See when you're drinking it, it just feels like it's a little like,
Yeah.
Um,
all over you.
I just think you don't know how to hold a mug, but.
Our listeners are probably like, what are these people talking about?
By the way, this is a new format starting in the new year.
We are now gonna just be talking about coffee and all the crazy
things that Prasanna does.
Yeah, absolutely.
Um, or maybe we might actually talk about some stuff.
So I thought, um, you know, we've been seeing, uh, AI on the news a lot,
right?
AI? I've never heard about it.
Yeah, I've never, never heard of it.
Yeah.
So artificial intelligence, and if you've been following the backup
industry much, you probably saw a few announcements from your, uh, backup
company, or maybe backup companies you're interested in, about the use of AI
within backup.
And so I thought we'd talk about that a little bit,
um, in this episode, and
whether or not it has a use, right?
And, just to clarify, I think when a lot of these backup vendors launched AI,
they were using AI for, like, not for the core product, right?
So they were using AI for their support agent, or to help answer questions, right?
Which I think we all understand, we all know about, but I think in this
episode, I think we should focus on like the core part of backup.
Yeah.
So let's talk just a little bit about, you know,
what we mean when we say AI.
There are different categories of AI, and then also there's machine learning, which is very closely related. And honestly, you know, I think I could describe the difference between machine learning and AI, but then there's something that changes, you know, that messes me up when we talk about that. Um, for those of you that actually really know what AI is and machine learning is, you're gonna be offended by something I say during this episode. I'll just tell you that. But we're gonna use the terms almost interchangeably, but they're not.
Uh, but I do want to distinguish what is referred to as generative AI, right? Which is, you know, a large language model that is going to create things. It's not ex nihilo, right? It's not from nothing. It has to have been trained on a large data set. But those are the kinds of things that they're using, like you talked about there, for support models, right?
And,
just as examples of large language models, you might've heard about Meta's Llama, Llama 3, Llama 4. There's ChatGPT, or OpenAI's... what is it, GPT?
The, the actual model.
The underlying model.
Oh, okay.
I would've just said ChatGPT, 'cause everybody knows what ChatGPT is, right?
I mean, you've got copilot, you've got, you've
got, Yeah, you, so you've got Claude from Anthropic.
Um, there are a lot of people, you know, who confuse the company with the product.
But, um, these are the, these are the ones that are grabbing
all the headlines, right?
They're also, they're also writing large bodies of texts.
They're helping people to write books.
They're helping people to do art.
And there's a lot of, um, legal discussion around that, around the use of things like the books that I've written, um, you know, feeding into that. But we're not talking about that,
right?
Um, we're not gonna talk about, hey, ChatGPT, my restore didn't work. Can you recreate all my documents?
There's not gonna be anything like that, at least not yet.
Um, we're gonna talk about how AI can be used to basically enhance the core functionality. I mean, you said this in far fewer words a few minutes ago,
but, uh, basically how it could be used to make backups better.
And I think a good chunk of this is really, like you said, more
around machine learning models,
right,
right,
large language models.
right.
So in the first section, we'll just talk about how, potentially... this is just sort of thoughts out loud. I know that we have a lot of vendors that listen to the podcast. We are technically aimed at the people who actually use backup and recovery, but I know a lot of vendors use the podcast, so feel free to take this episode and run with it and do stuff.
So I guess the first question would be: do we think that, uh, machine learning can be used to help improve the efficiency of the backup process itself? What do you think about that?
Oh, a thousand percent.
A billion percent, Curtis.
So I've never actually had to implement a backup system.
But you've done
tons of this, right?
And how do you go about just planning your backup, right?
How to back up an infrastructure, right?
It's like, just walk us through that, right?
And how many spreadsheets and all the rest do you have in order to try to optimize this?
Yeah, I think about that a lot. And the answer is gonna depend greatly on the product that you're using, right? You know, I can think of...
The traditional way is that you're going to create some kind of schedule, some
kind of, uh, automatic backup schedule.
Um, and again, traditionally, we'll do three categories here. Traditionally, you've got some full backups, and you're gonna do some full backups every once in a while.
Um, and I was always a proponent, if you had to do full backups, of doing those no more often than once a month. Um, back in the days of tape, it was once a week, because anything less often complicated the restore process.
Yeah.
But, um, you know, doing it no more often than once a month. But depending on your backup product, you might be able to spread that out even over, like, three months.
And then you also want to schedule, if your backup product is capable of doing it, a cumulative incremental, a differential some products call it. Um, and then of course the daily incremental. Right. So spreading that all out,
for one application you're talking about,
Exactly.
You're doing this per application, per server.
Um, and, and you're trying to load balance things out because if you've
properly designed your system, it's probably not capable of doing a full
backup of your environment in one night.
Right.
Um, because that would just be really expensive, and then the rest of the
time it would go completely unused.
Right?
Um, so you size it so that it's big enough to do a full backup over time.
And, um, you're right that scheduling that out is problematic, right? Um, and you definitely could use AI or ML to do that.
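The load balancing Curtis is describing, spreading full backups across nights so no single night blows the backup window, can be sketched in a few lines. This is a toy greedy heuristic with made-up server names and sizes, not any product's scheduler; an ML-driven system would re-solve this continuously as sizes and windows drift.

```python
# Toy sketch of spreading monthly full backups across nights.
# Greedy "least-loaded night" heuristic; sizes in GB are invented examples.

def spread_fulls(server_sizes_gb, nights):
    """Assign each server's full backup to the currently least-loaded night."""
    load = [0] * nights           # total GB scheduled on each night
    schedule = {}                 # server name -> night index
    # Place the biggest servers first so they don't all collide at the end.
    for server, size in sorted(server_sizes_gb.items(),
                               key=lambda kv: kv[1], reverse=True):
        night = load.index(min(load))   # pick the emptiest night so far
        schedule[server] = night
        load[night] += size
    return schedule, load

servers = {"db01": 900, "db02": 850, "fs01": 400, "fs02": 350, "app01": 100}
schedule, load = spread_fulls(servers, nights=4)
print(load)  # → [900, 850, 400, 450]
```

An SLA-driven product effectively solves a harder version of this problem all the time, instead of freezing one answer into a spreadsheet or script.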
And even for the scheduling aspect.
So we talked about the applications, and then you were talking about sort
of that infrastructure piece, which is shared and you now have to worry
about it across all of these things.
And I'm sure you had these bonkers spreadsheets that you
were creating, trying to do this.
Did it stretch all the way to the moon and back, by the way?
Well, you know me. For me, it wasn't even a spreadsheet. It was just, uh, a script. Right. I would just script all this nonsense. Right?
Um, but the bigger the environment, the more that doing it programmatically made sense, right?
Um, and, and by the way, even if you have a more modern backup tool
that does incremental forever, there are many applications that
won't, that won't let you do
that.
Right?
I think of like database backups still need to be done every, you know, a full
backup every so often, and you have to schedule these out,
And that's the
second category.
'cause I know you talked about three categories.
Yeah.
Oh yeah.
Oh, well the three categories were, yes.
Uh, thank you.
I'm glad I have you here sometimes, you know.
Yeah.
So you have the old-school full and incremental, which, old school is still current school. If we're talking about regular apps, then there's the forever-incremental type. Um, and you do have to worry about scheduling those, but generally you just sort of tell 'em all to start at once, and then they queue, and it's a lot simpler to do those.
But then the final category are ones that actually... and I think the one that probably stands out the most here would be Rubrik, right?
Rubrik doesn't let you schedule, um, that stuff. You tell it what your RTO is and your RPO, and it just does the backups. I mean, in fact, there are people that complain that you cannot, at least last time I checked, you could not do a manually scheduled backup if you wanted to tell it when to do stuff.
Um, I, I think this is probably the first use of some sort of machine learning
or artificial intelligence that I can think of with regards to scheduling.
Which, I was also gonna chime in. So the first two methods you talked about, right, you're kind of statically doing this upfront, setting the schedules and hoping that it will be good forever,
Right.
You'll always be able to meet it, but say that there's an additional load or a
server goes down or something else, right.
There's no way to fine tune and adjust that,
Well, I mean, there is, but there's no way to automatically fine-tune it. Yeah.
Yeah.
Right.
And so you're just like, okay, maybe it'll fail a couple times, and then I'll adjust the policies, and then I'll be fine, right?
Versus something like an SLA-based approach, which, I actually have looked at Rubrik's in the past, and I find that very enticing, because really, in the end, you care about your RPO and RTO,
Yeah.
No one cares if you can back up.
They only care if you can restore.
the problem though is it's such a big paradigm shift for a lot of backup admins
that it's very difficult to understand because it's like when people move
from on-premises to the cloud and they were concerned because they're like,
I can't touch and feel my equipment.
Right.
It's not something I could actually do.
I think that's also the same challenges you get when you move
from sort of, uh, schedule-based backups to sort of SLA based backups.
Yeah, I liked the idea a lot. I still, again, you know, if I was running Rubrik, I would give people the ability to do a manual backup if they wanted to. But I do really like the idea of SLA-driven backups, because I like the idea of SLAs.
You know, we've talked about SLAs on here, and I like the idea of knowing that backups were being done often enough to meet my SLAs. I really liked that idea.
The one thing I think that is useful with these sort of approaches is
we've talked about the fact that, like, your environment doesn't stay static.
Right.
So as you're adding new workloads, as things are changing, you don't want to have to go recompute your entire spreadsheet or your script every single time.
So it's nice to have sort of these models that can automatically help fine tune and
optimize so you're not wasting your time because it's more than likely that you're
not gonna get it right the first time if you manually try to reset some of these
things.
And so having this automatic thing that constantly is
adjusting just seems amazing.
Yeah, it does.
And I, and outside of Rubrik, I'm not aware of any tools that do that.
Uh, but I, I think that this could certainly be a way where
they could use AI to do that.
Um, and I was thinking about, again, going back to it, it's been a while since I've had to do this in a production environment, but the first thing that you have to find out is how big is everything, right?
How big is, is everything from a database perspective and
how, how long does it take?
'cause there's all these different, and that's the thing that nobody knows.
Right.
How big is your, how big is your data center?
And they're like, I don't know.
I don't know.
And so like, you have to do a full backup first
before you have any idea.
And not every server backs up at the same speed and all these different things.
So yeah, it it is a
complicated
and you may not be able to back up everything at the same
time because there might be
different hours, right?
That
a server is sort of offline or has less load that you can actually do it.
Yeah, so having some sort of AI or ML, um, figure that out sounds amazing.
Right?
Another area where I think that this could help is very, very closely related, and
that is, and some backup products do have this, making sure that everything in my data center is backed up in some way, right?
Usually where you see this is an integration with like, um, uh,
VMware or, uh, AWS, et cetera, right?
Um, basically just connect to my entire, uh, you know, control
panel and then just look and make sure that everything is connected
to some type of policy to back it
up.
I, I think you'd want a default policy if anything is created, so at least everything is protected, even though it may not be protected with the right thing. But at least it's being protected, and you don't have to worry about these gaps.
Exactly.
Exactly.
Um, and I, I think you do see this in a lot of backup products.
Usually, again, it's with integration with, uh, big things like VMware, Hyper-V, AWS, um, you know, et cetera.
you need the companies, those vendors, to actually provide the APIs to be
able to do these sort of queries, and I think that's where there's kind
of a little bit of a tension there,
Yeah.
Yeah.
I mean, theoretically you could scour the data center, right?
Uh, looking for new computers.
Again, I, I know I mentioned this before, but you know, back
in the day we did that, right?
And back in the day, we did that with Visio. Um, there used to be a very expensive version of Visio that would just literally crawl your data center. And it used, uh, some very interesting technology. Um, I forgot the name of this, but, like, nmap does this, where what it does is it sends a malformed packet. It finds an IP address, it sends a malformed packet to that IP address to see how it responds, and different things respond in different ways. And that's how it does it, um,
That
is crazy that they built that.
Yeah.
Yeah.
Um, and so you could theoretically do that, but agreed, it's much easier if everything's gonna be in VMware or AWS, and then you just talk to AWS.
Now again, going to VMware and AWS, there can be multiple virtual data centers.
There can be
multiple AWS accounts.
So you, you, you want to make sure that, that you have some way to, to
do that.
And I, and I do like that idea.
Shadow IT.
Yeah, shadow IT is bad,
especially when it comes to backup.
Right.
Um, again, I'll tell a story from back in the day. There was the time that someone came to me, and they were DBAs, and they gave me a directory of a database they wanted me to restore. And it was /tmp on an HP box. And for those that don't know, /tmp on an HP box, specifically HP-UX, was in RAM.
So when you rebooted it, /tmp went away.
And this, um,
Was it source code?
It was source code.
Yeah.
And they were developing for months, like an entire team of developers developing source code of this new application in /tmp. And then we rebooted the server, and they came to me and asked me to restore it.
And I was like, dude, we don't back up /tmp. I don't know what you're talking about.
Like, and they're like, dude, this is really important,
like heads are gonna roll.
And I'm like, yeah, not mine.
Like, everybody knows we don't back up /tmp.
Except for you, apparently.
Oh
Uh, so, I'm just, you know, it's really bad when you have a functioning system and it's not being backed up. Again, another story: we used to have a naming convention. Ours was very boring. Um, it was HPDBSVA, right? HP database server A, and there was HPFS01, et cetera, right?
And I remember, and I had this form that you had to fill out.
This was an actual piece of paper.
We did
not have web pages.
Right?
You had this form that you fill out, and it said on there: simply filling out this form does not meet the requirement. You do not consider your system backed up until you have a signed form back from me.
Right?
And then one day somebody handed me a form, and they wanted me to back up HPDBSVM, right? And I go, M, that's interesting. The last server I remember hearing about was H. So that means there's an I, a J, a K, and an L out there somewhere.
That hasn't been backed up.
That hasn't been backed up.
Yeah.
Um, so this idea of automatically
detecting servers and applications sounds like a great
idea.
And also, not just detecting VMs. It would be really nice if it detected the type of VM and said: this appears to be a SQL instance.
We should back it up with the default SQL
policy.
That would be great.
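The discovery idea the two are describing boils down to a reconciliation pass: pull an inventory from the platform's API, diff it against what the backup system protects, and put orphans on a default policy. This is a hypothetical sketch; the function names and the HPDBSV-style server names are invented for illustration, and a real version would query the VMware or AWS APIs rather than take lists.

```python
# Hypothetical sketch: reconcile a platform inventory against the assets
# the backup system protects, and assign any orphans a catch-all policy.
# Server names follow the naming-convention story above; all invented.

def find_unprotected(inventory, protected):
    """Return assets present in the inventory but on no backup policy."""
    return sorted(set(inventory) - set(protected))

def assign_defaults(unprotected, default_policy="default-daily"):
    """Map each orphan to a default policy until an admin refines it."""
    return {asset: default_policy for asset in unprotected}

inventory = ["hpdbsva", "hpdbsvh", "hpdbsvm", "hpfs01"]
protected = ["hpdbsva", "hpdbsvh", "hpfs01"]
orphans = find_unprotected(inventory, protected)
print(orphans)                   # → ['hpdbsvm']
print(assign_defaults(orphans))  # → {'hpdbsvm': 'default-daily'}
```

The AI angle would be on top of this: classifying what each orphan appears to be (a SQL instance, a file server) and picking a better-than-default policy.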
So in addition to making things more efficient, um, there are some
other things we could do, uh, with AI that also would be interesting.
Uh, what do you think is the first one?
So I think one of the ones, and we've talked about it so often, and vendors are starting to do this, is anomaly detection, and it could be used in various fashions.
So one thing is like, Hey, by the way, this server, all of a sudden it's backing
up 10 times what it normally does.
Maybe this might indicate like a malware or ransomware on the system.
Right?
Um.
Or, hey, I've noticed that, based on entropy, there's a bunch of data that's starting to look like it's been encrypted. That doesn't look normal. Okay, maybe I should go investigate it, right?
So, or it could even be security things like, Hey, you're logging
in from a different place than normal as a backup admin.
Is this the right thing or not?
Yeah.
And also, very closely related to the stuff you said before: are there files where the file type, based on the first few bytes of the file, does not match the extension of the file? So it says it's a .doc, but the first few bytes of the file show that it's an application, for
Sorry, one
Yeah, that's an interesting use case around, uh, the first few bytes, because that could detect things that are being encrypted, or other things that don't make sense, or potentially even malware.
Right.
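Both checks mentioned here are easy to sketch: Shannon entropy (encrypted or compressed data measures close to 8 bits per byte) and a magic-bytes-versus-extension comparison. The entropy threshold you would pick and the tiny signature table below are illustrative only, not a production ruleset.

```python
# Sketch of two anomaly checks: byte entropy (encrypted data looks nearly
# random, approaching 8 bits/byte) and magic-byte vs. extension mismatch.
import math

def shannon_entropy(data: bytes) -> float:
    """Bits per byte; values near 8.0 suggest encrypted/compressed content."""
    if not data:
        return 0.0
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

# First bytes of a file ("magic numbers") mapped to the extension they imply.
# A tiny illustrative sample, not a complete signature database.
MAGIC = {
    b"%PDF": ".pdf",
    b"PK\x03\x04": ".docx",  # modern Office files are ZIP containers
    b"MZ": ".exe",           # Windows executable
}

def extension_mismatch(filename: str, head: bytes) -> bool:
    """True if the magic bytes identify a type that contradicts the extension."""
    for magic, ext in MAGIC.items():
        if head.startswith(magic):
            return not filename.lower().endswith(ext)
    return False  # unknown signature: can't judge

print(extension_mismatch("budget.doc", b"MZ\x90\x00"))  # → True
print(round(shannon_entropy(bytes(range(256))), 1))     # → 8.0
```

A backup product scanning data as it streams through could flag files that trip either check for an admin to investigate.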
Yeah, it's something we do. You know, my, uh, employer is S2|Data, and we do a lot of restores of old stuff, um, where we're pulling data off of tape, often for e-discovery purposes and lawsuit purposes and, um, investigation purposes.
And one of the things that we do as we're pulling data, 'cause we use a proprietary tool that we've written to restore data off of most backups rather than use the built-in tool, for a lot of reasons, um, and this is one of them, is that we check the file type against the file contents. And, uh, it can also indicate, um, subterfuge,
right?
Um, it can indicate somebody trying to hide something.
Um, but yeah, so anomaly detection, I think is a really big one.
Uh, right. Definitely. This is a "looks like you've got ransomware," right? You need to solve that.
That was probably the first big use of AI that I remember in the backup world. And I will say, it's not great if the way that you find out you have ransomware is that your backup product told you something is wrong. But, uh, it can happen.
Right.
Um, another one that I'd bring up is data classification. Again, I think this is probably a very simple one, but the idea of, like, looking at all the different data types and helping you to understand what is in your environment.
This is not that new.
Um, but perhaps the AI use case could be helping you to identify trends,
um, and, and where the data's moving, where it's being created, where
it's being changed, uh, et cetera.
Um, and, and then, which is very closely related to my
other idea, which is predictive
analytics.
Right.
Um, again, going back to, uh, you know, back in the day,
one of the things I remember being the hardest to do is capacity prediction. You know, predicting whether or not I have enough capacity to do my backups for the next six months.
and you know what makes it even harder?
What's that?
Dedupe makes it way harder.
And you know what AI, right, AI/ML, could be used for? Because it's smarter than I am,
Smarter than you are.
It could actually understand the trends.
Now, let's talk about that, 'cause not everybody might understand why dedupe makes capacity,
Sure.
uh, management so hard.
So before we get to dedupe, let's talk about, like, traditional storage or tape, right? So you're doing a full backup. You know how big your database is. Therefore, you know, okay, my full backup is gonna take this much space, and, you know, with compression, maybe it's gonna be 2x, or half the space, right?
And then, you know, okay, my daily change rate is say 5%, and based on the
total size, I know what that's gonna be.
And so
if I'm doing weekly fulls, daily incrementals, I know how much
storage I'm gonna need for a week.
Yeah.
And just as important, when you delete the older backups,
Yeah.
you know how much storage will be freed up, which is, if anything, even more important.
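The traditional capacity arithmetic being described really is back-of-the-envelope. Here it is as a sketch, using the example numbers from the discussion (2:1 compression, 5% daily change rate, weekly fulls plus daily incrementals); real planning would add growth rates and per-server variation.

```python
# Back-of-the-envelope capacity math for traditional (non-dedupe) storage:
# one weekly full plus six daily incrementals, after compression.
# The 5% change rate and 2:1 compression are the example numbers above.

def weekly_capacity_gb(full_size_gb, daily_change_rate=0.05,
                       compression=2.0, incrementals_per_week=6):
    """Storage needed for one weekly cycle: a full plus daily incrementals."""
    full = full_size_gb / compression
    incrementals = (full_size_gb * daily_change_rate
                    * incrementals_per_week) / compression
    return full + incrementals

def retention_capacity_gb(full_size_gb, weeks_retained):
    """Total on disk when you keep N weekly cycles; deleting a cycle frees
    a predictable amount -- exactly the property dedupe takes away."""
    return weekly_capacity_gb(full_size_gb) * weeks_retained

print(weekly_capacity_gb(1000))        # → 650.0 (1 TB database)
print(retention_capacity_gb(1000, 4))  # → 2600.0 (one month retained)
```

The point of the sketch is the predictability: every number is knowable up front, and deletes free a known amount, which is what breaks once deduplication enters the picture.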
Now, the problem with deduplication is they talk about these great rates, like 40x, 30x, 20x, take your pick, right? And that's all great if a lot of your data is very similar. But it's hard to tell whether your data is similar or not until you actually start doing it.
So if you're trying to buy storage for, say, three years ahead of time, a capacity plan becomes really difficult. And so you guess, right?
You'll take a stab and maybe you look at some of your data and you're like,
Hey, these kind of look the same, but you don't know if that's right or not
until you actually start backing it up.
And like you said, Curtis, if you go delete your backup, you may not
actually free up that space because it's been de-duplicated against something
else that you're still preserving.
right,
Say I go delete my backup from six months ago for one application.
Another application might have, uh, common blocks with that data or with that other
application.
And so even though I deleted the first application's backup,
it's not gonna free up space.
And so you end up with this problem and this challenge.
And that's one of the things, the hardest things about deduplication.
Having worked at a company that did deduplication, customers
always struggled with it,
Yeah,
And some of the things we would do is we would be like, hey, let's scan your application and just understand what sort of dedupe rates you may get. And even that's a guess, because maybe you move an application from one storage appliance to a different appliance, and now your dedupe rates are different.
Yeah.
And again, one of the most frustrating things could be if you start running outta capacity, right? And so you say, listen, I know we said we wanted to keep backups for three years, but we're running outta capacity, and so we're gonna start deleting at three years minus a month. And you do that, and you get back 0.1% of your capacity. It can be very difficult.
Um,
Plus the fact that to free up that space takes time. Because typically with a lot of these systems, there's a background process, typically called garbage collection, which goes and now needs to free up all this data, and that does take time to run.
Yeah, it is a two-stage process, where you flag that block for deletion, and then another process runs, typically when backups aren't running. Um, and you probably have to force the garbage collection process. Um, so, go ahead.
So I was just thinking, as we were talking, about the first time that I heard about AI in storage. And I think the first company that I can recall, and I'm sure there were others, was actually Nimble Storage. And Nimble, so they provided primary storage, and with their first product they basically were like, hey, we are optimized for SQL.
We are optimized for VMware.
We are optimized for these different, and I was like, oh, that's pretty awesome.
They're doing it dynamically.
But I think at the time it was kind of a static thing where you
would say, Hey, I have VMware.
I'm writing into this data store.
And it would optimize its, and it would basically pick different
block sizes for deduplication
Right, right, right.
Yeah.
That's interesting.
I think, going back to the thing we were talking about of, like, using AI to basically help me understand when do I need to order more storage:
It can, to the best of its ability, look at all of the dedupe rates, right? It could look at the dedupe rate of each individual backup, right? You told me to back up this much, and this is how much it took. And so it can run all those calculations and figure out: well, in six months, if everything stays the same, you're gonna be outta storage.
Which many vendors actually do.
Yeah.
Yeah.
Um, so the, the, um,
Because I think storage capacity is a little easier to predict. Because, like you said, you're not really changing things, right?
You know what data's coming in, you know how long it's, you're keeping it,
you know what your deduplication rates are, you know how much it's filling up.
So I think it's a little easier than what we had talked about previously
where it's like, okay, now let me plan out my entire backup infrastructure
and start scheduling that.
Yeah.
Speaking of dedupe, can AI help dedupe itself? Do you think it can?
So I think my biggest challenge would be that to run AI requires compute, and usually with backup, you want to go as fast as you can,
Mm-hmm.
right?
And so I think there's that tension that exists between running as fast as you can versus introducing something in the pipeline that could potentially slow things down. And you'd have to also ask: at what cost, right? Like, are you going to be saving, say, 70% additional versus traditional algorithms, or is it gonna be much less?
Yeah, I think in the backup world there have been two main ways to do dedupe. There has been something that isn't really dedupe, but there were products that called themselves dedupe products that did this, and that would be block-level incremental, essentially. Right? Not actually de-duping things against each other, but just using technology to lower the additional new data that's backed up from each workload.
But then the traditional dedupe, the way it works, for those that don't know this, is that you slice everything up into what are typically called shards or chunks. You run some type of algorithm on it that gives you some type of thing, like...
A fingerprint.
The original SHA-2,
SHA-256.
And again, here the better the algorithm, um, the better the dedupe, but the better the algorithm, the more compute it takes, going back to your trade-off thing.
And so, um, that's the way it basically works: every chunk is run through, you come up with this alphanumeric string, and that alphanumeric string is compared with every other alphanumeric string. Um, and then that's how you identify redundant data.
And one of the challenges you have with that method is that, uh, the data slides, um, and so if you don't slice the data at exactly the same spot, it's duplicate data, but you don't identify it.
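To make the slicing-and-fingerprinting idea concrete, here's a toy sketch (not any product's actual implementation) of fixed-size chunking with SHA-256 fingerprints. The chunk size is absurdly small so the sliding-data problem is visible in a few bytes:

```python
# Toy fixed-block dedupe: slice into fixed-size chunks, fingerprint each
# with SHA-256, compare fingerprints. Real systems use KB-sized chunks;
# 8 bytes here just makes the effect easy to see.
import hashlib

CHUNK = 8

def fingerprints(data: bytes) -> set:
    """SHA-256 fingerprint of every fixed-size chunk."""
    return {hashlib.sha256(data[i:i + CHUNK]).hexdigest()
            for i in range(0, len(data), CHUNK)}

a = b"ABCDEFGHabcdefghABCDEFGH"
b = b"X" + a  # the same data, shifted by a one-byte insertion up front

# every chunk boundary moved, so no fingerprints match at all
print(len(fingerprints(a) & fingerprints(b)))  # 0
```

That empty intersection is exactly the problem described above: one inserted byte upstream changes every downstream chunk, so duplicate data goes unrecognized.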
There is a completely different way. Um, look at the way VAST does things.
They do something completely different, right?
So they have an algorithm, and I'm guessing they use AI or ML to do this.
They have an algorithm that, um, basically identifies data that
is probably redundant, right?
Um, so they've got two different ways to do dedupe. And so, potentially, again, potentially, AI or ML could be used to identify a new way to identify duplicate data that is maybe more efficient from a compute and storage standpoint. Like, even if it was just more efficient from a compute standpoint but got the same amount of dedupe, that would still be great.
Um, but potentially this is something that I think, uh, AI could help with.
And the one thing I did also want to comment on, Curtis, is, uh, going back to your comment about, okay, if the data shifts, then you have to make sure that you're doing the right blocks, right?
Uh, this is where companies, though, have done sort of, uh, what you're talking about, which is called fixed-block deduplication,
right?
There are many vendors out there, though, who do variable-size, variable-block, uh, deduplication, which allows it to vary, such that if you do get an offset, right, because of some data change, it's still able to dedupe everything else after that, because of how it's actually computing the chunks, the segments, right? Each of the blocks.
Yep.
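Here's a toy sketch of that re-synchronization idea. Real variable-block products pick boundaries with a rolling hash (e.g. Rabin fingerprints) over a sliding window; this simplified version declares a boundary at a delimiter byte instead, just so the behavior is easy to follow by hand:

```python
# Toy content-defined ("variable block") chunking. The boundary test here
# is simply "is this byte a period?", standing in for a rolling hash:
# boundaries follow the CONTENT, not the byte position.

def cdc_chunks(data: bytes, anchor: int = ord(".")) -> list:
    chunks, start = [], 0
    for i, byte in enumerate(data):
        if byte == anchor:              # content says: cut here
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])     # trailing partial chunk
    return chunks

a = b"the quick. brown fox. jumps over. the lazy dog."
b = b"NEW TEXT. " + a                   # data inserted at the front

# chunking re-synchronizes right after the insertion: every chunk of `a`
# except the first is found verbatim in `b`
print(len(set(cdc_chunks(a)) & set(cdc_chunks(b))))  # 3
```

With fixed-size chunks, that same insertion would shift every boundary and nothing would match; here it only invalidates the one chunk it touches.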
Um, so, uh, so that's certainly an area where AI could potentially help. The, um, the next one: do you think it could help with recovery testing?
Oh yeah, I would.
So one thing, for sure, is like, most people probably don't know how to write a DR plan,
Mm-hmm.
Mm-hmm.
right.
Um, I wonder if you took AI, like even, and I'm going back to the first set, right, the large language models,
Yep.
So the thing we said we
weren't talking about, I think we're gonna talk about it here.
Yeah.
I think at least to start with, it's like, Hey, here's all my data.
Here's my applications.
Help me build a DR test plan.
Yeah,
I like that idea.
And
see what it pops out because, and it may not be perfect, and don't just
blindly trust what it provides, but use it as a starting point, right?
And then go use that.
Because I think a lot of people struggle with, where do I even start?
Yeah.
And you could also, um, you could use it like a chaos monkey,
right?
You could use it: help me come up with some interesting scenarios.
To just make the, the idea... you know, one of the things that we talked about in terms of, uh, cyber testing, uh, was, um, you know, when we had Mike on, the idea of, like, doing this and making it fun, making it a game. Uh, I like that idea a lot, and I think maybe AI could help there.
Um, if it helps you do recovery testing more often, um, and, uh, helps you identify potential, uh, plot... I was gonna say plot holes... potential holes in your program, uh, then that, I think, could be, um, very helpful.
And Curtis, since you threw out a term: Chaos Monkey is a tool that was released by Netflix, and literally what it is used for is to just test its resiliency.
So it'll go randomly kill services, kill locations, kill network connections, just to see: is streaming interrupted, are, uh, end users having any sort of issues? And it's able to do this at scale and in an automated fashion, versus someone, like, trying to think about all the combinations, permutations, and scenarios, because they're probably gonna miss things.
And so Netflix designed this thing to actually go out and
test their infrastructure.
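For the curious, the core idea can be sketched in a few lines. This is a toy illustration, not Netflix's actual code; the service names and the health check are made up for the demo:

```python
# Toy sketch of the Chaos Monkey idea (NOT Netflix's implementation):
# randomly kill one "service", then check whether the system still serves.
# Service names and the still_serving() check are hypothetical.
import random

services = {"streaming", "auth", "billing", "recommendations"}

def chaos_step(running: set, rng: random.Random) -> str:
    """Kill one randomly chosen running service; return its name."""
    victim = rng.choice(sorted(running))
    running.discard(victim)
    return victim

def still_serving(running: set) -> bool:
    # a resilient design keeps the core tier up despite other failures
    return "streaming" in running

rng = random.Random(42)   # seeded so the "random" experiment is repeatable
running = set(services)
killed = chaos_step(running, rng)
print(f"killed {killed!r}; still serving: {still_serving(running)}")
```

The real tool runs steps like this continuously, in production, which is what forces the architecture to tolerate failure rather than merely hoping to avoid it.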
It is pretty impressive.
Uh, you know, their infrastructure in general is pretty impressive.
It's not flawless.
Um, I did, I did watch part of the, uh, the Tyson fight a little while ago, and that was on Netflix, and it was not good, right?
That wasn't so much a resiliency thing as it was... they just, again, they could have used perhaps a little bit better AI to predict what kind of load they were gonna have.
But yeah.
But the idea of predicting crazy things that will happen, uh, Netflix
is pretty darn resilient, uh, when it comes to their infrastructure,
Yep.
yeah, I, I like that idea a lot.
Um, and I think, I think this is something that, again, an, uh, an LLM could actually help with, right?
So, like I said, the thing that we said we weren't gonna talk about,
we could talk about it, right?
Um, and for those, if you've never used a ChatGPT or a Claude, uh, I think it's very useful here, right?
You, you could say, hey, I'm this kind of company, this is the type of company, you know. And I understand the, the privacy concerns of what you share with a ChatGPT or a Claude.
Uh, there, there are, by the way, there are on-prem versions that
you can run, uh, of these LLMs too, so that you can keep the
data to yourself.
But then you have a conversation with it: here's the type of company I am, here's the type of computing environment I have. What could go wrong? Um, you know, what could I build a, a DR scenario around?
Any final thoughts?
Can you think of, uh, any other areas where we could use AI and, and backup?
Not so much.
I think the one thing I do wanna call out though is AI is here to stay.
ML is here to stay.
Don't be afraid of it.
Use it, right, in the right ways, and don't be afraid, and just start thinking about it.
Uh, the one other thing I will call out is as companies are starting
to dig into AI and ML for their own applications, production applications
and other things, as a backup admin, you need to start thinking
about how do I protect this, right?
How do I back it up?
How would I potentially restore it?
Because there's a lot of data, and training these models is really, really expensive.
Mm.
And so you wanna make sure you have mechanisms to protect the models
that emerge from all of this training so you can restore them if needed.
So use backup to, to make AI more resilient while AI makes backup more
resilient.
I like that.
We'll call that a symbiosis.
I like that a lot.
Uh, my one final thought is that potentially you could use, again, going back to the thing we said we weren't gonna talk about...
You could use LLMs to help select vendors, right?
You could say, hey, here are all my requirements, and here are all the documents they gave me: this 57-page response to my 10-page RFI. Can you help me make sense of it?
Um, and, uh, you could use that. Again, trust but verify when using an LLM, for sure.
All right, well, thanks again, Prasanna, uh, for a good chat.
Thank you, Curtis.
And I am not gonna change how I hold a coffee mug.
I'm sorry.
I, I would expect no less.
And thanks to our listeners, uh, we'd be nothing without you.
That is a wrap.