Feb. 3, 2025

Artificial Intelligence: The Future of Backup?

Artificial intelligence in backup isn't just marketing hype - it's changing how we protect our data. In this episode, W. Curtis Preston and Prasanna Malaiyandi break down the practical applications of AI in backup systems, from intelligent scheduling to ransomware detection.

Learn how artificial intelligence helps with capacity planning, especially with deduplication systems where predicting storage needs gets tricky. We discuss AI's role in asset discovery, anomaly detection, and even creating better disaster recovery plans. Plus, find out why backing up AI models themselves might become your next big challenge. This no-nonsense look at AI in backup cuts through the confusion and focuses on what really matters - making your backups better.

Transcript
Speaker:

You've found The Backup Wrap-up, your go-to podcast for all things



Speaker:

backup recovery and cyber recovery.



Speaker:

In this episode, we take a look at the use of artificial intelligence in backup.



Speaker:

Can AI make your backup environment actually better?



Speaker:

Prasanna Malaiyandi and I discuss AI and how it can help from



Speaker:

possibly everything from scheduling backups to detecting ransomware.



Speaker:

We talk about using it for deduplication, for capacity planning,



Speaker:

and even helping you to write better disaster recovery plans.



Speaker:

It's time to talk about AI and backups.



Speaker:

Hope you enjoy it.



Speaker:

By the way, if you don't know who I am, I'm W. Curtis Preston, AKA Mr.



Speaker:

Backup, and I've been passionate about backup and recovery for over 30 years.



Speaker:

Ever since I had to tell my boss that we had no backups of that really



Speaker:

important database that we had just lost.



Speaker:

I don't want that to happen to you, and that's why I do this podcast.



Speaker:

On this podcast, we turn unappreciated backup admins into Cyber Recovery Heroes.



Speaker:

This is The Backup Wrap-up.



Speaker:

Welcome to the show.



Speaker:

Hi, I'm W. Curtis Preston, AKA Mr. Backup, and I have with me a guy who apparently



Speaker:

doesn't know how to hold a coffee cup.



Speaker:

Prasanna Malaiyandi, how's it going?



Speaker:

Prasanna



Speaker:

I am good, Curtis.



Speaker:

So I think we need to clarify a



Speaker:

are you defending yourself?



Speaker:

Are you gonna try to defend your weirdness?



Speaker:

I think we have to talk about multiple things.



Speaker:

First.



Speaker:

In India, they don't typically use like a mug.



Speaker:

They use like a stainless steel cup, right?



Speaker:

So if



Speaker:

you're drinking hot beverages, you can only hold it from like the very,



Speaker:

you saw it when we went to the Indian



Speaker:

restaurant in San Diego,



Speaker:

yeah, yeah.



Speaker:

you have to hold it from the very top, otherwise you'll burn your hand.



Speaker:

Right.



Speaker:

And then most mugs, it just feels weird.



Speaker:

Like I got, I got chunky fingers, like sausages, right?



Speaker:

And so like putting it inside the mug, like the handle part of the mug.



Speaker:

I feel like, especially if it's like a curve, not like a straight, I feel like



Speaker:

there's not enough stability there.



Speaker:

That's fascinating.



Speaker:

So for people watching the video, who, by the



Speaker:

way, we do publish a video on YouTube if you want to see our



Speaker:

glorious faces and our expressions.



Speaker:

But yeah, so when I hold a mug, I don't hold it like this through the



Speaker:

handle.



Speaker:

I basically grab it either from the top or I hold it like on the side.



Speaker:

And then of course, the pinky's kind



Speaker:

The pinky,



Speaker:

the bottom.



Speaker:

But what's weird though is the pinky supporting the bottom thing.



Speaker:

I know you've complained to me many times, but that's also how I hold my phone.



Speaker:

you end up covering your microphone.



Speaker:

always hold the phone and



Speaker:

then my pinky's kind of on the bottom, and so it always blocks the microphone.



Speaker:

So Curtis is always



Speaker:

like, were you underwater?



Speaker:

Did you swallow your phone?



Speaker:

What's going on?



Speaker:

So regarding your defense from, you know, how they hold, do things in India.



Speaker:

What part of India were you born in?



Speaker:

Uh, just remind



Speaker:

Yeah, I was, uh, born in not India, but,



Speaker:

but at home.



Speaker:

Right.



Speaker:

But yeah,



Speaker:

you were, you were raised by people born in



Speaker:

India, and so you were, you were taught,



Speaker:

yeah.



Speaker:

And so actually I prefer, so even drinking water.



Speaker:

I don't drink from a glass cup.



Speaker:

I drink from a stainless steel cup.



Speaker:

Right.



Speaker:

Which is, if you haven't spent any time around, you know,



Speaker:

Indians, you wouldn't know that.



Speaker:

It's just that you use a lot, you use stainless steel for cups, for plates,



Speaker:

right.



Speaker:

As Curtis knows, when I'm loading the dishwasher and



Speaker:

he's like, what is that racket?



Speaker:

what is happening over there?



Speaker:

Because everything's so noisy.



Speaker:

They last longer and you don't have to worry about them breaking.



Speaker:

That's, you know, I can't, I can't complain.



Speaker:

Yeah.



Speaker:

Uh, but yeah, I don't get the whole knot, you know?



Speaker:

Here I am with four fingers in my mug.



Speaker:

I'm just saying.



Speaker:

Okay, so now what if that mug was smaller and the handle was curved,



Speaker:

Well, then that's like a, that's like a girly mug and then,



Speaker:

then you use two fingers like



Speaker:

this.



Speaker:

feel like it gives you enough stability?



Speaker:

And yet I've never dropped a mug.



Speaker:

I'm



Speaker:

just saying.



Speaker:

It's not from dropping the mug.



Speaker:

It's from like when you, yeah.



Speaker:

See when you're drinking it, it just feels like it's a little like,



Speaker:

Yeah.



Speaker:

Um,



Speaker:

all over you.



Speaker:

I just think you don't know how to hold a mug, but.



Speaker:

Our listeners are probably like, what are these people talking about?



Speaker:

By the way, this is a new format starting in the new year.



Speaker:

We are now gonna just be talking about coffee and all the crazy



Speaker:

things that Prasanna does.



Speaker:

Yeah, absolutely.



Speaker:

Um, or maybe we might actually talk about some stuff.



Speaker:

So I thought, um, you know, we've been seeing, uh, AI on the news a lot,



Speaker:

right?



Speaker:

AI? I've never heard about it.



Speaker:

Yeah, I've never, never heard of it.



Speaker:

Yeah.



Speaker:

So artificial intelligence, and if, if you've been following the backup



Speaker:

industry much, you probably saw a few announcements from your, uh, backup



Speaker:

company or maybe backup companies you're interested in about the use of AI.



Speaker:

Within backup.



Speaker:

And so I thought we'd talk about that a little bit,



Speaker:

um, in this episode, and



Speaker:

whether or not it has a use, right?



Speaker:

And, just to clarify, I think when a lot of these backup vendors launched AI,



Speaker:

they were using AI for like the, not for the core product, right?



Speaker:

So they were using AI for their support agent, or to help answer questions, right?



Speaker:

Which I think we all understand, we all know about, but I think in this



Speaker:

episode, I think we should focus on like the core part of backup.



Speaker:

Yeah.



Speaker:

So, so let's talk a just a little bit about, you know,



Speaker:

what we mean when we say AI.



Speaker:

There are different categories of AI, and then also there's machine learning, which



Speaker:

is very closely, and honestly, I, I, I,



Speaker:

you know, I think I could describe the difference between machine



Speaker:

learning and AI, but then there's something that, that



Speaker:

Changes, you know, that, that messes me up when we talk about that.



Speaker:

Um, I'll just, for those of you that actually really know what AI



Speaker:

is and machine learning is, you're gonna be offended by something



Speaker:

I say during this episode.



Speaker:

I, I'll just tell you that.



Speaker:

But we're gonna use the terms almost interchangeably, but they're not.



Speaker:

Uh, but I do want to distinguish



Speaker:

what is referred to as generative AI, right?



Speaker:

Which is a, you know, a large language model that is



Speaker:

going to create things there.



Speaker:

It's not ex nihilo, right?



Speaker:

It's not from, it's not from nothing.



Speaker:

It's it, it has to, it has to have been trained on a large data set.



Speaker:

But, those are the kinds of things that they're using,



Speaker:

like you talked about there.



Speaker:

for support



Speaker:

models, right?



Speaker:

And, And,



Speaker:

just as examples of large language models, you might've heard about



Speaker:

Meta's Llama, Llama 3, Llama 4; there's ChatGPT, or OpenAI's,



Speaker:

What is it?



Speaker:

OPT?



Speaker:

What,



Speaker:

the, the, actual model.



Speaker:

the



Speaker:

underlying model.



Speaker:

Oh, okay.



Speaker:

I, I, I would just, I would've just said ChatGPT.



Speaker:

'cause everybody knows what ChatGPT



Speaker:

is, right?



Speaker:

I mean, you've got Copilot, you've got, you've



Speaker:

got, Yeah, you, so you've got Claude from Anthropic.



Speaker:

Um, there are a lot of people who, you know, um, confuse the company with the product.



Speaker:

But, um, these are the, these are the ones that are grabbing



Speaker:

all the headlines, right?



Speaker:

They're also, they're also writing large bodies of texts.



Speaker:

They're helping people to write books.



Speaker:

They're helping people to do art.



Speaker:

That, and there's a lot of, um.



Speaker:

A lot of legal discussions around that, around the use of things like



Speaker:

the books that I've written as, um, you know, feeding into that and, um,



Speaker:

the, we're not talking about that,



Speaker:

right?



Speaker:

Um, we're not gonna talk about, hey, um, ChatGPT.



Speaker:

My restore didn't work.



Speaker:

Can you recreate all my documents?



Speaker:

Um, it's not,



Speaker:

there's not gonna be anything like that, at least not yet.



Speaker:

Um, the, um, we're gonna talk about how AI can be used to basically



Speaker:

enhance the core functionality.



Speaker:

I mean, you said this in way fewer words a few minutes ago,



Speaker:

but, uh, basically how it could be used to make backups better.



Speaker:

And I think a good chunk of this is really, like you said, more



Speaker:

around machine learning models,



Speaker:

right,



Speaker:

right,



Speaker:

not large language models.



Speaker:

right.



Speaker:

So the, the first section we will just talk about how potentially just talk about



Speaker:

this is just sort of thoughts out loud.



Speaker:

I know that we have a lot of vendors that listen to the podcast.



Speaker:

We are.



Speaker:

Technically aimed at the, the people who actually use backup and



Speaker:

recovery, but I know a lot of vendors use the podcast, so feel free to



Speaker:

take this episode and run with it and



Speaker:

do stuff.



Speaker:

So I, I guess the first question would be, do we think that, uh, machine learning



Speaker:

can be used to help just improve the efficiency of the backup process itself?



Speaker:

What do you think about



Speaker:

Oh, a thousand percent.



Speaker:

A billion percent, Curtis.



Speaker:

So I've never actually had to implement a backup system.



Speaker:

But you've done



Speaker:

tons of this, right?



Speaker:

And how do you go about just planning your backup, right?



Speaker:

How to back up an infrastructure, right?



Speaker:

It's like, just walk us through that, right?



Speaker:

And how many spreadsheets and all the rest that you have in



Speaker:

order to try to optimize these.



Speaker:

Yeah, I, I think about that a lot.



Speaker:

And, and, and, and, and the answer is gonna depend greatly on the



Speaker:

product that you're using, right?



Speaker:

You know, I, I can think of.



Speaker:

The traditional way is that you're going to create some kind of schedule, some



Speaker:

kind of, uh, automatic backup schedule.



Speaker:

Um, and you're going to do a, again, traditionally we'll



Speaker:

do three categories here.



Speaker:

Traditionally you've got some full backups and you're gonna do some



Speaker:

full backups every once in a while.



Speaker:

Um, and I was always a proponent if you had to do full backups, I was



Speaker:

always a proponent of doing those no more often than once a month.



Speaker:

Um, back in the days of tape, it was once a week because



Speaker:

it, was



Speaker:

complicated the restore process.



Speaker:

Yeah.



Speaker:

But, um, you know, doing it no more often than once a month, but depending on your



Speaker:

backup product, you might be able to, to



Speaker:

spread that out even over like three months.



Speaker:

And then you also want to schedule, if your backup product



Speaker:

is capable of doing it, you wanna schedule a cumulative incremental.



Speaker:

A differential, some products call it.



Speaker:

Um, and then of course the daily incremental.



Speaker:

Right.



Speaker:

So spreading



Speaker:

that all



Speaker:

for one application you're talking about,



Speaker:

Exactly.



Speaker:

You're doing this per application, per server.



Speaker:

Um, and, and you're trying to load balance things out because if you've



Speaker:

properly designed your system, it's probably not capable of doing a full



Speaker:

backup of your environment in one night.



Speaker:

Right.



Speaker:

Um, because that would just be really expensive, and then the rest of the



Speaker:

time it would go completely unused.



Speaker:

Right?



Speaker:

Um, so you, you buy it so that it's you, you size it so that it's big



Speaker:

enough to do a full backup over time.



Speaker:

And, um, you're right that, that, that scheduling that out is problematic, right?



Speaker:

Um, and you, you definitely could use, um, uh, AI



Speaker:

or ML to, to do that.
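
The load-balancing problem described here, spreading full backups across nights because the system cannot do a full of everything in one night, can be sketched with a simple heuristic. This is only an illustration, not how any particular backup product works: the server names and sizes are made up, and a greedy "largest first, onto the least-loaded night" rule stands in for whatever an ML-driven scheduler would actually learn.

```python
import heapq

def spread_fulls(server_sizes_gb, nights):
    """Assign each server's full backup to a night so nightly totals stay balanced.

    Greedy heuristic (an assumption for illustration): place the largest
    servers first, always on the night with the least data scheduled so far.
    """
    # Min-heap of (gb_already_scheduled, night_index)
    heap = [(0, night) for night in range(nights)]
    heapq.heapify(heap)
    schedule = {night: [] for night in range(nights)}

    # Largest servers first; ties broken by name so the result is deterministic
    ordered = sorted(server_sizes_gb.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, size in ordered:
        load, night = heapq.heappop(heap)
        schedule[night].append(name)
        heapq.heappush(heap, (load + size, night))
    return schedule

def nightly_totals(schedule, server_sizes_gb):
    """Total GB of full backups scheduled on each night."""
    return {
        night: sum(server_sizes_gb[s] for s in servers)
        for night, servers in schedule.items()
    }

# Hypothetical environment: five servers, fulls spread over two nights
sizes = {"db1": 800, "db2": 600, "fs1": 400, "fs2": 300, "web1": 100}
plan = spread_fulls(sizes, nights=2)
print(plan, nightly_totals(plan, sizes))
```

A static plan like this has to be recomputed whenever servers are added or grow, which is the chore the hosts suggest handing to AI or ML.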



Speaker:

And even for the scheduling aspect.



Speaker:

So we talked about the applications, and then you were talking about sort



Speaker:

of that infrastructure piece, which is shared and you now have to worry



Speaker:

about it across all of these things.



Speaker:

And I'm sure you had these bonkers spreadsheets that you



Speaker:

were creating, trying to do this.



Speaker:

Did it stretch all the way to the moon and back, by the way?



Speaker:

Well, you know me, it wasn't even a spreadsheet, it was just, uh, it, it was a



Speaker:

script.



Speaker:

Right.



Speaker:

I would, I would just script all this nonsense.



Speaker:

Right?



Speaker:

Um, but the bigger the environment, the more doing it programmatically made sense, right?



Speaker:

Um, and, and by the way, even if you have a more modern backup tool



Speaker:

that does incremental forever, there are many applications that



Speaker:

won't, that won't let you do



Speaker:

that.



Speaker:

Right?



Speaker:

I think of like database backups still need to be done every, you know, a full



Speaker:

backup every so often, and you have to schedule these out,



Speaker:

And that's the



Speaker:

second category.



Speaker:

'cause I know you talked about three categories.



Speaker:

Yeah.



Speaker:

Oh yeah.



Speaker:

Oh, well the three categories were, yes.



Speaker:

Uh, thank you.



Speaker:

I'm glad I have you here sometimes, you know.



Speaker:

Yeah.



Speaker:

So you have the, the, the old school full and incremental,



Speaker:

which old school is still current



Speaker:

school.



Speaker:

If we're talking about regular apps, then there's the forever incremental type.



Speaker:

Um, and you don't, you, you do have to worry about scheduling those,



Speaker:

but generally you just sort of tell 'em all to start at once and then



Speaker:

they queue and then it is not, it's, it's a lot simpler to do those.



Speaker:

But then the final category are ones that actually, um, and I



Speaker:

think the one that probably stands out the most here would be Rubrik,



Speaker:

right?



Speaker:

Rubrik doesn't let you schedule, um, that



Speaker:

stuff.



Speaker:

You tell it what your RTO



Speaker:

is and your RPO, and it just does the backups.



Speaker:

I mean, in fact, there are people that complain that you cannot, at least



Speaker:

last time I checked, you could not do a manually scheduled backup if you wanted to tell it when to do stuff.



Speaker:

Um, I, I think this is probably the first use of some sort of machine learning



Speaker:

or artificial intelligence that I can think of with regards to scheduling.
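
To make the contrast with fixed schedules concrete, here is a minimal sketch of RPO-driven ordering. It is not a description of Rubrik's implementation, which the episode does not detail. The workload fields (`last_backup`, `rpo`, `est_duration`) are hypothetical, and the rule is deliberately simple: the next backup should finish before `last_backup + rpo`, so it has to start by that deadline minus the estimated run time.

```python
from datetime import datetime, timedelta

def plan_by_rpo(workloads, now):
    """Order workloads by how soon each must start a backup to stay within its RPO."""
    plan = []
    for w in workloads:
        deadline = w["last_backup"] + w["rpo"]        # next backup should finish by here
        latest_start = deadline - w["est_duration"]   # so it has to start by here
        plan.append({
            "name": w["name"],
            "latest_start": latest_start,
            "overdue": latest_start <= now,           # already at risk of missing the RPO
        })
    # Most urgent first
    return sorted(plan, key=lambda p: p["latest_start"])

# Hypothetical workloads
now = datetime(2025, 2, 3, 12, 0)
workloads = [
    {"name": "fileserver", "last_backup": datetime(2025, 2, 3, 0, 0),
     "rpo": timedelta(hours=24), "est_duration": timedelta(hours=2)},
    {"name": "orders-db", "last_backup": datetime(2025, 2, 3, 11, 0),
     "rpo": timedelta(hours=1), "est_duration": timedelta(minutes=15)},
]
for item in plan_by_rpo(workloads, now):
    print(item)
```

The administrator states the objective and the system derives the schedule, so the order can be recomputed every time the environment or the run-time estimates change.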



Speaker:

Which, which I was also gonna chime in.



Speaker:

So the first two methods you talked about, right?



Speaker:

You're kind of statically doing this upfront, setting the schedules and



Speaker:

hoping that forever that it will be good,



Speaker:

Right.



Speaker:

You'll always be able to meet it, but say that there's an additional load or a



Speaker:

server goes down or something else, right.



Speaker:

There's no way to fine tune and adjust that,



Speaker:

Well, well, I, Well, there, I mean, there is, but there's



Speaker:

no way to automatically fine



Speaker:

tune and Yeah.



Speaker:

Yeah.



Speaker:

Right.



Speaker:

And so you're just like, okay, maybe it'll fail a couple times



Speaker:

and then I'll adjust the policies and then I'll be fine, but Right.



Speaker:

Versus something like an SLA based, which I, I actually have



Speaker:

looked at Rubrik's in the past,



Speaker:

and I find that very enticing because really in the end, you



Speaker:

care about what your RPO and RTO,



Speaker:

Yeah.



Speaker:

No one cares if you can back up.



Speaker:

They only care if you can restore.



Speaker:

the problem though is it's such a big paradigm shift for a lot of backup admins



Speaker:

that it's very difficult to understand because it's like when people move



Speaker:

from on-premises to the cloud and they were concerned because they're like,



Speaker:

I can't touch and feel my equipment.



Speaker:

Right.



Speaker:

It's not something I could actually do.



Speaker:

I think that's also the same challenges you get when you move



Speaker:

from sort of, uh, schedule-based backups to sort of SLA based backups.



Speaker:

Yeah, I, I liked, I liked the idea a lot.



Speaker:

I, I, I still, again, you know, if I was, if I was running Rubrik,



Speaker:

I would give people the ability to do a manual backup if they



Speaker:

wanted to.



Speaker:

But, but I do really like the idea of SLA driven backups,



Speaker:

because I like the idea of SLAs.



Speaker:

You know, we've talked about SLAs on here, and I like the idea of knowing the backups were being done often enough to meet my SLAs.



Speaker:

I



Speaker:

really liked that idea.



Speaker:

The one thing I think that is useful with these sort of approaches is



Speaker:

we've talked about the fact that like your environment doesn't stay static.



Speaker:

Right.



Speaker:

So as you're adding new workloads, as things are changing, you don't



Speaker:

want to have to go recompute your entire spreadsheet or your



Speaker:

script every single time.



Speaker:

So it's nice to have sort of these models that can automatically help fine tune and



Speaker:

optimize so you're not wasting your time because it's more than likely that you're



Speaker:

not gonna get it right the first time if you manually try to reset some of these



Speaker:

things.



Speaker:

And so having this automatic thing that constantly is



Speaker:

adjusting just seems amazing.



Speaker:

Yeah, it does.



Speaker:

And I, and outside of Rubrik, I'm not aware of any tools that do that.



Speaker:

Uh, but I, I think that this could certainly be a way where



Speaker:

they could use AI to do that.



Speaker:

Um, the.



Speaker:

And I, and I was thinking about, again, going back to it, it's been a



Speaker:

while since I've had to do this in a production environment, but the, the



Speaker:

the first thing that you have to find out is how big is everything, right?



Speaker:

How big is, is everything from a database perspective and



Speaker:

how, how long does it take?



Speaker:

'cause there's all these different, and that's the thing that nobody knows.



Speaker:

Right.



Speaker:

How big is your, how big is your data center?



Speaker:

And they're like, I don't know.



Speaker:

I don't know.



Speaker:

And so like, you have to do a full backup first



Speaker:

before you have any idea.



Speaker:

And not every server backs up at the same speed and all these different things.



Speaker:

So yeah, it it is a



Speaker:

complicated



Speaker:

and you may not be able to back up everything at the same



Speaker:

time because there might be



Speaker:

different hours, right?



Speaker:

That



Speaker:

a server is sort of offline or has less load that you can actually do it.



Speaker:

Yeah, so having some sort of AI or ML, um, figure that out sounds amazing.



Speaker:

Right?



Speaker:

Another area where I think that this could help is very, very closely related, and



Speaker:

that is, and, and some backup products do have this and that is making sure



Speaker:

that everything in my data center is backed up in some



Speaker:

way, right?



Speaker:

Usually where you see this is an integration with like, um, uh,



Speaker:

VMware or, uh, AWS, et cetera, right?



Speaker:

Um, basically just connect to my entire, uh, you know, control



Speaker:

panel and then just look and make sure that everything is connected



Speaker:

to some type of policy to back it



Speaker:

up.



Speaker:

I, I think.



Speaker:

a default policy if anything is created, so at least everything



Speaker:

is protected, even though



Speaker:

it may not be protected with the right thing, but at least it's



Speaker:

being protected and you don't have to worry about these gaps.
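
The gap check described here boils down to comparing what the platform says exists against what the backup system has a policy for. The sketch below assumes both lists have already been fetched. It does not call any real VMware or AWS API, and the asset names and the `default-daily` policy name are hypothetical.

```python
def find_unprotected(inventory, policy_assignments, default_policy="default-daily"):
    """Find discovered assets with no backup policy and give them a default one.

    inventory: iterable of asset names reported by the platform (assumed input)
    policy_assignments: dict mapping asset name -> policy name
    Returns (gaps, updated_assignments); the input dict is not modified.
    """
    updated = dict(policy_assignments)
    gaps = sorted(asset for asset in inventory if asset not in policy_assignments)
    for asset in gaps:
        # Not necessarily the right policy, but at least the asset is protected
        updated[asset] = default_policy
    return gaps, updated

# Hypothetical data: three VMs discovered, only one has a policy
discovered = ["vm-web-01", "vm-sql-02", "vm-test-03"]
assigned = {"vm-web-01": "gold"}
gaps, assigned_after = find_unprotected(discovered, assigned)
print(gaps)
print(assigned_after)
```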



Speaker:

Exactly.



Speaker:

Exactly.



Speaker:

Um, and I, I think you do see this in a lot of backup products.



Speaker:

Usually again, it's with integration



Speaker:

with, uh, big things like VMware, Hyper-V, AWS, um,



Speaker:

you know, et cetera.



Speaker:

you need the companies, those vendors, to actually provide the APIs to be



Speaker:

able to do these sort of queries, and I think that's where there's kind



Speaker:

of a little bit of a tension there,



Speaker:

Yeah.



Speaker:

Yeah.



Speaker:

I mean, theoretically you could scour the data center, right?



Speaker:

Uh, looking for new computers.



Speaker:

Again, I, I know I mentioned this before, but you know, back



Speaker:

in the day we did that, right?



Speaker:

And back in the day we did that with Visio.



Speaker:

Um, the, the vis, there used to be a very



Speaker:

expensive version of Visio that would just literally crawl your data center.



Speaker:

And it used, uh, some very interesting technology.



Speaker:

Um, I forgot the, the name of this, but like, Nmap



Speaker:

does this, where it, what it does is it sends a malformed packet.



Speaker:

It finds an IP address, it sends a malformed packet to that IP address



Speaker:

to see how it responds, and different things respond in different ways.



Speaker:

And that's how it, that's how it, um,



Speaker:

That



Speaker:

is crazy that they built that.



Speaker:

Yeah.



Speaker:

Yeah.



Speaker:

Um, and so you, you could theoretically do that, but, agreed, it's much easier



Speaker:

if you just have, everything's gonna be in VMware or AWS and then just talk to AWS.



Speaker:

Now again, going to VMware and AWS, there can be multiple virtual data centers.



Speaker:

There can be



Speaker:

multiple AWS accounts.



Speaker:

So you, you, you want to make sure that, that you have some way to, to



Speaker:

do that.



Speaker:

And I, and I do like that idea.



Speaker:

Shadow IT.



Speaker:

Yeah, shadow IT bad,



Speaker:

especially when it comes to backup.



Speaker:

Right.



Speaker:

Um, again, I'll tell a story from back in the day was the time that someone came to



Speaker:

me and they had, they were DBAs and they, they gave me a directory of a database.



Speaker:

They wanted me to restore.



Speaker:

And it was temp, um, /tmp on a, on an HP box.



Speaker:

And for those that don't know, /tmp on an HP box, specifically HP-UX, was in RAM.



Speaker:

So when you rebooted it, temp went away.



Speaker:

And this, um,



Speaker:

What was it? Source code?



Speaker:

It was source code.



Speaker:

Yeah.



Speaker:

And they were developing for months, like an entire team of



Speaker:

developers developing source code of this new application in temp.



Speaker:

And then we rebooted the server and they, and they came to me



Speaker:

and asked me to restore it.



Speaker:

And I was like, dude, we don't back up temp. I don't know



Speaker:

what you're talking about.



Speaker:

Like, and they're like, dude, this is really important,



Speaker:

like heads are gonna roll.



Speaker:

And I'm like, yeah, not mine.



Speaker:

Like everybody knows we don't back up temp.



Speaker:

Except for you, apparently.



Speaker:

Oh



Speaker:

Uh, so it's, I'm just, you know, it's really bad when you have



Speaker:

a functioning system and then it's not being backed up again.



Speaker:

Another story we used to have, um, we had a, a naming convention.



Speaker:

Ours was very boring.



Speaker:

Um, it, it was HPDBSVA, right?



Speaker:

HP database server A, and there was HPFS01, et



Speaker:

cetera, right?



Speaker:

And I remember, and I had this form that you had to fill out.



Speaker:

This was an actual piece of paper.



Speaker:

We did



Speaker:

not have web pages.



Speaker:

Right?



Speaker:

You had this form that you fill out and, and you had to, and, and it, it said



Speaker:

on there, simply filling out this form is not, does not meet the requirement.



Speaker:

You do not consider your system backed up until you have a signed form back from me.



Speaker:

Right?



Speaker:

And then one day somebody handed me a form and it said like.



Speaker:

They wanted me to back up HPDBSVM, right?



Speaker:

And I go, M? That's interesting.



Speaker:

The last server I remember hearing about was H. So that means there's an I, a



Speaker:

J, a K, and an L out there somewhere.



Speaker:

hasn't been backed up.



Speaker:

That hasn't been backed up.
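
The inference in this story, "if M exists and the last one I knew about was H, then I through L must be out there too", is easy to automate when names follow a sequence. A minimal sketch, assuming the single trailing-letter convention from the anecdote; the function name and inputs are hypothetical:

```python
import string

def missing_in_sequence(known_names, new_name, prefix):
    """List names implied by a sequential naming scheme that nobody registered for backup.

    Assumes each name is `prefix` plus one uppercase letter (A, B, C, ...).
    """
    known_letters = {n[len(prefix):] for n in known_names if n.startswith(prefix)}
    new_letter = new_name[len(prefix):]
    letters = string.ascii_uppercase
    upto = letters.index(new_letter)  # raises ValueError if the suffix is not A-Z
    return [prefix + c for c in letters[:upto] if c not in known_letters]

# Servers A through H are known; a request arrives for M
known = ["HPDBSV" + c for c in "ABCDEFGH"]
print(missing_in_sequence(known, "HPDBSVM", prefix="HPDBSV"))
```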



Speaker:

Yeah.



Speaker:

Um, so this idea of automatically



Speaker:

detecting servers and applications sounds like a great



Speaker:

idea.



Speaker:

And also not just VMs, but also detect, it would be really



Speaker:

nice if it detected the type of



Speaker:

VM and said, this appears to be a SQL instance.



Speaker:

We should back it up with the default SQL



Speaker:

policy.



Speaker:

That would be great.



Speaker:

So in addition to making things more efficient, um, there are some



Speaker:

other things we could do, uh, with AI that also would be interesting.



Speaker:

Uh, what do



Speaker:

you think is the, the first one?



Speaker:

No.



Speaker:

So I think one of the ones, and we've talked about it so much, so often,



Speaker:

and vendors are starting to do this, it's around anomaly detection and



Speaker:

it could be used in various fashions.



Speaker:

So one thing is like, Hey, by the way, this server, all of a sudden it's backing



Speaker:

up 10 times what it normally does.



Speaker:

Maybe this might indicate like a malware or ransomware on the system.



Speaker:

Right?
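
The "backing up 10 times what it normally does" signal can be approximated with a plain statistical baseline, shown below. This is a stand-in for illustration, not a trained model: it compares the latest backup size with the median of recent history, and the `factor` threshold of 5 is an arbitrary assumption.

```python
from statistics import median

def is_size_anomaly(history_gb, latest_gb, factor=5.0):
    """Flag a backup whose size is far above the server's recent norm.

    history_gb: sizes of previous backups of the same type for this server
    factor: how many times the median counts as anomalous (assumed value)
    """
    if not history_gb:
        return False  # no baseline yet, so nothing to compare against
    baseline = median(history_gb)
    return baseline > 0 and latest_gb > factor * baseline

# Hypothetical nightly incrementals of about 50 GB, then a sudden jump
history = [50, 52, 48, 51, 49]
print(is_size_anomaly(history, 500))
print(is_size_anomaly(history, 60))
```

A jump like this is only a prompt to investigate. Legitimate events such as a data migration or a large import produce the same spike.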



Speaker:

Um.



Speaker:

Or Hey, I've noticed that there's a bunch of data that's starting



Speaker:

to look, based on entropy, like it's been encrypted, that doesn't look normal.



Speaker:

Okay, maybe I should go investigate it, right?
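
The entropy check mentioned here can be sketched with Shannon entropy over byte values: encrypted data looks close to random, so it scores near the maximum of 8 bits per byte, while ordinary text scores much lower. The 7.5 threshold below is an assumption for illustration.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Heuristic: very high entropy suggests encrypted (or compressed) content."""
    return shannon_entropy(data) >= threshold

print(shannon_entropy(b"aaaaaaaa"))
print(shannon_entropy(bytes(range(256))))
```

Compressed formats such as ZIP or JPEG also score high, so this works best as a change detector: a file share whose documents were low-entropy last week and are high-entropy today deserves a look.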



Speaker:

So, or it could even be security things like, Hey, you're logging



Speaker:

in from a different place than normal as a backup admin.



Speaker:

Is this the right thing or not?



Speaker:

Yeah.



Speaker:

And also very closely related to the stuff you said before was, uh,



Speaker:

are files where the file type based on the first few bytes of the file,



Speaker:

does not match the extension of the



Speaker:

file.



Speaker:

So it says it's a .doc, but the first few bytes of the file



Speaker:

show that it's an application, for



Speaker:

Sorry, one



Speaker:

Yeah, that's an interesting use case around, uh, the first few bytes because



Speaker:

that could detect things that are being encrypted or other things that don't



Speaker:

make sense, or potentially even malware.



Speaker:

Right.
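
Checking the first few bytes against the extension relies on well-known file signatures ("magic numbers"). The sketch below covers a handful of common ones and is nowhere near a complete table; it is not the proprietary tool mentioned in the episode, just an illustration of the technique.

```python
import os

# A few well-known file signatures; real tools carry far larger tables
SIGNATURES = {
    ".pdf": [b"%PDF"],
    ".png": [b"\x89PNG\r\n\x1a\n"],
    ".zip": [b"PK\x03\x04"],
    ".docx": [b"PK\x03\x04"],                       # .docx is a ZIP container
    ".doc": [b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"],  # legacy OLE2 compound file
    ".exe": [b"MZ"],
}

def extension_matches_content(filename: str, header: bytes):
    """Compare a file's extension with its leading bytes.

    Returns True or False when the extension is in the table,
    or None when we have no signature to check against.
    """
    ext = os.path.splitext(filename)[1].lower()
    expected = SIGNATURES.get(ext)
    if expected is None:
        return None
    return any(header.startswith(sig) for sig in expected)

# A ".doc" whose content starts like a Windows executable
print(extension_matches_content("report.doc", b"MZ\x90\x00\x03"))
print(extension_matches_content("report.pdf", b"%PDF-1.7"))
```

A mismatch can mean the file was encrypted in place, that it is disguised malware, or that someone renamed it to hide its contents.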



Speaker:

Yeah, it, uh, it's something we do, you know, my, uh, employer is S2



Speaker:

Data, and we do a lot of restores of old stuff, um, where we're pulling



Speaker:

data off of tape often for, um, e-discovery purposes and lawsuit



Speaker:

purposes and, um, investigation purposes.



Speaker:

And one of the things that we do as we're pulling data, 'cause we



Speaker:

use a, a, a proprietary tool that we've written to restore data off



Speaker:

of most backups rather than use the built in tool for a lot of reasons.



Speaker:

Um, and this is one of them is that we check the file type against the file



Speaker:

contents and, uh, it can, it can also indicate.



Speaker:

Um, uh, subterfuge,



Speaker:

right?



Speaker:

Um, it can indicate somebody trying to hide something.



Speaker:

Um, but yeah, so anomaly detection, I think is a really big one.



Speaker:

Uh, right.



Speaker:

Definitely that, this is a, it looks like you've got ransomware, right?



Speaker:

You need



Speaker:

to solve that.



Speaker:

That was probably the, the first big use of AI that I



Speaker:

remember, uh, in, in the backup world.



Speaker:

And I, I, I will say that if the way that you know that you have ransomware is that your backup



Speaker:

product told you something is wrong, but, uh, but it, but it can



Speaker:

happen.



Speaker:

Right.



Speaker:

Um, another one that I'll talk, uh, that I'd bring up is, is data classification.



Speaker:

Again, I think that.



Speaker:

This is, this is probably a very simple one, but the



Speaker:

idea of like, looking at all the different data types and helping you to



Speaker:

understand what is in your environment.



Speaker:

This is not that new.



Speaker:

Um, but perhaps the AI use case could be helping you to identify trends,



Speaker:

um, and, and where the data's moving, where it's being created, where



Speaker:

it's being changed, uh, et cetera.



Speaker:

Um, and, and then, which is very closely related to my



Speaker:

other idea, which is predictive



Speaker:

analytics.



Speaker:

Right.



Speaker:

Um, again, going back to, uh, you know, back in the day,



Speaker:

one of the things I remember being the hardest to do is capacity prediction.



Speaker:

You



Speaker:

know, predicting whether or not I have enough capacity to



Speaker:

do my backups for the next six



Speaker:

and you know what makes it even harder?



Speaker:

What's that?



Speaker:

It does. Dedupe makes it way harder.



Speaker:

And you know what, AI, right?



Speaker:

AI/ML could, could be used because it's smarter than I am.



Speaker:

Smarter than you are.



Speaker:

It could actually understand the trends



Speaker:

as to... now, what, what, let's talk about that. Not every,



Speaker:

everybody might not understand



Speaker:

why dedupe makes capacity,



Speaker:

Sure.



Speaker:

uh, management so



Speaker:

So let's talk about the, before we get to dedupe, let's talk about like



Speaker:

traditional storage or tape, right?



Speaker:

So



Speaker:

you're doing a full backup, you know how big your database is, therefore,



Speaker:

you know, okay, my full backup is gonna take this much space and



Speaker:

you know, with compression, maybe it's gonna be 2x, or half the space, right?



Speaker:

And then, you know, okay, my daily change rate is say 5%, and based on the



Speaker:

total size, I know what that's gonna be.



Speaker:

And so



Speaker:

if I'm doing weekly fulls, daily incrementals, I know how much



Speaker:

storage I'm gonna need for a week.
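The arithmetic described above can be written out as a short sketch. The function name `weekly_storage_gb` and the sample numbers (a 1,000 GB database, 2x compression, one full plus six incrementals per week) are assumptions for illustration; only the 5% daily change rate comes from the conversation.

```python
def weekly_storage_gb(full_size_gb: float,
                      daily_change_rate: float,
                      compression_ratio: float = 2.0,
                      incrementals_per_week: int = 6) -> float:
    """Storage for one weekly full plus daily incrementals, without dedupe."""
    full = full_size_gb / compression_ratio
    incremental = full_size_gb * daily_change_rate / compression_ratio
    return full + incrementals_per_week * incremental


# Hypothetical example: 1,000 GB database, 5% daily change, 2x compression.
print(weekly_storage_gb(1000, 0.05))
```

Because every term is known up front, the same formula also tells you how much space comes back when a week of backups expires, which is the property that deduplication takes away.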



Speaker:

Yeah.



Speaker:

And, and just as, and just as important, you also know how



Speaker:

much storage, when you delete



Speaker:

the, you know, the older backups.



Speaker:

Yeah.



Speaker:

You know how much storage will be freed up, which is just as, if not even more,



Speaker:

important.



Speaker:

Now the problem with deduplication is they talk about these great rates like



Speaker:

40x, 30x, 20x, take your pick, right?



Speaker:

And that's all great



Speaker:

if, like, a lot of your data is very similar, but it's hard



Speaker:

to tell whether your data is similar or not until you actually start doing it.



Speaker:

So if you're trying to buy storage for, say, three years



Speaker:

ahead of time, a capacity plan



Speaker:

becomes really difficult.



Speaker:

And so you guess, right?



Speaker:

You'll take a stab and maybe you look at some of your data and you're like,



Speaker:

Hey, these kind of look the same, but you don't know if that's right or not



Speaker:

until you actually start backing it up.



Speaker:

And like you said, Curtis, if you go delete your backup, you may not



Speaker:

actually free up that space because it's been de-duplicated against something



Speaker:

else that you're still preserving.



Speaker:

right,



Speaker:

Say I go delete my backup from six months ago for one application.



Speaker:

Another application might have, uh, common blocks with that data or with that other



Speaker:

application.



Speaker:

And so even though I deleted the first application's backup,



Speaker:

it's not gonna free up space.
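A minimal sketch of why deleting a deduplicated backup may free little or nothing, assuming a simple reference count per chunk. The class name `DedupeStore` and the chunk labels are hypothetical; real products track this in their own ways.

```python
from collections import Counter


class DedupeStore:
    """Toy dedupe store: a chunk is freed only when no backup references it."""

    def __init__(self):
        self.refcount = Counter()  # chunk fingerprint -> number of backups using it
        self.backups = {}          # backup name -> set of chunk fingerprints

    def add_backup(self, name, chunks):
        unique = set(chunks)
        self.backups[name] = unique
        for chunk in unique:
            self.refcount[chunk] += 1

    def delete_backup(self, name) -> int:
        """Delete a backup and return how many chunks were actually freed."""
        freed = 0
        for chunk in self.backups.pop(name):
            self.refcount[chunk] -= 1
            if self.refcount[chunk] == 0:
                del self.refcount[chunk]
                freed += 1
        return freed


store = DedupeStore()
store.add_backup("app1-six-months-ago", ["a", "b", "c"])
store.add_backup("app2-last-night", ["b", "c", "d"])
# Only chunk "a" is unique to app1, so deleting three chunks frees one.
print(store.delete_backup("app1-six-months-ago"))
```

The amount freed depends on what every other backup still references, which is why it cannot be read off the size of the backup being deleted.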



Speaker:

And so you end up with this problem and this challenge.



Speaker:

And that's one of the hardest things about deduplication.



Speaker:

Having worked at a company that did deduplication, customers



Speaker:

always struggled with it,



Speaker:

Yeah,



Speaker:

And some of the



Speaker:

things we would do is we would be like, Hey, let's scan your



Speaker:

application and just understand what sort of dedupe rates you may get.



Speaker:

And even that's a guess, because maybe you move an application from one storage



Speaker:

appliance to a different appliance and now your dedupe rates are different.



Speaker:

Yeah.



Speaker:

And, and, and again, the



Speaker:

one of the most frustrating things could be if you, you start



Speaker:

running outta capacity, right?



Speaker:

And so you say, listen, I know we said we wanted to keep backups for



Speaker:

three years, but we're running outta capacity and so we're gonna start



Speaker:

deleting three years minus a month.



Speaker:

And you do that and you get



Speaker:

back 0.1% of your... it can be very difficult.



Speaker:

Um,



Speaker:

fact that to free up that space takes time.



Speaker:

Because typically with a lot of these systems, there's a background process



Speaker:

typically called garbage collection,



Speaker:

which goes and now needs to free up all this data and that does take time to run.



Speaker:

Yeah, it is, it is a two-stage process where you, you, you, um, flag that



Speaker:

block for deletion and then another



Speaker:

process runs, typically when backups aren't running.



Speaker:

Um, and you, you probably have to force the garbage collection process.
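The two-stage process described above can be sketched as a flag step followed by a separate garbage collection step. The class `LazyChunkStore`, its method names, and the 4 MB chunk size are hypothetical choices for illustration, not any vendor's design.

```python
class LazyChunkStore:
    """Toy store where deletion only flags chunks; space returns after GC."""

    def __init__(self, chunk_size_mb: int = 4):
        self.chunk_size_mb = chunk_size_mb
        self.live = set()     # chunks currently occupying space
        self.flagged = set()  # chunks marked for deletion but not yet reclaimed

    def write(self, chunk_id):
        self.live.add(chunk_id)
        self.flagged.discard(chunk_id)  # a rewritten chunk is wanted again

    def flag_for_deletion(self, chunk_id):
        """Stage 1: cheap, happens at delete time, frees nothing."""
        if chunk_id in self.live:
            self.flagged.add(chunk_id)

    def used_mb(self) -> int:
        return len(self.live) * self.chunk_size_mb

    def garbage_collect(self) -> int:
        """Stage 2: reclaim flagged chunks and return the MB freed."""
        reclaimed = len(self.flagged) * self.chunk_size_mb
        self.live -= self.flagged
        self.flagged.clear()
        return reclaimed


store = LazyChunkStore()
for chunk in ("a", "b", "c"):
    store.write(chunk)
store.flag_for_deletion("a")
print(store.used_mb())          # space is still in use after the delete
print(store.garbage_collect())  # space comes back only here
print(store.used_mb())
```

Separating the two steps keeps deletes fast, at the cost of a delay before capacity reports reflect them.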



Speaker:

Um, so go, go ahead.



Speaker:

so I was just thinking as we were talking about the first time



Speaker:

that I heard about AI in storage,



Speaker:

and I think the first company that I can recall, and I'm sure there



Speaker:

were others, was actually Nimble



Speaker:

Storage. And Nimble,



Speaker:

what they did is, their first product when they built it... so



Speaker:

they provided primary storage.



Speaker:

And their first product, they basically were like, Hey, we are optimized for SQL.



Speaker:

We are optimized for VMware.



Speaker:

We are optimized for these different, and I was like, oh, that's pretty awesome.



Speaker:

They're doing it dynamically.



Speaker:

But I think at the time it was kind of a static thing where you



Speaker:

would say, Hey, I have VMware.



Speaker:

I'm writing into this data store.



Speaker:

And it would optimize its, and it would basically pick different



Speaker:

block sizes for deduplication



Speaker:

Right, right, right.



Speaker:

Yeah.



Speaker:

That's interesting.



Speaker:

The, the, the, I, I, I think, going back to the thing



Speaker:

we were talking about, of like



Speaker:

using AI to basically help me understand when do I need to order more storage?



Speaker:

It can, to the best of its ability,



Speaker:

it can actually look at all of the dedupe rates, right?



Speaker:

At all of the at, at what?



Speaker:

It could look at the dedupe rate of each individual backup, right?



Speaker:

You, you gave, you told me it's a backup this much and this is



Speaker:

how much, and so we can actually



Speaker:

run all those calculations and I can actually figure out,



Speaker:

well, in six months, based on if everything stays the



Speaker:

same in six months, you're gonna be



Speaker:

outta storage.
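A rough sketch of the "if everything stays the same" forecast, assuming a plain least-squares trend line over daily used-capacity readings. This is ordinary linear regression rather than anything a vendor would call AI, and the function name `days_until_full` and the sample readings are hypothetical.

```python
def days_until_full(used_tb_history, capacity_tb):
    """Fit a straight line to daily usage and project when capacity runs out.

    Returns the estimated days remaining after the latest reading,
    or None when usage is flat or shrinking.
    """
    n = len(used_tb_history)
    if n < 2:
        raise ValueError("need at least two readings to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(used_tb_history) / n
    numerator = sum((x - mean_x) * (y - mean_y)
                    for x, y in enumerate(used_tb_history))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    slope = numerator / denominator  # TB of growth per day
    if slope <= 0:
        return None
    return (capacity_tb - used_tb_history[-1]) / slope


# Hypothetical readings: 1 TB of growth per day against a 20 TB appliance.
print(days_until_full([10, 11, 12, 13, 14], capacity_tb=20))
```

The straight-line assumption is exactly what breaks when dedupe rates shift or retention changes, which is where a model that learns from per-backup dedupe history could plausibly do better.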



Speaker:

So



Speaker:

many vendors actually do.



Speaker:

Yeah.



Speaker:

Yeah.



Speaker:

Um, so the, the, um,



Speaker:

Because I think storage capacity is a little easier



Speaker:

to predict, because like you said, you're not really changing things, right?



Speaker:

You know what your policy is.



Speaker:

You know what data's coming in, you know how long you're keeping it,



Speaker:

you know what your deduplication rates are, you know how much it's filling up.



Speaker:

So I think it's a little easier than what we had talked about previously



Speaker:

where it's like, okay, now let me plan out my entire backup infrastructure



Speaker:

and start scheduling that.



Speaker:

Yeah.



Speaker:

Speaking of dedupe, can AI help dedupe itself?



Speaker:

Do you think that?



Speaker:

can.



Speaker:

So I think my biggest



Speaker:

challenge would be that to run AI requires compute,



Speaker:

and usually with backup,



Speaker:

you want to go as fast as you can,



Speaker:

Mm-hmm.



Speaker:

right?



Speaker:

And so I think there's that tension



Speaker:

that exists between running as fast as you can versus introducing



Speaker:

something in the pipeline that could potentially slow things down.



Speaker:

And you'd have to also ask at what cost, right?



Speaker:

Like, are you going to be saving, say, 70% additional versus traditional



Speaker:

algorithms, or is it gonna be much less?



Speaker:

Yeah, I think dedupe in, um, in the backup world, there, there, there



Speaker:

have been two main ways to do dedupe, which has been, there has been



Speaker:

something that isn't really dedupe, but



Speaker:

there were products that called themselves dedupe products that did this.



Speaker:

Uh, and that would be block level, um,



Speaker:

incremental, essentially.



Speaker:

Right?



Speaker:

Not



Speaker:

actually deduping things against each other, but just



Speaker:

using technology to lower the amount of new data that's



Speaker:

backed up from each workload.



Speaker:

But then the traditional dedupe, the way it works for those that don't know



Speaker:

this, is that you slice it up, you slice everything up into what are



Speaker:

typically called shards or chunks.



Speaker:

You run some type of algorithm on it that gives you some type of thing.



Speaker:

Like, like



Speaker:

A fingerprint.



Speaker:

the original SHA-2,



Speaker:

SHA-256.



Speaker:

And again, here the, the better the algorithm, um, the better the dedupe,



Speaker:

but the better the algorithm, the more compute it takes, going back to



Speaker:

your trade-off thing.



Speaker:

And so, um, that's the way, basically: every chunk is run through, you come



Speaker:

up with this alphanumeric string, that alphanumeric string is compared



Speaker:

with every other alphanumeric string.



Speaker:

Um, and then that's how you identify redundant data.
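The chunk, fingerprint, and compare steps described above can be sketched with SHA-256 from Python's standard `hashlib`. The function name `dedupe_fixed`, the tiny 4-byte chunk size, and the sample data are assumptions chosen to keep the example readable; real systems use far larger chunks and an index lookup rather than literally comparing every string with every other.

```python
import hashlib


def dedupe_fixed(data: bytes, chunk_size: int = 4):
    """Slice data into fixed-size chunks and store each unique chunk once.

    Returns (store, recipe): store maps fingerprint -> chunk bytes,
    recipe lists the fingerprints needed to rebuild the data in order.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()  # the alphanumeric string
        store.setdefault(fingerprint, chunk)             # keep the first copy only
        recipe.append(fingerprint)
    return store, recipe


data = b"ABCDABCDABCDWXYZ"
store, recipe = dedupe_fixed(data)
print(len(recipe), "chunks referenced,", len(store), "chunks stored")

# The recipe is enough to rebuild the original bytes.
assert b"".join(store[fp] for fp in recipe) == data
```

Looking fingerprints up in a dictionary stands in for the fingerprint index a dedupe appliance would maintain.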



Speaker:

And one of the challenges you have with that method is that, uh, the data slides,



Speaker:

um, and so if you don't slice the data at exactly the same spot, it's duplicate



Speaker:

data, but you don't, don't identify it.



Speaker:

The, there is a completely different way which, um, you



Speaker:

look at the way VAST does things.



Speaker:

They do something completely different, right?



Speaker:

So they, they have an algorithm and, and I, I'm guessing they



Speaker:

use AI or ML to, to, do this.



Speaker:

They have an algorithm that, um, basically identifies data that



Speaker:

is probably redundant, right?



Speaker:

Um, that, that, so they, they've got two different ways to do de-dupe and I, so



Speaker:

there are potentially, again, potentially.



Speaker:

AI or ML could be used to identify a new way to identify duplicate



Speaker:

data that is maybe, maybe



Speaker:

more efficient from a compute and storage.



Speaker:

Like even if it was just more efficient from a compute standpoint,



Speaker:

but got the same amount of dedupe, that would still



Speaker:

be great.



Speaker:

Um, but



Speaker:

potentially this is something



Speaker:

that I think, uh, AI could



Speaker:

and the one thing I did also want to comment on Curtis is, uh, going back to



Speaker:

your comment about, okay, if the data shifts, then now you have to make sure



Speaker:

that you're doing the right blocks, right?



Speaker:

Uh, this is where companies though have done sort of, uh, what you're



Speaker:

talking about is called fixed block.



Speaker:

Fixed block deduplication,



Speaker:

right?



Speaker:

There are



Speaker:

many vendors out there though, who do variable-size,



Speaker:

variable-block, uh, deduplication, which allows it to vary such that if



Speaker:

you do get an offset right, because of some data change, it's still able to



Speaker:

dedupe everything else after that because



Speaker:

of how it's actually computing the chunks, the segments, right?



Speaker:

Each of



Speaker:

the blocks.
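A small sketch of the difference between fixed-block and variable-block chunking when one byte is inserted at the front of the data. The boundary rule here (cut wherever the sum of the last four bytes is divisible by five) is a deliberately simple stand-in chosen for illustration; production systems typically use a rolling hash such as a Rabin fingerprint. All function names and parameters are hypothetical.

```python
def fixed_chunks(data: bytes, size: int = 8) -> list:
    """Cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, window: int = 4, divisor: int = 5) -> list:
    """Cut wherever the trailing `window` bytes satisfy a content-based rule."""
    chunks, start = [], 0
    for i in range(window - 1, len(data)):
        if sum(data[i + 1 - window:i + 1]) % divisor == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def shared(a: list, b: list) -> int:
    """Count distinct chunks that appear in both lists."""
    return len(set(a) & set(b))


original = bytes(range(64))
shifted = b"\xff" + original  # one byte inserted at the front

fixed_shared = shared(fixed_chunks(original), fixed_chunks(shifted))
cdc_shared = shared(content_defined_chunks(original), content_defined_chunks(shifted))
print("fixed-block chunks still matching:", fixed_shared)
print("variable-block chunks still matching:", cdc_shared)
```

Because the cut points follow the content rather than the byte offset, everything after the first boundary lines up again after the insert, while every fixed-size chunk in this example changes.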



Speaker:

Yep.



Speaker:

Um, so, uh, so that, that's certainly an area where, where AI could potentially



Speaker:

help. The, um, the next: do you think it could help with recovery testing?



Speaker:

Oh yeah, I would.



Speaker:

So one thing for sure is, like, most people probably don't



Speaker:

know how to write a DR plan,



Speaker:

Mm-hmm.



Speaker:

Mm-hmm.



Speaker:

right.



Speaker:

Um, I wonder if you took AI, like even, and I'm going back to the first



Speaker:

set, right, the large language models,



Speaker:

Yep.



Speaker:

So the thing we said we



Speaker:

weren't talking about, I think we're gonna talk about it here.



Speaker:

Yeah.



Speaker:

I think at least to start with, it's like, Hey, here's all my data.



Speaker:

Here's my applications.



Speaker:

Help me build a DR test plan.



Speaker:

Yeah,



Speaker:

I like that idea.



Speaker:

And



Speaker:

see what it pops out because, and it may not be perfect, and don't just



Speaker:

blindly trust what it provides, but use it as a starting point, right?



Speaker:

And then go use that.



Speaker:

Because I think a lot of people struggle with, where do I even start?



Speaker:

Yeah.



Speaker:

And you could also, um, you could use it like a chaos monkey,



Speaker:

right?



Speaker:

You could use it.



Speaker:

Help me come up with some interesting scenarios.



Speaker:

To just make the, the idea... you know, one of the things that we talked about in



Speaker:

terms of, uh, cyber testing, uh, was, um,



Speaker:

you know, when we had Mike on, the idea of, like, doing this and, and



Speaker:

making it, making it fun, making it a game, uh, I like that idea a



Speaker:

lot and I think maybe AI could help



Speaker:

there.



Speaker:

Um,



Speaker:

if, if it helps you do recovery testing more often, um, and, uh,



Speaker:

helps you identify potential, uh, uh, plot, I was gonna say plot



Speaker:

holes, uh, potential, potential holes in your program, uh, then that, then that



Speaker:

I think could be, um, very



Speaker:

helpful.



Speaker:

And Curtis, since you threw out a term, Chaos Monkey is a tool that was released



Speaker:

by Netflix, and literally what it is used for is to just test IT resiliency.



Speaker:

So it'll go randomly kill services, kill locations, kill



Speaker:

network connections, just to see:



Speaker:

is streaming interrupted, are, uh, end users having any sort of



Speaker:

issues. And it's able to do this at scale and in an automated fashion



Speaker:

versus someone like trying to think about all the combinations,



Speaker:

permutations, and scenarios, because they're probably gonna miss things.



Speaker:

And so Netflix designed this thing to actually go out and



Speaker:

test their infrastructure.
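The random-selection idea can be borrowed for planning recovery drills. The sketch below is not Netflix's Chaos Monkey and breaks nothing; it only picks random component and failure combinations to rehearse. The function name `pick_failure_scenarios` and the component and failure lists are hypothetical.

```python
import random


def pick_failure_scenarios(components, failure_modes, count, seed=None):
    """Pick `count` distinct (component, failure mode) pairs at random."""
    rng = random.Random(seed)  # a seed makes a drill plan repeatable
    all_pairs = [(c, m) for c in components for m in failure_modes]
    return rng.sample(all_pairs, k=min(count, len(all_pairs)))


# Hypothetical environment for a recovery drill.
components = ["database", "backup server", "object storage", "site B network"]
failure_modes = ["unavailable", "data encrypted", "credentials revoked"]

for component, failure in pick_failure_scenarios(components, failure_modes, count=3):
    print(f"Drill: {component} is {failure}. Can we recover, and how long does it take?")
```

Drawing scenarios at random helps cover combinations that a person writing a test plan by hand would probably not think of.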



Speaker:

It is pretty impressive.



Speaker:

Uh, you know, their infrastructure in general is pretty impressive.



Speaker:

It's not flawless.



Speaker:

Um, I did, I did watch part of the, uh,



Speaker:

the Tyson fight a little while ago, and that was on Netflix



Speaker:

and it was not good, right?



Speaker:

That wasn't so much a resiliency thing as it was...



Speaker:

they just, again, they could have used perhaps a little bit better



Speaker:

AI to predict the, what kind of load they were gonna have.



Speaker:

But yeah.



Speaker:

But the idea of predicting crazy things that will happen, uh, Netflix



Speaker:

is pretty darn resilient, uh, when it comes to their infrastructure,



Speaker:

Yep.



Speaker:

yeah, I, I like that idea a lot.



Speaker:

Um, and, and I think, I think this is something that could be, that,



Speaker:

that, that, again, an, uh, uh, an LLM could actually help with, right?



Speaker:

So, like I said, the thing that we said we weren't gonna talk about,



Speaker:

we could talk about it, right?



Speaker:

Um, and for those, if you've never used a ChatGPT or a Claude,



Speaker:

uh, I think it's very useful



Speaker:

here, right?



Speaker:

You, you could say, Hey, I, I'm this kind of company.



Speaker:

This is the type of company, you know, and I understand the,



Speaker:

the privacy concerns of what you



Speaker:

share with a ChatGPT or a Claude.



Speaker:

Uh, there, there are, by the way, there are on-prem versions that



Speaker:

you can run, uh, of these LLMs too, so that you can keep the



Speaker:

data to yourself.



Speaker:

But the, you have a conversation with it.



Speaker:

Here's the type of company I am, here's the type of computing environment I have.



Speaker:

What do you th what could go



Speaker:

wrong?



Speaker:

Um, you know, what, what could I build a, a DR scenario



Speaker:

around?



Speaker:

Any final thoughts?



Speaker:

Can you think of, uh, any other areas where we could use AI in, in backup?



Speaker:

Not so much.



Speaker:

I think the one thing I do wanna call out though is AI is here to stay.



Speaker:

ML is here to stay.



Speaker:

Don't be afraid of it.



Speaker:

Use it,



Speaker:

right, in the right ways, and don't be afraid and just start thinking about it.



Speaker:

Uh, the one other thing I will call out is as companies are starting



Speaker:

to dig into AI and ML for their own applications, production applications



Speaker:

and other things, as a backup admin, you need to start thinking



Speaker:

about how do I protect this, right?



Speaker:

How do I back it up?



Speaker:

How would I potentially restore it?



Speaker:

Because there's a lot of data, and training these models



Speaker:

is really, really expensive.



Speaker:

Mm.



Speaker:

And so you wanna make sure you have mechanisms to protect the models



Speaker:

that emerge from all of this training so you can restore them if needed.



Speaker:

So use backup to, to make AI more resilient while AI makes backup more



Speaker:

resilient.



Speaker:

I like that.



Speaker:

We'll call that a symbiosis.



Speaker:

I like that a lot.



Speaker:

Uh, my one final thought is that potentially you could use, again,



Speaker:

going back to the thing we said we weren't gonna talk about.



Speaker:

You could use LLMs to help select vendors, right?



Speaker:

You could say, Hey, here are all my requirements and here's all the



Speaker:

documents that they, they gave me, this 57-page response to my 10-page RFI.



Speaker:

Can you help me make sense of it?



Speaker:

Um, and, uh, you, you could use that again, trust but



Speaker:

verify when using an LLM for



Speaker:

sure.



Speaker:

All right, well, thanks again, Prasanna, uh, for a good chat.



Speaker:

Thank you, Curtis.



Speaker:

And I am not gonna change how I hold a coffee mug.



Speaker:

I'm sorry.



Speaker:

I, I would expect no less.



Speaker:

And thanks to our listeners, uh, we'd be nothing without you.



Speaker:

That is a wrap.