Artificial Intelligence: The Future of Backup?

Artificial intelligence in backup isn't just marketing hype - it's changing how we protect our data. In this episode, W. Curtis Preston and Prasanna Malaiyandi break down the practical applications of AI in backup systems, from intelligent scheduling to ransomware detection.
Learn how artificial intelligence helps with capacity planning, especially with deduplication systems where predicting storage needs gets tricky. We discuss AI's role in asset discovery, anomaly detection, and even creating better disaster recovery plans. Plus, find out why backing up AI models themselves might become your next big challenge. This no-nonsense look at AI in backup cuts through the confusion and focuses on what really matters - making your backups better.
You've found The Backup Wrap-Up, your go-to podcast for all things
Speaker:
backup, recovery, and cyber recovery.
Speaker:
In this episode, we take a look at the use of artificial intelligence in backup.
Speaker:
Can AI make your backup environment actually better?
Speaker:
Prasanna Malaiyandi and I discuss AI and how it can help from
Speaker:
possibly everything from scheduling backups to detecting ransomware.
Speaker:
We talk about using it for deduplication, for capacity planning,
Speaker:
and even helping you to write better disaster recovery plans.
Speaker:
It's time to talk about AI and backups.
Speaker:
Hope you enjoy it.
Speaker:
By the way, if you don't know who I am, I'm W. Curtis Preston, AKA Mr.
Speaker:
Backup, and I've been passionate about backup and recovery for over 30 years.
Speaker:
Ever since I had to tell my boss that we had no backups of that really
Speaker:
important database that we had just lost.
Speaker:
I don't want that to happen to you, and that's why I do this podcast.
Speaker:
On this podcast, we turn unappreciated backup admins into Cyber Recovery Heroes.
Speaker:
This is The Backup Wrap-Up.
Speaker:
Welcome to the show.
Speaker:
Hi, I'm W. Curtis Preston, AKA Mr. Backup, and I have with me a guy who apparently
Speaker:
doesn't know how to hold a coffee cup.
Speaker:
Prasanna Malaiyandi, how's it going?
Speaker:
Prasanna
Speaker:
I am good, Curtis.
Speaker:
So I think we need to clarify a
Speaker:
are you defending yourself?
Speaker:
Are you gonna try to defend your weirdness?
Speaker:
I think we have to talk about multiple things.
Speaker:
First.
Speaker:
In India, they don't typically use like a mug.
Speaker:
They use like a stainless steel cup, right?
Speaker:
So if
Speaker:
you're drinking hot beverages, you can only hold it from like the very,
Speaker:
you saw it when we went to the Indian
Speaker:
restaurant in San Diego,
Speaker:
yeah, yeah.
Speaker:
you have to hold it from the very top, otherwise you'll burn your hand.
Speaker:
Right.
Speaker:
And then most mugs, it just feels weird.
Speaker:
Like I got, I got chunky fingers, like sausages, right?
Speaker:
And so like putting it inside the mug, like the handle part of the mug.
Speaker:
I feel like, especially if it's like a curve, not like a straight, I feel like
Speaker:
there's not enough stability there.
Speaker:
That's fascinating.
Speaker:
So for people watching the video, who, by the
Speaker:
way, we do publish a video on YouTube if you want to see our
Speaker:
glorious faces and our expressions.
Speaker:
But yeah, so when I hold a mug, I don't hold it like this through the
Speaker:
handle.
Speaker:
I basically grab it either from the top or I hold it like on the side.
Speaker:
And then of course, the pinky's kind
Speaker:
The pinky,
Speaker:
the bottom.
Speaker:
But what's weird though is the pinky supporting the bottom thing.
Speaker:
I know you've complained to me many times, but that's also how I hold my phone.
Speaker:
you end up covering your microphone.
Speaker:
always hold the phone and
Speaker:
then my pinky's kind of on the bottom, and so it always blocks the microphone.
Speaker:
So Curtis is always
Speaker:
like, were you underwater?
Speaker:
Did you swallow your phone?
Speaker:
What's going on?
Speaker:
So regarding your defense from, you know, how they hold, do things in India.
Speaker:
What part of India were you born in?
Speaker:
Uh, just remind
Speaker:
Yeah, I was, uh, born in not India, but,
Speaker:
but at home.
Speaker:
Right.
Speaker:
But yeah,
Speaker:
you were, you were raised by people born in
Speaker:
India, and so you were, you were taught,
Speaker:
yeah.
Speaker:
And so actually I prefer, so even drinking water.
Speaker:
I don't drink from a glass cup.
Speaker:
I drink from a stainless steel cup.
Speaker:
Right.
Speaker:
Which is, if you haven't spent any time around, you know,
Speaker:
Indians, you wouldn't know that.
Speaker:
It's just that you use a lot, you use stainless steel for cups, for plates,
Speaker:
right.
Speaker:
As Curtis knows, when I'm loading the dishwasher and
Speaker:
he's like, what is that racket?
Speaker:
what is happening over there?
Speaker:
Because everything's so noisy.
Speaker:
They last longer and you don't have to worry about them breaking.
Speaker:
That's, you know, I can't, I can't complain.
Speaker:
Yeah.
Speaker:
Uh, but yeah, I don't get the whole knot, you know?
Speaker:
Here I am with four fingers in my mug.
Speaker:
I'm just saying.
Speaker:
Okay, so now what if that mug was smaller and the handle was curved,
Speaker:
Well, then that's like a, that's like a girly mug and then,
Speaker:
then you use two fingers like
Speaker:
this.
Speaker:
feel like it gives you enough stability?
Speaker:
And yet I've never dropped a mug.
Speaker:
I'm
Speaker:
just saying.
Speaker:
It's not from dropping the mug.
Speaker:
It's from like when you, yeah.
Speaker:
See when you're drinking it, it just feels like it's a little like,
Speaker:
Yeah.
Speaker:
Um,
Speaker:
all over you.
Speaker:
I just think you don't know how to hold a mug, but.
Speaker:
Our listeners are probably like, what are these people talking about?
Speaker:
By the way, this is a new format starting in the new year.
Speaker:
We are now gonna just be talking about coffee and all the crazy
Speaker:
things that Prasanna does.
Speaker:
Yeah, absolutely.
Speaker:
Um, or maybe we might actually talk about some stuff.
Speaker:
So I thought, um, you know, we've been seeing, uh, AI on the news a lot,
Speaker:
right?
Speaker:
AI? I've never heard about it.
Speaker:
Yeah, I've never, never heard of it.
Speaker:
Yeah.
Speaker:
So artificial intelligence, and if, if you've been following the backup
Speaker:
industry much, you probably saw a few announcements from your, uh, backup
Speaker:
company or maybe backup companies you're interested in about the use of AI
Speaker:
within backup.
Speaker:
And so I thought we'd talk about that a little bit,
Speaker:
um, in this episode, and
Speaker:
whether or not it has a use, right?
Speaker:
And just to clarify, I think when a lot of these backup vendors launched AI,
Speaker:
they were using AI for like the, not for the core product, right?
Speaker:
So they were using AI for their support agent, or to help answer questions, right?
Speaker:
Which I think we all understand, we all know about, but I think in this
Speaker:
episode, I think we should focus on like the core part of backup.
Speaker:
Yeah.
Speaker:
So, so let's talk a just a little bit about, you know,
Speaker:
what we mean when we say AI.
Speaker:
There are different categories of AI, and then also there's machine learning, which
Speaker:
is very closely related, and honestly, I,
Speaker:
you know, I think I could describe the difference between machine
Speaker:
learning and AI, but then there's something that
Speaker:
changes, you know, that messes me up when we talk about that.
Speaker:
Um, I'll just, for those of you that actually really know what AI
Speaker:
is and machine learning is, you're gonna be offended by something
Speaker:
I say during this episode.
Speaker:
I, I'll just tell you that.
Speaker:
But we're gonna use the terms almost interchangeably, but they're not.
Speaker:
Uh, but I do want to distinguish between
Speaker:
what is referred to as generative AI, right?
Speaker:
Which is a, you know, a large language model that is
Speaker:
going to create things there.
Speaker:
It's not ex nihilo, right?
Speaker:
It's not from, it's not from nothing.
Speaker:
It's it, it has to, it has to have been trained on a large data set.
Speaker:
But, those are the kinds of things that they're using,
Speaker:
like you talked about there.
Speaker:
for support
Speaker:
models, right?
Speaker:
And, And,
Speaker:
just as examples of large language models, you might've heard about
Speaker:
Meta's Llama, Llama 3, Llama 4, there's ChatGPT or OpenAI's.
Speaker:
What is it?
Speaker:
OPT?
Speaker:
What,
Speaker:
the, the, actual model.
Speaker:
the
Speaker:
underlying model.
Speaker:
Oh, okay.
Speaker:
I would've just said ChatGPT.
Speaker:
'cause everybody knows what ChatGPT
Speaker:
is, right?
Speaker:
I mean, you've got Copilot, you've got, you've
Speaker:
got, Yeah, you, so you've got Claude from Anthropic.
Speaker:
Um, there are a lot of people, you know, who confuse the company with the product.
Speaker:
But, um, these are the, these are the ones that are grabbing
Speaker:
all the headlines, right?
Speaker:
They're also writing large bodies of text.
Speaker:
They're helping people to write books.
Speaker:
They're helping people to do art.
Speaker:
That, and there's a lot of, um.
Speaker:
A lot of legal discussions around that, around the use of things like
Speaker:
the books that I've written as, um, you know, feeding into that and, um,
Speaker:
the, we're not talking about that,
Speaker:
right?
Speaker:
Um, we're not gonna talk about, Hey, um, ChatGPT,
Speaker:
My restore didn't work.
Speaker:
Can you recreate all my documents?
Speaker:
Um, it's not,
Speaker:
there's not gonna be anything like that, at least not yet.
Speaker:
Um, the, um, we're gonna talk about how AI can be used to basically
Speaker:
enhance the core functionality.
Speaker:
I mean, you said this in way fewer words a few minutes ago,
Speaker:
but, uh, basically how it could be used to make backups better.
Speaker:
And I think a good chunk of this is really, like you said, more
Speaker:
around machine learning models,
Speaker:
right,
Speaker:
right,
Speaker:
large language models.
Speaker:
right.
Speaker:
So the, the first section we will just talk about how potentially just talk about
Speaker:
this is just sort of thoughts out loud.
Speaker:
I know that we have a lot of vendors that listen to the podcast.
Speaker:
We are
Speaker:
technically aimed at the people who actually use backup and
Speaker:
recovery, but I know a lot of vendors use the podcast, so feel free to
Speaker:
take this episode and run with it and
Speaker:
do stuff.
Speaker:
So I, I guess the first question would be, do we think that, uh, machine learning
Speaker:
can be used to help improve the efficiency of the backup process itself?
Speaker:
What do you think about
Speaker:
Oh, a thousand percent.
Speaker:
A billion percent, Curtis.
Speaker:
So I've never actually had to implement a backup system.
Speaker:
But you've done
Speaker:
tons of this, right?
Speaker:
And how do you go about just planning your backup, right?
Speaker:
How to back up an infrastructure, right?
Speaker:
It's like, just walk us through that, right?
Speaker:
And how many spreadsheets and all the rest that you have in
Speaker:
order to try to optimize these.
Speaker:
Yeah, I, I think about that a lot.
Speaker:
And, and, and, and, and the answer is gonna depend greatly on the
Speaker:
product that you're using, right?
Speaker:
You know, I, I can think of.
Speaker:
The traditional way is that you're going to create some kind of schedule, some
Speaker:
kind of, uh, automatic backup schedule.
Speaker:
Um, and you're going to do a, again, traditionally we'll
Speaker:
do three categories here.
Speaker:
Traditionally you've got some full backups and you're gonna do some
Speaker:
full backups every once in a while.
Speaker:
Um, and I was always a proponent if you had to do full backups, I was
Speaker:
always a proponent of doing those.
Speaker:
No.
Speaker:
More often than once a month.
Speaker:
Um, back in the days of tape, it was once a week because
Speaker:
it, was
Speaker:
complicated the restore process.
Speaker:
Yeah.
Speaker:
But, um, you know, doing it no more often than once a month, but depending on your
Speaker:
backup product, you might be able to, to
Speaker:
spread that out even over like three months.
Speaker:
And then you also want to schedule, if your backup product
Speaker:
is capable of doing it, you wanna schedule a cumulative incremental.
Speaker:
A differential, some products call it.
Speaker:
Um, and then of course the daily incremental.
Speaker:
Right.
Speaker:
So spreading
Speaker:
that all
Speaker:
for one application you're talking about,
Speaker:
E exactly.
Speaker:
You're doing this per application, per server.
Speaker:
Um, and, and you're trying to load balance things out because if you've
Speaker:
properly designed your system, it's probably not capable of doing a full
Speaker:
backup of your environment in one night.
Speaker:
Right.
Speaker:
Um, because that would just be really expensive, and then the rest of the
Speaker:
time it would go completely unused.
Speaker:
Right?
Speaker:
Um, so you, you buy it so that it's you, you size it so that it's big
Speaker:
enough to do a full backup over time.
Speaker:
And, um, you're right that, that, that scheduling that out is problematic, right?
Speaker:
Um, and you, you definitely could use, um, uh, AI
Speaker:
or ML to, to do that.
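As a rough illustration of the load-balancing problem described here, the sketch below spreads full backups across the nights of a cycle so that no single night carries most of the data. The server names, sizes, and the greedy largest-first heuristic are all assumptions made for illustration; this is not how any particular backup product schedules its fulls.

```python
import heapq

def spread_fulls(sizes_gb, nights):
    """Assign each server's full backup to a night, largest first,
    always choosing the night with the least data scheduled so far."""
    # Min-heap of (total GB already scheduled, night index)
    heap = [(0, n) for n in range(nights)]
    heapq.heapify(heap)
    schedule = {n: [] for n in range(nights)}
    for server, size in sorted(sizes_gb.items(), key=lambda kv: -kv[1]):
        load, night = heapq.heappop(heap)
        schedule[night].append(server)
        heapq.heappush(heap, (load + size, night))
    return schedule

# Hypothetical environment: server name -> full backup size in GB
sizes = {"db01": 900, "db02": 700, "fs01": 400,
         "fs02": 300, "app01": 200, "app02": 100}
plan = spread_fulls(sizes, nights=3)
for night, servers in plan.items():
    total = sum(sizes[s] for s in servers)
    print(f"night {night}: {servers} ({total} GB)")
```

A static script like this has to be rerun whenever servers are added or grow, which is the maintenance burden the conversation goes on to describe.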
Speaker:
And even for the scheduling aspect.
Speaker:
So we talked about the applications, and then you were talking about sort
Speaker:
of that infrastructure piece, which is shared and you now have to worry
Speaker:
about it across all of these things.
Speaker:
And I'm sure you had these bonkers spreadsheets that you
Speaker:
were creating, trying to do this.
Speaker:
Did it stretch all the way to the moon and back, by the way?
Speaker:
Well, you know me for, it wasn't even a spreadsheet, it was just, uh, it, it was a
Speaker:
script.
Speaker:
Right.
Speaker:
I would, I would just script all this nonsense.
Speaker:
Right?
Speaker:
Um, but the bigger the environment, the more
Speaker:
doing it programmatically made sense, right?
Speaker:
Um, and, and by the way, even if you have a more modern backup tool
Speaker:
that does incremental forever, there are many applications that
Speaker:
won't, that won't let you do
Speaker:
that.
Speaker:
Right?
Speaker:
I think of like database backups still need to be done every, you know, a full
Speaker:
backup every so often, and you have to schedule these out,
Speaker:
And that's the
Speaker:
second category.
Speaker:
'cause I know you talked about three categories.
Speaker:
Yeah.
Speaker:
Oh yeah.
Speaker:
Oh, well the three categories were, yes.
Speaker:
Uh, thank you.
Speaker:
I'm glad I have you here sometimes, you know.
Speaker:
Yeah.
Speaker:
So you have the, the, the old school full and incremental,
Speaker:
which old school is still current
Speaker:
school.
Speaker:
If we're talking about regular apps, then there's the forever incremental type.
Speaker:
Um, and you don't, you, you do have to worry about scheduling those,
Speaker:
but generally you just sort of tell 'em all to start at once and then
Speaker:
they queue and then it is not, it's, it's a lot simpler to do those.
Speaker:
But then the final category are ones that actually, um, and I
Speaker:
think the one that probably stands out the most here would be Rubrik,
Speaker:
right?
Speaker:
Rubrik doesn't let you schedule, um, that
Speaker:
stuff.
Speaker:
You tell it what your RTO
Speaker:
is and your RPO, and it just does the backups.
Speaker:
I mean, in fact, there are people that complain that you cannot, at least
Speaker:
last time I checked, you could not do
Speaker:
a manually scheduled backup if you wanted to tell it when to do stuff.
Speaker:
Um, I, I think this is probably the first use of some sort of machine learning
Speaker:
or artificial intelligence that I can think of with regards to scheduling.
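To make the idea of SLA-driven scheduling a little more concrete, here is a minimal sketch that decides which workloads need a backup started now in order to stay inside their RPO. The workload names, the field names (`last_backup`, `rpo`, `expected_duration`), and the slack-based ordering are hypothetical; this is not a description of Rubrik's or any other vendor's actual algorithm.

```python
from datetime import datetime, timedelta

def backups_due(workloads, now):
    """Return workload names whose next backup must start now to stay
    within RPO, most urgent (least slack) first."""
    due = []
    for name, w in workloads.items():
        # Latest moment a new backup can complete without breaching the RPO
        deadline = w["last_backup"] + w["rpo"]
        # Time left once the expected backup duration is accounted for
        slack = deadline - now - w["expected_duration"]
        if slack <= timedelta(0):
            due.append((slack, name))
    return [name for slack, name in sorted(due)]

now = datetime(2025, 1, 1, 12, 0)
workloads = {
    "sql-prod": {"last_backup": now - timedelta(hours=3, minutes=50),
                 "rpo": timedelta(hours=4),
                 "expected_duration": timedelta(minutes=30)},
    "file-share": {"last_backup": now - timedelta(hours=20),
                   "rpo": timedelta(hours=24),
                   "expected_duration": timedelta(hours=2)},
    "crm": {"last_backup": now - timedelta(hours=23, minutes=30),
            "rpo": timedelta(hours=24),
            "expected_duration": timedelta(hours=1)},
}
print(backups_due(workloads, now))
```

The point of the design is that the admin states the objective and the system re-evaluates it continuously, so an unexpected load or outage changes the order of work rather than silently breaking a fixed schedule.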
Speaker:
Which, which I was also gonna chime in.
Speaker:
So the first two methods you talked about, right?
Speaker:
You're kind of statically doing this upfront, setting the schedules and
Speaker:
hoping that forever that it will be good,
Speaker:
Right.
Speaker:
You'll always be able to meet it, but say that there's an additional load or a
Speaker:
server goes down or something else, right.
Speaker:
There's no way to fine tune and adjust that,
Speaker:
Well, well, I, Well, there, I mean, there is, but there's
Speaker:
no way to automatically fine
Speaker:
tune and Yeah.
Speaker:
Yeah.
Speaker:
Right.
Speaker:
And so you're just like, okay, maybe it'll fail a couple times
Speaker:
and then I'll adjust the policies and then I'll be fine, but Right.
Speaker:
Versus something like an SLA based, which I, I actually have
Speaker:
looked at Rubrik's in the past,
Speaker:
and I find that very enticing because really in the end, you
Speaker:
care about what your RPO and RTO,
Speaker:
Yeah.
Speaker:
No one cares if you can back up.
Speaker:
They only care if you can restore.
Speaker:
the problem though is it's such a big paradigm shift for a lot of backup admins
Speaker:
that it's very difficult to understand because it's like when people move
Speaker:
from on-premises to the cloud and they were concerned because they're like,
Speaker:
I can't touch and feel my equipment.
Speaker:
Right.
Speaker:
It's not something I could actually do.
Speaker:
I think that's also the same challenges you get when you move
Speaker:
from sort of, uh, schedule-based backups to sort of SLA based backups.
Speaker:
Yeah, I, I liked, I liked the idea a lot.
Speaker:
I still, again, you know, if I was running Rubrik,
Speaker:
I would give people the ability to do a manual backup if they
Speaker:
wanted to.
Speaker:
But, but I do really like the idea of SLA driven backups,
Speaker:
because I like the idea of SLAs.
Speaker:
You know, we've talked about SLAs on here, and I like the idea of
Speaker:
knowing the backups were being done often enough to meet my SLAs.
Speaker:
I
Speaker:
really liked that idea.
Speaker:
The one thing I think that is useful with these sort of approaches is
Speaker:
we've talked about the fact that like your environment doesn't stay static.
Speaker:
Right.
Speaker:
So as you're adding new workloads, as things are changing, you don't
Speaker:
want to have to go recompute your entire spreadsheet or your
Speaker:
script every single time.
Speaker:
So it's nice to have sort of these models that can automatically help fine tune and
Speaker:
optimize so you're not wasting your time because it's more than likely that you're
Speaker:
not gonna get it right the first time if you manually try to reset some of these
Speaker:
things.
Speaker:
And so having this automatic thing that constantly is
Speaker:
adjusting just seems amazing.
Speaker:
Yeah, it does.
Speaker:
And I, and outside of Rubrik, I'm not aware of any tools that do that.
Speaker:
Uh, but I, I think that this could certainly be a way where
Speaker:
they could use AI to do that.
Speaker:
Um, the.
Speaker:
And I, and I was thinking about, again, going back to it, it's been a
Speaker:
while since I've had to do this in a production environment, but the, the
Speaker:
the first thing that you have to find out is how big is everything, right?
Speaker:
How big is, is everything from a database perspective and
Speaker:
how, how long does it take?
Speaker:
'cause there's all these different, and that's the thing that nobody knows.
Speaker:
Right.
Speaker:
How big is your, how big is your data center?
Speaker:
And they're like, I don't know.
Speaker:
I don't know.
Speaker:
And so like, you have to do a full backup first
Speaker:
before you have any idea.
Speaker:
And not every server backs up at the same speed and all these different things.
Speaker:
So yeah, it it is a
Speaker:
complicated
Speaker:
and you may not be able to back up everything at the same
Speaker:
time because there might be
Speaker:
different hours, right?
Speaker:
That
Speaker:
a server is sort of offline or has less load that you can actually do it.
Speaker:
Yeah, so having some sort of AI or ML, um, figure that out sounds amazing.
Speaker:
Right?
Speaker:
Another area where I think that this could help is very, very closely related, and
Speaker:
that is, and, and some backup products do have this and that is making sure
Speaker:
that everything in my data center
Speaker:
is backed up in some
Speaker:
way, right?
Speaker:
Usually where you see this is an integration with like, um, uh,
Speaker:
VMware or, uh, AWS, et cetera, right?
Speaker:
Um, basically just connect to my entire, uh, you know, control
Speaker:
panel and then just look and make sure that everything is connected
Speaker:
to some type of policy to back it
Speaker:
up.
Speaker:
I, I think.
Speaker:
a default policy if anything is created, so at least everything
Speaker:
is protected, even though
Speaker:
it may not be protected with the right thing, but at least it's
Speaker:
being protected and you don't have to worry about these gaps.
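The coverage check described here boils down to a set difference between what exists and what has a policy. The sketch below assumes the inventory has already been pulled from a hypervisor or cloud API (that call is not shown), and the asset names and the `default-daily` policy name are hypothetical.

```python
def find_unprotected(inventory, policy_assignments, default_policy="default-daily"):
    """Return {asset: policy} for every discovered asset that has no
    backup policy, assigning the (hypothetical) default policy to each."""
    return {asset: default_policy
            for asset in sorted(inventory)
            if asset not in policy_assignments}

# Assets discovered from the virtualization or cloud control plane (assumed)
inventory = {"vm-web-01", "vm-web-02", "vm-sql-01", "vm-test-17"}
# Assets that already have an explicit backup policy
policy_assignments = {"vm-web-01": "gold", "vm-sql-01": "sql-hourly"}

gaps = find_unprotected(inventory, policy_assignments)
print(gaps)
```

A default policy may not be the right policy, but as noted in the discussion, it at least closes the gap until someone assigns a better one.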
Speaker:
Exactly.
Speaker:
Exactly.
Speaker:
Um, and I, I think you do see this in a lot of backup products.
Speaker:
Usually again, it's with integration
Speaker:
with, uh, big things like VMware, Hyper-V, AWS, um,
Speaker:
you know, et cetera.
Speaker:
you need the companies, those vendors, to actually provide the APIs to be
Speaker:
able to do these sort of queries, and I think that's where there's kind
Speaker:
of a little bit of a tension there,
Speaker:
Yeah.
Speaker:
Yeah.
Speaker:
I mean, theoretically you could scour the data center, right?
Speaker:
Uh, looking for new computers.
Speaker:
Again, I, I know I mentioned this before, but you know, back
Speaker:
in the day we did that, right?
Speaker:
And back in the day we did that with Visio.
Speaker:
Um, there used to be a very
Speaker:
expensive version of Visio that would just literally crawl your data center.
Speaker:
And it used, uh, some very interesting technology.
Speaker:
Um, I forgot the name of this, but like, Nmap
Speaker:
does this, where what it does is it sends a malformed packet.
Speaker:
It finds an IP address, it sends a malformed packet to that IP address
Speaker:
to see how it responds, and different things respond in different ways.
Speaker:
And that's how it, that's how it, um,
Speaker:
That
Speaker:
is crazy that they built that.
Speaker:
Yeah.
Speaker:
Yeah.
Speaker:
Um, and so you could theoretically do that, but agreed, it's much easier
Speaker:
if you just have, everything's gonna be in VMware or AWS and then just talk to AWS.
Speaker:
Now again, going to VMware and AWS, there can be multiple virtual data centers.
Speaker:
There can be
Speaker:
multiple AWS accounts.
Speaker:
So you, you, you want to make sure that, that you have some way to, to
Speaker:
do that.
Speaker:
And I, and I do like that idea.
Speaker:
Shadow IT.
Speaker:
Yeah, shadow IT bad,
Speaker:
especially when it comes to backup.
Speaker:
Right.
Speaker:
Um, again, I'll tell a story from back in the day was the time that someone came to
Speaker:
me and they had, they were DBAs and they, they gave me a directory of a database.
Speaker:
They wanted me to restore.
Speaker:
Restore, and it was temp, um, /tmp on an HP box.
Speaker:
And for those that don't know, /tmp on an HP box, specifically HP-UX, was in RAM.
Speaker:
So when you rebooted it, temp went away.
Speaker:
And this, um,
Speaker:
this
Speaker:
it source code,
Speaker:
what I.
Speaker:
it?
Speaker:
Source code
Speaker:
It was source code.
Speaker:
Yeah.
Speaker:
And they were developing for months, like an entire team of
Speaker:
developers developing source code of this new application in temp.
Speaker:
And then we rebooted the server and they, and they came to me
Speaker:
and asked me to restore it.
Speaker:
And I was like, dude, we don't back up temp. I don't know
Speaker:
what you're talking about.
Speaker:
Like, and they're like, dude, this is really important,
Speaker:
like heads are gonna roll.
Speaker:
And I'm like, yeah, not mine.
Speaker:
Like everybody knows we don't back up temp.
Speaker:
Except for you, apparently.
Speaker:
Oh
Speaker:
Uh, so it's, I'm just, you know, it's really bad when you have
Speaker:
a functioning system and then it's not being backed up again.
Speaker:
Another story we used to have, um, we had a, a naming convention.
Speaker:
Ours was very boring.
Speaker:
Um, it was H-P-D-B-S-V-A, right?
Speaker:
HP database server A, and there was HPFS01, et
Speaker:
cetera, right?
Speaker:
And I remember, and I had this form that you had to fill out.
Speaker:
This was an actual piece of paper.
Speaker:
We did
Speaker:
not have web pages.
Speaker:
Right?
Speaker:
You had this form that you fill out and, and you had to, and, and it, it said
Speaker:
on there, simply filling out this form is not, does not meet the requirement.
Speaker:
You do not consider your system backed up until you have a signed form back from me.
Speaker:
Right?
Speaker:
And then one day somebody handed me a form and it said like.
Speaker:
They wanted, like me to back up H-P-D-B-S-V-M, right?
Speaker:
And I go, M that's interesting.
Speaker:
The last server I remember hearing about was H. So that means there's an I, A
Speaker:
J, A K, and an L out there somewhere.
Speaker:
hasn't been backed up.
Speaker:
That hasn't been backed up.
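The reasoning in this story, that a server ending in M implies servers I through L must exist somewhere, is easy to automate when hosts follow a sequential naming convention. The sketch below assumes a single-letter suffix after a fixed prefix, written here as `HPDBSV` without the spelled-out hyphens; the function name and host list are hypothetical.

```python
import string

def missing_suffixes(prefix, known_hosts):
    """Given hosts named prefix + one uppercase letter, return the names
    that the sequence implies but that are not in the known list."""
    letters = {h[len(prefix):] for h in known_hosts
               if h.startswith(prefix)
               and len(h) == len(prefix) + 1
               and h[-1] in string.ascii_uppercase}
    if not letters:
        return []
    highest = max(letters)
    return [prefix + c for c in string.ascii_uppercase
            if c <= highest and c not in letters]

# Servers the backup admin knows about: A through H, plus the new request for M
known = ["HPDBSV" + c for c in "ABCDEFGH"] + ["HPDBSVM"]
print(missing_suffixes("HPDBSV", known))
```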
Speaker:
Yeah.
Speaker:
Um, so this idea of automatically
Speaker:
detecting servers and applications sounds like a great
Speaker:
idea.
Speaker:
And also not just VMs, but also detect, it would be really
Speaker:
nice if it detected the type of
Speaker:
VM and said, this appears to be a SQL instance.
Speaker:
We should back it up with the default SQL
Speaker:
policy.
Speaker:
That would be great.
Speaker:
So in addition to making things more efficient, um, there are some
Speaker:
other things we could do, uh, with AI that also would be interesting.
Speaker:
Uh, what do
Speaker:
you think is the, the first one?
Speaker:
No.
Speaker:
So I think one of the ones, and we've talked about it so much, so often,
Speaker:
and vendors are starting to do this, it's around anomaly detection and
Speaker:
it could be used in various ways.
Speaker:
So one thing is like, Hey, by the way, this server, all of a sudden it's backing
Speaker:
up 10 times what it normally does.
Speaker:
Maybe this might indicate like a malware or ransomware on the system.
Speaker:
Right?
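A very simple version of the "backing up 10 times what it normally does" check can be sketched as a comparison against a recent baseline. The median baseline and the threshold factor of 5 are arbitrary assumptions for illustration; a real product would likely use something more sophisticated, and a spike can have innocent causes such as a data migration.

```python
from statistics import median

def is_size_anomaly(history_gb, latest_gb, factor=5.0):
    """Flag the latest backup if it is more than `factor` times the
    median of recent backup sizes (the threshold is an assumption)."""
    if not history_gb:
        return False  # no baseline yet, so nothing to compare against
    baseline = median(history_gb)
    return baseline > 0 and latest_gb > factor * baseline

# Hypothetical nightly incremental sizes in GB for one server
history = [48, 52, 50, 51, 49]
print(is_size_anomaly(history, 500))  # roughly ten times the usual size
print(is_size_anomaly(history, 60))   # ordinary day-to-day variation
```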
Speaker:
Um.
Speaker:
Or Hey, I've noticed that there's a bunch of data that's starting
Speaker:
to look, based on entropy,
Speaker:
like it's been encrypted, and that doesn't look normal.
Speaker:
Okay, maybe I should go investigate it, right?
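The entropy signal mentioned here is usually Shannon entropy over byte values: encrypted data looks close to random, around 8 bits per byte, while ordinary text scores much lower. The sketch below shows the calculation; the sample data is made up. One caveat worth keeping in mind is that compressed files also have high entropy, so this is a hint to investigate rather than proof of ransomware.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

text = b"the quick brown fox jumps over the lazy dog " * 200
random_like = os.urandom(65536)  # stands in for encrypted content

print(f"text:        {shannon_entropy(text):.2f} bits/byte")
print(f"random-like: {shannon_entropy(random_like):.2f} bits/byte")
```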
Speaker:
So, or it could even be security things like, Hey, you're logging
Speaker:
in from a different place than normal as a backup admin.
Speaker:
Is this the right thing or not?
Speaker:
Yeah.
Speaker:
And also very closely related to the stuff you said before was, uh,
Speaker:
are files where the file type based on the first few bytes of the file,
Speaker:
does not match the extension of the
Speaker:
file.
Speaker:
So it says it's a .doc, but the first few bytes of the file
Speaker:
show that it's an application, for
Speaker:
Sorry, one
Speaker:
Yeah, that's an interesting use case around, uh, the first few bytes because
Speaker:
that could detect things that are being encrypted or other things that don't
Speaker:
make sense, or potentially even malware.
Speaker:
Right.
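Checking the first few bytes against the extension can be sketched with a small table of well-known file signatures. The signatures below are standard ones, but the table is deliberately tiny and the function names are hypothetical; legitimate files can also trip a check like this (for example, an RTF document saved with a .doc extension), so a mismatch is a reason to look closer, not a verdict.

```python
import os

# A few well-known file signatures ("magic numbers")
MAGIC = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",                         # also .docx/.xlsx containers
    b"MZ": "exe",                                 # Windows executables
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole",   # legacy .doc/.xls
    b"\x89PNG\r\n\x1a\n": "png",
}
# Which signature each extension is expected to carry
EXPECTED = {".pdf": "pdf", ".zip": "zip", ".docx": "zip", ".xlsx": "zip",
            ".exe": "exe", ".doc": "ole", ".xls": "ole", ".png": "png"}

def sniff(header: bytes):
    """Return the detected type for the leading bytes, or None if unknown."""
    for magic, kind in MAGIC.items():
        if header.startswith(magic):
            return kind
    return None

def extension_mismatch(filename: str, header: bytes) -> bool:
    """True when the extension is one we know and the content disagrees."""
    expected = EXPECTED.get(os.path.splitext(filename)[1].lower())
    return expected is not None and sniff(header) != expected

print(extension_mismatch("report.doc", b"MZ\x90\x00\x03"))  # executable named .doc
print(extension_mismatch("report.doc", b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00"))
```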
Speaker:
Yeah, it, uh, it's something we do, you know, my, uh, employer is S2
Speaker:
Data, and we do a lot of restores of old stuff, um, where we're pulling
Speaker:
data off of tape often for, um, e-discovery purposes and lawsuit
Speaker:
purposes and, um, investigation purposes.
Speaker:
And one of the things that we do as we're pulling data, 'cause we
Speaker:
use a, a, a proprietary tool that we've written to restore data off
Speaker:
of most backups rather than use the built in tool for a lot of reasons.
Speaker:
Um, and this is one of them is that we check the file type against the file
Speaker:
contents and, uh, it can, it can also indicate.
Speaker:
Um, uh, subterfuge,
Speaker:
right?
Speaker:
Um, it can indicate somebody trying to hide something.
Speaker:
Um, but yeah, so anomaly detection, I think is a really big one.
Speaker:
Uh, right.
Speaker:
Definitely that this is a, it looks like you've got ransomware, right?
Speaker:
You need
Speaker:
to solve that.
Speaker:
That was probably the, the first big use of AI that I
Speaker:
remember, uh, in, in the backup world.
Speaker:
And I will say that if
Speaker:
the way that you know that you have ransomware is that your backup
Speaker:
product told you, something is wrong, but, uh, but it can
Speaker:
happen.
Speaker:
Right.
Speaker:
Um, another one that I'll talk, uh, that I'd bring up is, is data classification.
Speaker:
Again, I think that
Speaker:
this is probably a very simple one, but the
Speaker:
idea of like, looking at all the different data types and helping you to
Speaker:
understand what is in your environment.
Speaker:
This is not that new.
Speaker:
Um, but perhaps the AI use case could be helping you to identify trends,
Speaker:
um, and, and where the data's moving, where it's being created, where
Speaker:
it's being changed, uh, et cetera.
Speaker:
Um, and, and then, which is very closely related to my
Speaker:
other idea, which is predictive
Speaker:
analytics.
Speaker:
Right.
Speaker:
Um, again, going back to, uh, you know, back in the day,
Speaker:
one of the things I remember being the hardest to do is capacity prediction.
Speaker:
You
Speaker:
know, predicting whether or not I have enough capacity to
Speaker:
do my backups for the next six
Speaker:
and you know what makes it even harder?
Speaker:
What's that?
Speaker:
Dedupe. Dedupe makes it way harder.
Speaker:
And you know what? AI, right?
Speaker:
AI or ML could be used, because it's smarter than I am.
Speaker:
Smarter than you are.
Speaker:
It could actually understand the trends
Speaker:
as to, now, let's talk about that. Not every,
Speaker:
everybody might not understand
Speaker:
why dedupe makes capacity,
Speaker:
Sure.
Speaker:
uh, management so
Speaker:
So let's talk about the, before we get to dedupe, let's talk about like
Speaker:
traditional storage or tape, right?
Speaker:
So
Speaker:
you're doing a full backup, you know how big your database is, therefore,
Speaker:
you know, okay, my full backup is gonna take this much space and
Speaker:
you know, with compression, maybe it's gonna be 2x, or half the space, right?
Speaker:
And then, you know, okay, my daily change rate is say 5%, and based on the
Speaker:
total size, I know what that's gonna be.
Speaker:
And so
Speaker:
if I'm doing weekly fulls, daily incrementals, I know how much
Speaker:
storage I'm gonna need for a week.
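The arithmetic for the traditional case can be written down directly. The sketch below uses the figures mentioned in the conversation (2x compression, a 5% daily change rate, weekly fulls with daily incrementals) and assumes a 1,000 GB database and six incrementals per week purely for illustration.

```python
def weekly_storage_gb(full_gb, daily_change_rate,
                      compression_ratio=2.0, incrementals_per_week=6):
    """Storage for one weekly full plus daily incrementals, after compression."""
    full = full_gb / compression_ratio
    incremental = full_gb * daily_change_rate / compression_ratio
    return full + incrementals_per_week * incremental

# Assumed example: 1,000 GB database, 5% daily change, 2x compression
per_week = weekly_storage_gb(1000, 0.05)
print(f"about {per_week:.0f} GB per week")
print(f"about {per_week * 12:.0f} GB for 12 weeks of retention")
```

Because each week's usage is independent of every other week's, the same formula also says how much space comes back when a week of backups expires, which is exactly the property that deduplication takes away.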
Speaker:
Yeah.
Speaker:
And, and just as, and just as important, you also know how
Speaker:
much storage, when you delete
Speaker:
the, you know, the older backups.
Speaker:
Yeah.
Speaker:
You know how much storage will be freed up, which is just as, if not even more,
Speaker:
important.
Speaker:
Now the problem with deduplication is they talk about these great rates like
Speaker:
40x, 30x, 20x, take your pick, right?
Speaker:
And that's all great
Speaker:
if a lot of your data is very similar, but it's hard
Speaker:
to tell, is your data similar or not until you actually start doing it.
Speaker:
So if you're trying to buy storage for, say, three years
Speaker:
ahead of time as part of a capacity plan,
Speaker:
it becomes really difficult.
Speaker:
And so you guess, right?
Speaker:
You'll take a stab and maybe you look at some of your data and you're like,
Speaker:
Hey, these kind of look the same, but you don't know if that's right or not
Speaker:
until you actually start backing it up.
Speaker:
And like you said, Curtis, if you go delete your backup, you may not
Speaker:
actually free up that space because it's been de-duplicated against something
Speaker:
else that you're still preserving.
Speaker:
right,
Speaker:
Say I go delete my backup from six months ago for one application.
Speaker:
Another application might have, uh, common blocks with that data or with that other
Speaker:
application.
Speaker:
And so even though I deleted the first application's backup,
Speaker:
it's not gonna free up space.
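A toy model makes this concrete. The sketch below assumes a store that keeps each unique block once and counts how many backups reference it; the class, method names, and block IDs are all hypothetical, and real products track references in their own ways.

```python
from collections import Counter


class DedupeStore:
    """Toy deduplicating store: each unique block is kept once and reference-counted."""

    def __init__(self):
        self.refcounts = Counter()  # block id -> number of backups referencing it
        self.backups = {}           # backup name -> set of block ids

    def write_backup(self, name, block_ids):
        blocks = set(block_ids)
        self.backups[name] = blocks
        for block in blocks:
            self.refcounts[block] += 1

    def delete_backup(self, name):
        """Delete a backup and return how many blocks were actually freed."""
        freed = 0
        for block in self.backups.pop(name):
            self.refcounts[block] -= 1
            if self.refcounts[block] == 0:
                # No other backup references this block, so it can be reclaimed.
                del self.refcounts[block]
                freed += 1
        return freed


store = DedupeStore()
store.write_backup("app_a_six_months_ago", ["b1", "b2", "b3", "b4"])
store.write_backup("app_b_last_night", ["b2", "b3", "b4", "b5"])

freed = store.delete_backup("app_a_six_months_ago")
print(f"Deleted a 4-block backup, freed {freed} block(s)")
```

Three of the four deleted blocks are still referenced by the other application's backup, so only one block is reclaimed.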
Speaker:
And so you end up with this problem and this challenge.
Speaker:
And that's one of the hardest things about deduplication.
Speaker:
Having worked at a company that did deduplication, customers
Speaker:
always struggled with it,
Speaker:
Yeah,
Speaker:
And some of the
Speaker:
things we would do is we would be like, Hey, let's scan your
Speaker:
application and just understand what sort of dedupe rates you may get.
Speaker:
And even that's a guess, because maybe you move an application from one storage
Speaker:
appliance to a different appliance and now your dedupe rates are different.
Speaker:
Yeah.
Speaker:
And again,
Speaker:
one of the most frustrating things could be if you start
Speaker:
running outta capacity, right?
Speaker:
And so you say, listen, I know we said we wanted to keep backups for
Speaker:
three years, but we're running outta capacity and so we're gonna start
Speaker:
deleting three years minus a month.
Speaker:
And you do that and you get
Speaker:
back 0.1% of your capacity. It can be very difficult.
Speaker:
Um,
Speaker:
There's also the fact that freeing up that space takes time.
Speaker:
Because typically with a lot of these systems, there's a background process
Speaker:
typically called garbage collection,
Speaker:
which goes and now needs to free up all this data and that does take time to run.
Speaker:
Yeah, it is a two-stage process where you flag that
Speaker:
block for deletion and then another
Speaker:
process runs, typically when backups aren't running.
Speaker:
And you probably have to force the garbage collection process.
Speaker:
So, go ahead.
Speaker:
So I was just thinking, as we were talking, about the first time
Speaker:
that I heard about AI in storage,
Speaker:
and I think the first company that I can recall, and I'm sure there
Speaker:
were others, was actually Nimble
Speaker:
Storage. And Nimble,
Speaker:
what they did when they built their first product, so,
Speaker:
they provided primary storage.
Speaker:
And with their first product, they basically were like, hey, we are optimized for SQL.
Speaker:
We are optimized for VMware.
Speaker:
We are optimized for these different applications. And I was like, oh, that's pretty awesome.
Speaker:
They're doing it dynamically.
Speaker:
But I think at the time it was kind of a static thing where you
Speaker:
would say, Hey, I have VMware.
Speaker:
I'm writing into this data store.
Speaker:
And it would basically pick different
Speaker:
block sizes for deduplication
Speaker:
Right, right, right.
Speaker:
Yeah.
Speaker:
That's interesting.
Speaker:
I think, going back to the thing
Speaker:
we were talking about, of, like,
Speaker:
using AI to basically help me understand when I need to order more storage,
Speaker:
it can, to the best of its ability,
Speaker:
actually look at all of the dedupe rates, right?
Speaker:
It could look at the dedupe rate of each individual backup, right?
Speaker:
You told me the backup was this much, and this is
Speaker:
how much it stored, and so we can actually
Speaker:
run all those calculations, and I can actually figure out:
Speaker:
well, if everything stays the
Speaker:
same, in six months you're gonna be
Speaker:
outta storage.
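As a rough illustration of the "if everything stays the same" calculation, the sketch below fits a straight line to daily used-capacity readings and projects when the line reaches the appliance's capacity. It is plain least-squares trending, not machine learning, and the function name, sample numbers, and 100 TB capacity are assumptions made up for the example; a real product would presumably also model per-backup dedupe rates, retention, and seasonality.

```python
def days_until_full(daily_used_tb, capacity_tb):
    """Fit a least-squares line to daily usage and project days until capacity.

    Returns None if usage is flat or shrinking (no projected fill date).
    """
    n = len(daily_used_tb)
    if n < 2:
        raise ValueError("need at least two daily readings")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_used_tb) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_used_tb))
    slope = sxy / sxx                     # TB of growth per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x   # fitted usage on day 0
    day_full = (capacity_tb - intercept) / slope
    # Count from the most recent reading (day n - 1), never below zero.
    return max(0.0, day_full - (n - 1))


# Hypothetical readings: 50 TB used on day 0, growing 0.5 TB per day for 30 days.
history = [50 + 0.5 * day for day in range(30)]
remaining = days_until_full(history, capacity_tb=100)
print(f"Projected days until full: {remaining:.0f}")
```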
Speaker:
So
Speaker:
many vendors actually do.
Speaker:
Yeah.
Speaker:
Yeah.
Speaker:
Um, so the, the, um,
Speaker:
Because I think storage capacity is a little easier
Speaker:
to predict, because, like you said, you're not really changing things, right?
Speaker:
You know what your policy is.
Speaker:
You know what data's coming in, you know how long you're keeping it,
Speaker:
you know what your deduplication rates are, you know how much it's filling up.
Speaker:
So I think it's a little easier than what we had talked about previously
Speaker:
where it's like, okay, now let me plan out my entire backup infrastructure
Speaker:
and start scheduling that.
Speaker:
Yeah.
Speaker:
Speaking of dedupe, can AI help dedupe itself?
Speaker:
Do you think that?
Speaker:
can.
Speaker:
So I think my biggest
Speaker:
challenge would be that running AI requires compute,
Speaker:
and usually with backup,
Speaker:
you want to go as fast as you can,
Speaker:
Mm-hmm.
Speaker:
right?
Speaker:
And so I think there's that tension
Speaker:
that exists between running as fast as you can versus introducing
Speaker:
something in the pipeline that could potentially slow things down.
Speaker:
And you'd have to also ask at what cost, right?
Speaker:
Like, are you going to be saving, say, 70% additional versus traditional
Speaker:
algorithms, or is it gonna be much less?
Speaker:
Yeah, I think with dedupe in the backup world, there
Speaker:
have been two main ways to do dedupe. There has been
Speaker:
something that isn't really dedupe, but
Speaker:
there were products that called themselves dedupe products that did this.
Speaker:
And that would be block-level
Speaker:
incremental, essentially.
Speaker:
Right?
Speaker:
Not
Speaker:
actually deduping things against each other, but just
Speaker:
using technology to lower the amount of new data that's
Speaker:
backed up from each workload.
Speaker:
But then traditional dedupe, the way it works, for those that don't know
Speaker:
this, is that you slice everything up into what are
Speaker:
typically called shards or chunks.
Speaker:
You run some type of algorithm on it that gives you some type of thing.
Speaker:
Like, like
Speaker:
A fingerprint.
Speaker:
the original SHA-2,
Speaker:
SHA-256.
Speaker:
And again, here, the better the algorithm, the better the dedupe,
Speaker:
but the better the algorithm, the more compute it takes going back to
Speaker:
your trade off thing.
Speaker:
And so, basically, every chunk is run through that algorithm, you come
Speaker:
up with this alphanumeric string, that alphanumeric string is compared
Speaker:
with every other alphanumeric string,
Speaker:
and then that's how you identify redundant data.
Speaker:
And one of the challenges you have with that method is that the data slides,
Speaker:
and so if you don't slice the data at exactly the same spot, it's duplicate
Speaker:
data, but you don't identify it.
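The chunk, fingerprint, and compare steps, and the sliding problem, can be sketched as follows. This is a simplified illustration, not any vendor's implementation: the 256-byte chunk size, the helper names, and the synthetic test data are all assumptions, with SHA-256 from Python's standard hashlib used for the fingerprint.

```python
import hashlib


def sample_data(n_blocks=128):
    """Deterministic, non-repeating test data (4096 bytes by default)."""
    return b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(n_blocks))


def fixed_chunks(data, size=256):
    """Slice data at fixed offsets: 0, size, 2 * size, ..."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprints(chunks):
    """One SHA-256 hex string per chunk; equal strings mean duplicate chunks."""
    return {hashlib.sha256(chunk).hexdigest() for chunk in chunks}


original = sample_data()
known = fingerprints(fixed_chunks(original))

# Backing up identical data again: every chunk is recognized as a duplicate.
matches_same = len(known & fingerprints(fixed_chunks(original)))

# Insert a single byte at the front: every slice point moves, nothing matches.
matches_shifted = len(known & fingerprints(fixed_chunks(b"X" + original)))

print(f"identical copy: {matches_same} of {len(known)} chunks matched")
print(f"one byte inserted: {matches_shifted} of {len(known)} chunks matched")
```

The second result is the sliding problem: the data is almost entirely duplicate, but because the slices no longer line up, none of it is identified.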
Speaker:
There is a completely different way. If you
Speaker:
look at the way VAST does things,
Speaker:
they do something completely different, right?
Speaker:
So they have an algorithm, and I'm guessing they
Speaker:
use AI or ML to do this.
Speaker:
They have an algorithm that basically identifies data that
Speaker:
is probably redundant, right?
Speaker:
So they've got two different ways to do dedupe. And so,
Speaker:
potentially, again, potentially,
Speaker:
AI or ML could be used to identify a new way to identify duplicate
Speaker:
data that is maybe
Speaker:
more efficient from a compute and storage standpoint.
Speaker:
Like even if it was just more efficient from a compute standpoint,
Speaker:
but got the same amount of dedupe, that would still
Speaker:
be great.
Speaker:
Um, but
Speaker:
potentially this is something
Speaker:
that I think, uh, AI could
Speaker:
And the one thing I did also want to comment on, Curtis, is, going back to
Speaker:
your comment about, okay, if the data shifts, then now you have to make sure
Speaker:
that you're doing the right blocks, right?
Speaker:
What you're
Speaker:
talking about is called fixed block,
Speaker:
fixed-block deduplication,
Speaker:
right?
Speaker:
There are
Speaker:
many vendors out there, though, who do variable-size,
Speaker:
variable-block deduplication, which allows it to vary such that if
Speaker:
you do get an offset, right, because of some data change, it's still able to
Speaker:
dedupe everything else after that because
Speaker:
of how it's actually computing the chunks, the segments, right?
Speaker:
Each of
Speaker:
the blocks.
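One common way to get variable-size chunks is content-defined chunking, where a chunk ends wherever a hash of the last few bytes matches a pattern, so slice points follow the data rather than fixed offsets. The sketch below is a deliberately naive version under stated assumptions: it recomputes a CRC-32 over a 16-byte window at every position instead of using a true rolling hash, it has no minimum or maximum chunk size, and the function names and parameters are hypothetical.

```python
import hashlib
import zlib


def sample_data(n_blocks=128):
    """Deterministic, non-repeating test data (4096 bytes by default)."""
    return b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(n_blocks))


def content_defined_chunks(data, window=16, mask=0x3F):
    """End a chunk wherever the hash of the preceding `window` bytes hits the mask."""
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        # The boundary test looks only at nearby content, not at the offset.
        if zlib.crc32(data[end - window:end]) & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing data after the last boundary
    return chunks


def fingerprints(chunks):
    return {hashlib.sha256(chunk).hexdigest() for chunk in chunks}


original = sample_data()
original_fps = fingerprints(content_defined_chunks(original))
shifted_fps = fingerprints(content_defined_chunks(b"X" + original))

# Chunks from the original that are no longer found after the one-byte insert.
missing = original_fps - shifted_fps
print(f"{len(original_fps)} chunks originally, {len(missing)} not matched after the shift")
```

Because boundaries depend on content, the inserted byte can only disturb the chunk it lands in; every later slice point falls at the same place in the data, so those chunks still match.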
Speaker:
Yep.
Speaker:
So that's certainly an area where AI could potentially
Speaker:
help. Next: do you think it could help with recovery testing?
Speaker:
Oh yeah, I would.
Speaker:
So one thing I see is, like, most people probably don't
Speaker:
know how to write a DR plan,
Speaker:
Mm-hmm.
Speaker:
Mm-hmm.
Speaker:
right.
Speaker:
I wonder if you took AI, like even, and I'm going back to the first
Speaker:
set, right, the large language models,
Speaker:
Yep.
Speaker:
So the thing we said we
Speaker:
weren't talking about, I think we're gonna talk about it here.
Speaker:
Yeah.
Speaker:
I think at least to start with, it's like, Hey, here's all my data.
Speaker:
Here's my applications.
Speaker:
Help me build a DR test plan.
Speaker:
Yeah,
Speaker:
I like that idea.
Speaker:
And
Speaker:
see what it pops out. It may not be perfect, and don't just
Speaker:
blindly trust what it provides, but use it as a starting point, right?
Speaker:
And then go use that.
Speaker:
Because I think a lot of people struggle with, where do I even start?
Speaker:
Yeah.
Speaker:
And you could also use it like a Chaos Monkey,
Speaker:
right?
Speaker:
You could use it like,
Speaker:
help me come up with some interesting scenarios.
Speaker:
You know, one of the things that we talked about in
Speaker:
terms of cyber testing,
Speaker:
when we had Mike on, was the idea of, like, doing this and
Speaker:
making it fun, making it a game. I like that idea a
Speaker:
lot and I think maybe AI could help
Speaker:
there.
Speaker:
Um,
Speaker:
If it helps you do recovery testing more often, and
Speaker:
helps you identify potential plot, I was gonna say plot
Speaker:
holes, potential holes in your program, then that,
Speaker:
I think, could be very
Speaker:
helpful.
Speaker:
And Curtis, since you threw out a term, Chaos Monkey is a tool that was released
Speaker:
by Netflix, and literally what it is used for is to just test IT resiliency.
Speaker:
So it'll go randomly kill services, kill locations, kill
Speaker:
network connections, just to see:
Speaker:
is streaming interrupted, are end users having any sort of
Speaker:
issues? And it's able to do this at scale and in an automated fashion
Speaker:
versus someone like trying to think about all the combinations,
Speaker:
permutations, and scenarios, because they're probably gonna miss things.
Speaker:
And so Netflix designed this thing to actually go out and
Speaker:
test their infrastructure.
Speaker:
It is pretty impressive.
Speaker:
Uh, you know, their infrastructure in general is pretty impressive.
Speaker:
It's not flawless.
Speaker:
I did watch part of the
Speaker:
Tyson fight a little while ago, and that was on Netflix,
Speaker:
and it was not good, right?
Speaker:
That wasn't so much a resiliency thing as it was,
Speaker:
again, they could have used perhaps a little bit better
Speaker:
AI to predict what kind of load they were gonna have.
Speaker:
But yeah.
Speaker:
But the idea of predicting crazy things that will happen, uh, Netflix
Speaker:
is pretty darn resilient, uh, when it comes to their infrastructure,
Speaker:
Yep.
Speaker:
Yeah, I like that idea a lot.
Speaker:
And I think this is something that,
Speaker:
again, an LLM could actually help with, right?
Speaker:
So, like I said, the thing that we said we weren't gonna talk about,
Speaker:
we could talk about it, right?
Speaker:
And for those of you who've never used a ChatGPT or a Claude,
Speaker:
I think it's very useful
Speaker:
here, right?
Speaker:
You could say, hey, I'm this kind of company.
Speaker:
This is the type of company, you know, and I understand the
Speaker:
privacy concerns of what you
Speaker:
share with a ChatGPT or a Claude.
Speaker:
There are, by the way, on-prem versions
Speaker:
of these LLMs that you can run too, so that you can keep the
Speaker:
data to yourself.
Speaker:
But you have a conversation with it.
Speaker:
Here's the type of company I am, here's the type of computing environment I have.
Speaker:
What could go
Speaker:
wrong?
Speaker:
You know, what could I build a DR scenario
Speaker:
around?
Speaker:
Any final thoughts?
Speaker:
Can you think of any other areas where we could use AI in backup?
Speaker:
Not so much.
Speaker:
I think the one thing I do wanna call out though is AI is here to stay.
Speaker:
ML is here to stay.
Speaker:
Don't be afraid of it.
Speaker:
Use it,
Speaker:
right, in the right ways, and don't be afraid, and just start thinking about it.
Speaker:
Uh, the one other thing I will call out is as companies are starting
Speaker:
to dig into AI and ML for their own applications, production applications
Speaker:
and other things, as a backup admin, you need to start thinking
Speaker:
about how do I protect this, right?
Speaker:
How do I back it up?
Speaker:
How would I potentially restore it?
Speaker:
Because there's a lot of data, and training these models
Speaker:
is really, really expensive.
Speaker:
Mm.
Speaker:
And so you wanna make sure you have mechanisms to protect the models
Speaker:
that emerge from all of this training so you can restore them if needed.
Speaker:
So use backup to make AI more resilient while AI makes backup more
Speaker:
resilient.
Speaker:
I like that.
Speaker:
We'll call that a symbiosis.
Speaker:
I like that a lot.
Speaker:
My final thought is that potentially you could use, again,
Speaker:
going back to the thing we said we weren't gonna talk about.
Speaker:
You could use LLMs to help select vendors, right?
Speaker:
You could say, Hey, here are all my requirements and here's all the
Speaker:
documents that they gave me, this 57-page response to my 10-page RFI.
Speaker:
Can you help me make sense of it?
Speaker:
And you could use that. Again, trust but
Speaker:
verify when using an LLM for
Speaker:
sure.
Speaker:
All right, well, thanks again, Prasanna, uh, for a good chat.
Speaker:
Thank you, Curtis.
Speaker:
And I am not gonna change how I hold a coffee mug.
Speaker:
I'm sorry.
Speaker:
I, I would expect no less.
Speaker:
And thanks to our listeners, uh, we'd be nothing without you.
Speaker:
That is a wrap.