According to the 2020 IBM i Marketplace Survey Results, nearly 60 percent of IBM i shops plan to upgrade their hardware or software this year—but that’s not the only reason you might need to reevaluate your server capacity.
Perhaps you’re consolidating IBM i, AIX, and Linux VMs. Or possibly your boss requested an analysis on whether to run Linux on Power or Intel. Maybe you’re building a business case for the cloud or adopting GPU or another new technology and you’re not sure how it will impact capacity.
Don’t miss this webinar where our experts introduce you to Performance Navigator, the preferred tool for system sizing and capacity planning on IBM’s Power platform. Performance Navigator is trusted by thousands of organizations and technology partners each year. You’ll learn how it can help you:
- Size your next POWER8 or POWER9 hardware purchase
- Determine internal and external disk configurations
- Interpret IBM’s Collection Services and/or NMON data
- Analyze difficult performance situations
- Evaluate VIOS, SAN, PowerHA, or other new technologies
System sizing barely scratches the surface of what Performance Navigator has to offer. Get the whole story here—directly from the capacity and performance analysis expert himself: Randy Watson.
A complete transcript of the webinar is below.
Tom Huntington:
Well, welcome everyone. We're very excited to be with you today. I'm Tom Huntington, and I'll be your host and moderator today on System Sizing for IBM Power Systems Servers. It's almost like a little tongue twister in itself. We're going to talk about Performance Navigator today for capacity and performance analysis. This is a live webinar. We will be recording today's event, so you can play it back if you have to leave us early for some reason. We plan on about a full hour of activity. We're hoping for some Q&A from you, so line up those questions and send them through the question toolbar. We'll have three polling questions, and we hope that you'll participate in those too, because it makes our life a whole lot more exciting when we have customer participation. The handouts will be available too. So the recording will be out there, handouts, it's all good. I'm joined by one of my colleagues and esteemed partner and a gentleman I've known for many years, Mr. Randy Watson. How are you today?
Randy Watson:
Oh, I am great. Looking forward to having this discussion today.
Tom Huntington:
Yeah, it's wonderful to have you with us. So Randy's been helping customers and partners for many years doing capacity and performance analysis. Myself, I've kind of always been on the outskirts of this topic, but great to have you with us, Randy. We have a fairly... might as well say moderately long agenda, but we easily have an hour of content to keep you all busy. We're going to talk a little bit about the buzz in the IBM i community, the benefits of proper planning and capacity consolidation, the fact that troubleshooting can be one of the issues that you're dealing with when it comes to performance, and what do you do? We'll talk a little bit about competitive analysis. There are business partners with us, or you yourself may be saying, "Do I put this on Power, or do I put this on Intel?"
Tom Huntington:
And then of course we're hearing more and more pressure around the topic of moving to the cloud. So those are our agenda items today. Let's talk a little bit about the buzz around the marketplace itself. If you don't know, HelpSystems does put out a couple of artifacts around the Power platform. One, we do the IBM i Marketplace Survey. We've done that for six years. As a matter of fact, last week we did a webinar on it, and we also posted the PDF of the report itself on our website. That's a free, ungated download you can take advantage of. This fall, we also did the fourth annual AIX community survey. We gathered those results in the October, November time frame. We'll be doing a webinar releasing those results sometime around March or early April. We're working out those dates for you. And then we continue to do the State of IBM i Security Study.
Tom Huntington:
What those studies show us is that figuring out capacity and managing the system isn't always that easy, and yet many of you are faced, as we look at the studies from last year and this year, with upgrades. On the IBM i marketplace study, we see about 61% of you are doing some form of upgrade, and specifically when it gets into hardware, that's where capacity planning is important to you. And then if we look over on the AIX side from last year's study, we talk about the reasons why maybe you haven't moved forward to the new POWER9 technology in that platform or that operating system and some of the issues facing you there. So we want to thank you for joining us again. As we look at technology hurdles, what are some of the things you see, Randy?
Randy Watson:
So I think a lot of... When you're thinking about capacity planning in general, a lot of people think, well, it's a simple CPW or rPerf map, but actually there's a lot of moving parts to figure out. Ultimately, the goal is to figure out the best price-to-performance configuration from a hardware standpoint. There's a lot of questions here, so these are just a few of them. Are you going to do an on-prem solution or a cloud or a hybrid solution? There's two versions of the enterprise processor pools for those people who are on enterprise machines. You really have to be PowerVM aware, because we're talking about desired and virtual settings and virtual shared pools and a lot of these things that tune the machines.
Randy Watson:
Obviously, are you going to go scale up or scale out, depending on the size of your system? Of course, storage is a big deal, internal or external. If you go external, how many options are there? You could say dozens, but certainly there's several options for external storage. There's many other options, Live Partition Mobility or DR or HA, you've got all those things, and as Tom mentioned, we're now monitoring GPUs if you have any AI projects going on.
Tom Huntington:
So a lot to worry about there. The other thing you should know is that Randy always talks about, turn that flight recorder on, or get that free host code running. What we're using is standard things on your IBM i platform. We use the performance collection data. So if you're running that every five minutes or every 15 minutes, it's taking samples. That's the data we use for IBM i capacity and sizing, and we'll talk about why that is. And then in the AIX and Linux space, we're using Nmon data. So we can take Nmon data from Power, or we can take Nmon data actually from even Intel-based systems, and use that for sizing and capacity planning.
Tom Huntington:
Then the other thing is, how much data do you want to keep around, and how much data can you keep around? That's something we can help you out with too. We'll talk about how we can consolidate this information. And then there's benefits to using this actual data. So if you think about it, we're using the collection data, so we see how your application is behaving over time, not just this week, but the last month, the month before that, and maybe the last two years, and we can use that to help predict what you need in the future as you plan your business growth on the server.
Tom Huntington:
So management reporting is part of this. Then there's problem determination. We have customers that come to us and say, "We're stuck. We don't know what's causing a problem here performance-wise. Can you help us out?" We can answer that for you. And then of course, a big part of what we do is capacity planning. What's that next system? Or, I'm consolidating IBM i and AIX onto one frame. What does that look like? A lot of exercises like that that we help with. Don't worry, we can help you plan for those peaks and valleys as we look at this.
Tom Huntington:
So as I promised, we're going to do a few polling questions. This helps us better understand what's on your mind and what you're looking at. So I'd really appreciate it if you could answer this first question. When do you plan to upgrade to POWER9? Maybe you're not, maybe you're going to go right to POWER10, I don't know, whenever that comes out actually. I think that's somewhere in 2021 possibly, which is not that far off anymore. The options are: in the next month, in the next six months, by the end of 2019, in the next 12-plus months, or we don't know if POWER9 is right for us. We saw, in this year's marketplace survey... Oh gosh, I'm trying to remember the number on it, but a good portion of the market already has gone to POWER9.
Tom Huntington:
And then I was very happy to see 7.3 as being, on IBM i anyway, the most popular operating system level that people are at. So we'll have you take a moment here to answer that poll. Looks pretty good. I'm going to close it out already here on you. It looks like we have nearly 10% of you in the next month, 10% in the next six months, 10% by the end of the year, 36% in the next 12 months, and then another 36% that don't know whether or not POWER9 is right for you. All right, let's share those results. I talked through them rather quickly, but those are the results from our poll. Let's move on to the next part of our presentation and talk about the benefits of proper planning.
Tom Huntington:
There's many reasons for capacity planning. One of them is business is good, the economy's going great. We need more CPW, or more rPerf if you're AIX, to support this growth. Maybe you had some unexpected growth, unexpected because you put up this new application and unfortunately it didn't perform as expected, or your organization went out and acquired or divested parts of the organization. So you have to plan for that unexpected growth or downsizing that happened. Then there's the question of converting, especially in the IBM i space, from internal disk to external disk. You'll see in our marketplace survey that well over 40% of the market now is using external disk. Do I use SSD or spinning disk? What am I doing there?
Tom Huntington:
Server consolidation, as we talked about, we've seen a lot of consolidation in the IBM i space. I think there's a lot of room yet for sure in the AIX space. As we look at AIX, Linux on Power and IBM i, we can mix these things together. What does that look like, and who has that view for you? Are you doing it all on a spreadsheet or... Well, we'll show you there's a better way. The other thing we can do is help you prepare for that seasonal change, getting you ready for peak season. Of course, are you looking at the cloud? What do you need in the cloud? Do you just go to the cloud and use all of the CPW, all the rPerf, all the processors that you need and not worry about pricing at all? Well, we're pretty sure that that's not the case.
Tom Huntington:
So Performance Navigator is a product that's been built over the last three decades, really, with Randy's experience and his team's experience doing capacity and performance analysis for IBM i, AIX and Linux. We acquired Randy's group about a year and a half ago now, hard to believe it's been that long. We've put this into our Robot solution. It's still Perf Nav, we haven't changed any of the branding on that per se. But we do support it through our Robot team and we do development and marketing through that group.
Tom Huntington:
So it's a graphical PC tool. The big thing to think about is that this is really... it's, I'm going to say, a big GUI, I don't know if that's the right word for it, Randy, or not, but it has a lot of technology that's been written into the graphical interface. And then the collection piece runs independent of that, and we'll talk about that in a bit. We can also support other forms of Unix, so Solaris and HP, and we can certainly help people out with that, and we can look at an Intel workload from Linux and say, what would that look like on Power? So in the end, we do performance analysis, performance management, capacity planning, and problem determination. This is a solution that's been used around the world. So, Randy, what are these reports showing our customers?
Randy Watson:
One of the things to remind people is that even though, as we're going to talk about in a second, the client is a Windows client, the output is mostly HTML, at least that's what customers are using... So the first graph on your left there is like an enterprise performance overview. So this is a monthly view of the world. In today's virtualized world, when you have capacity and/or performance problems, oftentimes the solution might be, as I would say, robbing Peter to pay Paul. In other words, taking a resource from one partition and giving it to another that needs it more. So it helps having that view, not only of one system... but across all operating systems.
Randy Watson:
So you can see this as one view of every frame and every LPAR in a given enterprise, and you can see some of the color coding... These are red, yellow, depending on whether you're below a guideline, approaching a guideline or above a guideline, which certainly tells the admin people, okay, maybe I should go look at this and see what's going on. The graph on the right is a great example of how we do before-and-after analysis. This is getting into problem determination. This is an example of a customer who decided to go to the cloud. This is disk response time. The graph on the left is their on-prem performance, the graph on the right was their cloud performance.
Randy Watson:
So it wasn't a CPU problem when they went to the cloud, it was a disk problem, and this customer actually went back on-prem because it didn't work. So they went to the cloud, figured out it didn't work and they went back on-prem. So again, capacity planning is not just cores and so forth, there's disk and memory issues that you have to worry about too.
Tom Huntington:
Right. So Randy, you talked about free host codes. What's the story behind that?
Randy Watson:
Well, I think this is one of the subtle things that most people don't grasp the significance of. Peter Drucker once said, "You can't manage anything you don't measure." So we give this code away free, regardless of whether you're a customer or not, because we want you to have this history. This is the same for IBM i, AIX, Linux. It doesn't matter what operating system. For example, by default in AIX, we're going to keep a year's worth of data. Because when it does come time to do a capacity plan or a problem determination, that history is invaluable. I can't tell you how many times people call us up and they've got a problem, we look at it and they say, "Yeah, this job ran an hour longer than it used to." Well, we don't have the picture of what it used to be. We just have the after picture, we don't have the before picture. So it's invaluable. It's real easy to install.
Randy Watson:
There's basically... I'm going to say, there's no overhead from a CPU standpoint and very little disk space, and we have some numbers there for you. So it's really painless to do this, and there's all the value in the world to at least collect this. Even if you're not a customer, when we do an analysis for you, it certainly gives more credibility to the analysis.
Tom Huntington:
Awesome. So then what makes the Windows client software so important in this process? We talked yesterday a little bit about why it's not in a browser and maybe-
Randy Watson:
Right.
Tom Huntington:
... you can explain that.
Randy Watson:
Yes. As Tom mentioned, this product has been developed over 30 years, and there's well over 100,000 lines of code in it. This Windows client does a lot of math. There is a lot of math that goes on because... For example, CPW is not in the data, you have to calculate those numbers. rPerf is not in the data, you have to calculate it. So there's a lot of functionality in the client itself. But make sure everybody understands that the output, as you saw in that example, is mostly HTML or JPEG generated. So the output is meant for web access and so forth, but the analysis is in this Windows client just because it's able to analyze thousands of LPARs and years of data.
Randy Watson:
Again, when you create enterprise reports... Again, a lot of people think about, "I want to look at my production system." Well, really, you don't buy LPARs, you buy a frame. A frame could have multiple LPARs in it. So you have to worry about the hardware, the performance, how many cores, how many frames. Obviously, you could have data centers analysis, memory, disk, both space and performance. So there's a lot of levels that this Windows client is designed to help you analyze.
Tom Huntington:
So what is what-if technology? What is that doing, Randy?
Randy Watson:
Tom, this is where all the magic happens. So there is this what-if function, right? Over the years, as Joe would say... Joe is the brains of the operation here. He used an expression this morning, as a matter of fact, he said, "We mix all this stuff up into applesauce, so to speak." So we take a Macintosh and this and that, and we take all these AIX and Linux and IBM i and multiple frames and multiple models and we throw it all in this thing, and then we undo it to figure it out based on any given different model. Of course, what we're really trying to do, again, is find the best price-performance configuration, which ultimately is, okay, how many cores do I need? How do I configure those cores in terms of PowerVM, desired and virtual? How do I balance workloads across multiple frames if I have those? Do I need virtual shared pools? Do I do internal disk or external, and is it spinning or SSD?
Randy Watson:
So again, there's a lot of moving parts as we talked about at the beginning here, and that what-if function is what is going to give us the answer to those questions. So it is a collaborative effort between the customer, us doing the analysis, and the business partner, IBM, whoever's going to order the machine. Because ultimately, the goal is, here's the machine I want to go order, and this is how I configure it when I get the machine delivered.
Tom Huntington:
Well, your example earlier showing the customer that was on-prem going to the cloud is just a classic reason why you need this. If you-
Randy Watson:
Exactly.
Tom Huntington:
It's one of those things, if it sounds too good, it probably is, and we should really look at making sure that we're comparing apples to apples kind of thing as we move into the cloud. Right?
Randy Watson:
Absolutely.
Tom Huntington:
As you do this activity, what are some of the variables that you plug into it? What do you look at?
Randy Watson:
Well, the first thing we do is we bring all this stuff in, and obviously we want to figure out where you are today. So we have to understand that. We don't use CPW or rPerf to do this. When we bring all this data in, we're basically using CPU milliseconds, because a millisecond is a millisecond. Obviously, one machine may be faster than another machine, and we take that into consideration, which is what the CPW and rPerf benchmarks do. But when we figure out how to consolidate all this data, we're actually using milliseconds to do some of the math. So that really helps us. That's how we do AIX and Linux, because a millisecond is a millisecond regardless of what operating system you're on. And then we're going to do the math on that, based on a given machine, to try to figure out all the PowerVM settings and help you get the right number of cores.
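The millisecond-based consolidation Randy describes can be pictured with a small sketch. This is an illustration only, not Performance Navigator's actual calculation: the partition names, the sample values, and the `speed_factor` field (a stand-in for whatever relative-speed adjustment the benchmarks supply) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    partition: str
    cpu_ms: float        # CPU milliseconds consumed during one collection interval
    speed_factor: float  # hypothetical: source core speed relative to a target core (1.0 = same speed)


def cores_busy_on_target(samples, interval_seconds):
    """Cores of the target machine kept busy by the combined workload in one interval."""
    interval_ms = interval_seconds * 1000.0
    # Convert every partition's CPU time into target-machine milliseconds, then add them up.
    target_ms = sum(s.cpu_ms * s.speed_factor for s in samples)
    # One core supplies interval_ms of CPU time per interval.
    return target_ms / interval_ms


# Made-up numbers for one 15-minute interval across three partitions on the same frame.
samples = [
    Sample("IBMI_PROD", cpu_ms=450_000, speed_factor=0.8),
    Sample("AIX_DB", cpu_ms=300_000, speed_factor=0.8),
    Sample("VIOS1", cpu_ms=30_000, speed_factor=0.8),
]
print(round(cores_busy_on_target(samples, interval_seconds=900), 2))
```

Because the unit is time rather than a per-platform benchmark score, IBM i, AIX, Linux and VIOS partitions can all be added into the same total before any target model is chosen.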
Randy Watson:
Because remember, hardware's one part of that price performance, but in a lot of cases, software is a bigger dollar amount than the hardware is. A lot of software is based on how many cores you have and how much memory you have and how many... So understanding all that can actually help you reduce the software dollars, because that's often a big component.
Tom Huntington:
Awesome. Okay. So now then we care about growth and... Oops, I didn't get that slide forward here. So when we look at the solution... I am-
Randy Watson:
One more.
Tom Huntington:
Okay. Oh, there we go.
Randy Watson:
There we go.
Tom Huntington:
I apologize, Randy. We look at growth, and we should care about growth, so what do you do differently in this regard when somebody is saying, "Hey, my business is growing, I've got to grow my system"?
Randy Watson:
So just about every time, when we do a capacity plan, most customers are going to say, "I want a machine that lasts me three, four, or five years." That's normal and it's understandable. But of course, when anybody asks that question, the next question any analyst would ask, if they don't know otherwise or if they don't have data, is, what's your growth rate? Somebody's going to say, "Well, it's 10% or 20%." So they sort of, I'm going to say, make a number up.
Randy Watson:
Now, that number may not be made up, that number may be what their financial growth is. By the way, there is a correlation, but it's never linear. But what we've done to make the decision easier, and we've been doing this for 30 years, is turn that question around, as I like to say. Given a configuration, so let's say it's a 914 and you've got four cores active, what is my room for growth? So by default, we start at 50%, which will put the peak point at 66% busy on a given configuration. So that's about 14% per year for three years, which is... a lot of people say somewhere between 10 and 20%. What we do is show the customer, and the executives, that given this configuration, which costs X, this is how much room for growth you have.
Randy Watson:
If that's not enough, if they're not comfortable with that room for growth, then it's really a financial decision, and then it's really easy. You start adding cores. Every core, obviously, is going to add more hardware dollars, potentially, and software dollars. So that way, it's a financial decision, it's really not a performance decision. We came up with this room for growth a long time ago because it's a more accurate way to calculate the performance on a given configuration, and then it's just a decision for management to say, "Well, how much room for growth do I want?"
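The numbers Randy quotes fit together arithmetically if room for growth is read as how much the peak workload can grow before the configuration is fully busy. That reading is an assumption made for this sketch, as are the function names:

```python
def peak_busy_for_growth(room_for_growth):
    """Peak utilization that leaves the given room for growth before reaching 100% busy."""
    return 1.0 / (1.0 + room_for_growth)


def annual_growth_rate(room_for_growth, years):
    """Compound yearly growth rate that uses up the room for growth in the given number of years."""
    return (1.0 + room_for_growth) ** (1.0 / years) - 1.0


# The default mentioned in the webinar: 50% room for growth.
print(f"peak busy: {peak_busy_for_growth(0.5):.1%}")
print(f"growth per year over 3 years: {annual_growth_rate(0.5, 3):.1%}")
```

With 50% room for growth, the peak sits at roughly two-thirds busy, and compounding at roughly 14 to 15% a year uses that room up in three years, which matches the figures quoted above.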
Tom Huntington:
Okay. What about when you say adjust workload by partition, what do you mean by that?
Randy Watson:
Remember you mentioned some of the reasons for capacity planning, and one of them was organic growth, obviously, or acquisition? So-
Tom Huntington:
Oh, sure.
Randy Watson:
... we've done... A lot of times... So bank A acquires bank B, and let's say they're both equal and they're going to convert the application from bank B to bank A's application and they're of the same size. So in that simple example, we could say, okay, we're going to take bank A's workload and we're going to double it. Right?
Tom Huntington:
Okay. Got it.
Randy Watson:
So that's a percentage, and we can just adjust it up or we can take it away. Like you said, we're going to get rid of WebSphere, we're going to move WebSphere off of the i and put it on Windows, which can happen. Whether it should or not is a different question, but it can happen. We know what percentage of the workload was WebSphere, and we can subtract that to properly figure it out.
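Adjusting workload by partition amounts to scaling each partition's baseline by a percentage before the what-if is run. A minimal sketch of that idea, with hypothetical partition names, baseline values, and percentages:

```python
def adjust_workload(baseline_cpu_ms, adjustments):
    """Scale each partition's baseline CPU milliseconds by a fractional adjustment.

    adjustments maps partition name -> fraction, e.g. +1.0 doubles the workload
    and -0.25 removes a quarter of it. Partitions not listed are left unchanged.
    """
    return {
        partition: cpu_ms * (1.0 + adjustments.get(partition, 0.0))
        for partition, cpu_ms in baseline_cpu_ms.items()
    }


# Made-up baseline: bank A's partition absorbs an equally sized bank B (+100%),
# and WebSphere, assumed here to be 25% of the web partition, moves off the box (-25%).
baseline = {"BANK_A": 400_000.0, "IBMI_WEB": 200_000.0, "VIOS1": 30_000.0}
adjusted = adjust_workload(baseline, {"BANK_A": 1.0, "IBMI_WEB": -0.25})
print(adjusted)
```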
Tom Huntington:
Got it. So it is sometimes downsizing too and you're moving applications off. So what other variables do you see as important as you're doing these sizing activities?
Randy Watson:
Well, in today's virtualized world, obviously, you're really talking about server consolidation in most cases, even if it's a simple one or two, because most people have a prod and a test and a dev maybe. But today, more and more people, obviously, are going to external, and so that means you're going to have a couple of VIO servers. So you really are talking about consolidating just your existing workload. We still see a lot of the customers that have the IBM i people, and they have their machine and the AIX people and they have their machine. But from an enterprise point of view, you really should look at this holistically, because they could be... both those workloads can be running on the same frame-
Tom Huntington:
Sure.
Randy Watson:
... and it's probably more cost effective if you did it that way. Again, you mentioned Linux. All the new applications, including AI and machine learning and all that, all these new things are Linux-based, and a lot of those... I'm sure everybody on the call... there's probably some departments there that are firing up Linux partitions left and right. Well, in our humble opinion, Linux runs better on Power, and what we want to do is show people that. We can actually collect that data, as you mentioned, on x86 and model what that workload would look like on Power. So that's server consolidation.
Randy Watson:
Of course, we're talking about calculating the PowerVM settings, which is critical. So not only, how many cores do I need and what model and that kind of stuff, but how do I configure it, what are the desired and virtuals, whether it's capped or uncapped in the shared pool, and that kind of thing.
Tom Huntington:
So as we talk about that, you deliver a blueprint to them. What are some of those deliverables of how the thing should be configured?
Randy Watson:
So after doing this thousands and thousands of times, we do have a standard process we go through. Having said that, there are really no two capacity plans alike. Tom and I were chatting before the meeting, this is what's fun about it in some way. But we do create this thing called the state of the union, because obviously in order to figure out where you're going, you have to figure out where you've been or where you are. This is a report... These are mostly PDFs. Technically, what's generated is HTML, but when we do the analysis, we're going to send you a PDF. That's a state of the union about a given partition. And then we're going to do one or more what-if analyses. Even in simple cases, there could be, what if I got one core, two cores? What if I got an internal disk, external disk, a hybrid and so forth?
Randy Watson:
So there are one or more what-if graphs that are consolidated into a capacity planning report. Of course, we're going to do virtual... recommend any memory issues. Disk gets really... There's many options even in simple cases. Sometimes we do three or four options, and we can tell you what those options are. There's obviously several hours of detailed work that's involved here, and we always come and present this result to you. So there is a deliverable, which we would then present to you and the management as we go through this, or to IBM or the business partner if they're in the loop.
Tom Huntington:
So we're going to take a look at some of these examples of these deliverables, the enterprise hardware summary, the enterprise performance overview, the state of the union that Randy just talked about and then some of the what-if analysis. So Randy, what do we have here?
Randy Watson:
So this is the first thing we look at when we're doing an analysis, this is the enterprise hardware summary. I can't tell you how many times we've produced this report, and this is like brand new news to the CIO. Because this is a one-page view for the executives of the entire enterprise. It doesn't matter if you have one frame or 100 frames. So in this case, we analyzed two frames. There's five LPARs plus one VIO server, which are all LPARs. We tell you, in the entire enterprise, how many CPW are installed and how many of them you have allocated. So obviously, don't confuse installed capacity with licensed capacity. We do the same math for rPerf. We tell you how many cores are installed, how many of them are allocated, how many virtuals, how many VIOs, how much memory you have, and the disk.
Randy Watson:
Again, we look at 6 partitions, there's 81 LUNs, there's 7.2 terabytes of storage. I find this often interesting. You see that out of the 7.2, there's only 32.7% of it used. My guess is the CIO probably has a requisition request on his desk for more storage. Obviously, from an enterprise view, they're not using what they have now. The other thing that's really interesting is we tell you how many levels of the operating system there are. I think the biggest I've seen there was 17, 17 levels of operating system across the enterprise. Which is like, okay, we've got to get our act together. I've seen 14 as system there.
Randy Watson:
This is a hardware summary that helps us understand that and helps us understand some of the PowerVM settings, so you can interpret the graphs that we produce. So you have to understand CPW, rPerf, cores, virtuals. You'll notice virtual shared pool IDs. That's important. Basically, for those who may not understand what that is, that is a mechanism within PowerVM to control software dollars, and you need to understand that, and whether it's shared or capped and so forth. Memory, disk, we know everything about how each of these partitions is configured. That helps us understand how to interpret the performance data.
Randy Watson:
And then we start at a high level, as I mentioned with that first graph. Here's another example of an enterprise performance overview. Again, it's usually by month or month to date, and this shows you, okay, here's the average, max and peak and what the trend is. So this is like a one-page executive summary that you produce every month. Again, if you see some red or yellow, maybe it's time for the administration people to go take a look. This is one of the examples of the state of the union. Oftentimes when we're doing an analysis, we scan all this historical data you send us, and to be conservative... This is subjective and it can be a collaborative decision with the customer and business partner and so forth. But we find the day that you used the most CPU milliseconds. So in this case, it happened to be September the seventh.
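Picking the baseline day can be sketched as summing CPU milliseconds per calendar day and taking the largest total. This is only an illustration of the idea; the record layout, the function name, and the sample data (including the year) are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def busiest_day(samples):
    """Return the date with the most total CPU milliseconds.

    samples: iterable of (timestamp, cpu_ms) interval records for one partition.
    """
    totals = defaultdict(float)
    for timestamp, cpu_ms in samples:
        totals[timestamp.date()] += cpu_ms
    # The day with the largest total becomes the conservative baseline for the what-if.
    return max(totals, key=totals.get)


# Made-up interval data; a real collection has one record every 5 or 15 minutes.
samples = [
    (datetime(2019, 9, 6, 10, 0), 120_000.0),
    (datetime(2019, 9, 6, 14, 0), 150_000.0),
    (datetime(2019, 9, 7, 10, 0), 210_000.0),
    (datetime(2019, 9, 7, 14, 0), 190_000.0),
    (datetime(2019, 9, 8, 10, 0), 90_000.0),
]
print(busiest_day(samples))
```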
Randy Watson:
This is a 24-hour picture of that machine from midnight to midnight, and we're going to use that, as it says at the top there, as a baseline for our what-if. So when we go into our what-if, we're going to take that workload and then we're going to model that onto something else, either the existing machine or so forth. So that's one of the many... There's probably about 10 graphs in that state of the union. But here's an example of just one picture of the what-if. So this is our little machine plus a VIO server. You can see by the two names there that this is both machines, and this is a four-core machine. So this is what the workload would look like on four cores, but obviously, we don't need four cores because it's only 11% busy. So now we're getting into, how many cores do I really need, which turns into, how many licenses do I need for software, and so forth.
Randy Watson:
And then we took the same thing and said, okay, what if we... Remember my room for growth, that default, I said 66% busy, 50% growth? So I said, tell me how many cores on a POWER9 924 I need for this partition for the peak to be at 66% busy, and the answer's 0.2 desired, one virtual. Now, if that's not enough, maybe you can run that higher, maybe you can run it lower. So this is where the discussion comes in, and that's obviously a knob you can turn, which we'll show you a little later on in the demo.
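Turning a baseline peak into a desired and virtual setting can be sketched as scaling the peak demand by the room for growth. The rounding rules and the 0.13-core peak below are assumptions chosen for illustration; they are not taken from the webinar or from Performance Navigator.

```python
import math


def size_partition(peak_cores_used, room_for_growth=0.5):
    """Return (desired entitlement, virtual processors) for one partition.

    peak_cores_used: cores of the TARGET machine consumed at the baseline peak.
    Scaling by (1 + room_for_growth) places that peak at 1 / (1 + room_for_growth)
    of the entitlement, i.e. about 66% busy for the 50% default.
    """
    raw = peak_cores_used * (1.0 + room_for_growth)
    # Assumption: round the entitlement up to the next tenth of a core.
    desired = math.ceil(round(raw * 10, 6)) / 10
    # Assumption: use the smallest whole number of virtual processors that covers it.
    virtuals = max(1, math.ceil(desired))
    return desired, virtuals


# Hypothetical peak of 0.13 target cores with the default 50% room for growth.
print(size_partition(0.13))
```

Raising or lowering `room_for_growth` is the knob Randy refers to: more room means a larger entitlement and, eventually, more cores to license.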
Tom Huntington:
Awesome. Thank you, Randy. Speaking of that, we're going to do a little live demo and take you into, what size system do I need in order to consolidate systems? Randy's going to show you some things here. So I'm going to make you a presenter, Randy, and-
Randy Watson:
You bet.
Tom Huntington:
... take it away.
Randy Watson:
Okay.
Tom Huntington:
You should have control now.
Randy Watson:
There we go.
Tom Huntington:
There you are.
Randy Watson:
So everybody is looking at my screen. This is the Windows interface. As you can see over here, I am connected to close to 1,000 LPARs, and this is because we do the planning all the time. It happens to be color coded. Blue is IBM i, black is Linux, green is AIX. For the example I'm going to show you, just to give you an idea, I've highlighted those three systems. So again, remember when we consolidate servers, we can throw anything into this what-if analysis. So I've highlighted an IBM i, a VIO server and an AIX partition. So basically, we're going to show you... I'm going to come in and do what-if new, and it's going to come in and say, "Hey, are you sure you want to do this?" And I'm going to say, "Yeah, yeah, I want to do this."
Tom Huntington:
So Randy, I want to let people know too that if your screen is really small, there is a box you can click on and it will make your screen bigger. Okay? For those watching.
Randy Watson:
Yeah, that's a great point there, Tom.
Tom Huntington:
All right.
Randy Watson:
Now, you can see it's bringing it in. Now, if I brought in 100 systems, this may take five or 10 minutes, but that's okay. So here we can see that I have brought in an IBM i, a VIO server and an AIX server. It happens to be running on the same frame, it happens to be a POWER8. I'm going to come over here and just show you real quickly, how was it configured today? They all have to be... In this case, they're all running on the same frame. They could be on a different frame, it doesn't matter. But this is how they're configured today. This AIX's got 3.5 desired, four virtuals, one in one, one in one, and so forth.
Randy Watson:
So basically, the answer, just to show you real quick, what you do is you could come over and say, "I want to go to a POWER9. What machine do I want to go to?" By the way, I remember in the poll, like a third of you said that you don't know if POWER9 is right for you. Well, by the way, capacity planning often... I did one yesterday for a POWER7 customer. All they wanted to know was how many cores they needed to activate. Or we do some models from a POWER7 to a POWER8. So it doesn't necessarily have to be to a POWER9, we can model anything. So then like a POWER9, well, there's several versions. So you have to understand, well, there are limitations on some of these. But let's say we go here, and then you come back and then you can say, "Okay, tell me, based on these room for growth numbers..." Remember I talked about 50%? I'm going to come back over here and say, "Hey, just do the calculation."
Randy Watson:
So it's off doing it. It says, you can run the same workload at 1.38 cores. So again, how many cores do I need on this new machine? This is a six core machine. Will I really only need X or Y? So this is how we figure out, not only how many cores you need, but as you can see, what we do with the virtual is, we have the virtual shared pool ID, because this customer already uses this. So this is how you control the IBM i licensing. And then of course, we go into the DASD situation, and this customer happened to be on a V7000 already, and they are already external. Of course, we know how many LUNs they have and how big they are. So are they going to stay on this or are they going to upgrade to a FlashSystem like a 9100 or 9200? That kind of thing. So these are just some of the many options that we have in the capacity planning function that we provide you back from an analysis standpoint.
Tom Huntington:
Awesome.
Randy Watson:
So that was just a real quick taste of a lot of the function. But again, you can see a lot of systems here. Now, from a performance analysis, a lot of things happen underneath this power analytics. So this is more like what a customer would do. So management reporting. So this is how that enterprise hardware summary got produced, this is how... You just highlight whatever systems you want there and you click this. These are all scripts that we've written that automate the process. By the way, these can all be automated through the Windows Task Manager. So once you figure out which of these reports, and by the way, there's hundreds of them that management really wants, you really then automate this to produce these HTML reports on a daily, weekly, monthly basis, for example.
Tom Huntington:
Awesome. Okay. Thank you, Randy. I'm going to bring it back over to the presentation here. Let's see, I got to make sure I show the right screen here. Yeah, good. Okay. So as we answer that question, what size do I need in order to consolidate, thank you for demoing that. Let's talk about our next item, is troubleshooting performance problems. People don't realize behind the scenes how much we do help customers with this. So a customer is having an unexplainable performance problem, what do you do to help them, Randy? What do you do?
Randy Watson:
By the way, this is a... When we do capacity planning, I want to say more than half the time when a customer sees the first pass at the capacity planning, they see a peak, at 10 o'clock in the morning, some CPU went to 100% busy for an hour. Inevitably, they're going to say, "What caused it?" Which gets into problem determination. So they want to know why my CPU is this busy. Because a lot of people think, and maybe, they can control that. So in other words, if I can control when that CPU gets 100% busy, making that do that, I don't need as much hardware. I can make the system last longer or I can have less cores.
Randy Watson:
But because we have the flight recorder running and we use Collection Services and in that, all of that data is there. So not only does the client collect all this hardware data, but we track every job and every user, both historically, on a month to date number, for years, but we also track it at the interval levels, we're talking about the five minute or 15 minute interval for the last three months so that we can actually answer that before and after question. Because a lot of times we get customers saying, "Well, what is it?" Well, we can actually just drill down and tell you what job it was with two clicks of a button. Let's say it happened two months ago, because the data is there. Then we say, well, why did it do that?
Randy Watson:
So a lot of times, we get involved with a job that used to run an hour, and now it runs two hours. That's often the question. The first thing I want to do is verify that. So I use the historical data to in fact verify that that's true. By the way, sometimes it's not true. In other words-
Tom Huntington:
It's something else?
Randy Watson:
It's something else. So there's something else going on. The customer's impression is that it ran longer, but the actual, how long it was in the system... Because we know. We track runtime of every job historically, and we could tell you if it in fact doubled. If it did double, okay, we verify that. Now let's go figure out why. Is it obviously CPU, or is there any hardware reason, because everybody's going to blame the hardware, that's typically what happens, for that job to re-run? What we want to do is prove or disprove that theory. By the way, historically, I will tell you, about 70% of all performance problems are application related.
Tom Huntington:
What?
Randy Watson:
Not hardware related. But 100% of the time, we as admin people have to prove that it's not the hardware or it is the hardware.
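The verification step described above, confirming from historical data that a job really does run twice as long before looking for a cause, can be sketched like this. It is only an illustration under stated assumptions: the function name, the use of medians, and the sample runtimes are made up for the example and are not taken from Performance Navigator.

```python
from statistics import median

def runtime_changed(history_minutes, recent_minutes, factor=2.0):
    """Compare recent runtimes against the historical baseline.

    Returns (changed, ratio), where changed is True when the recent median
    runtime is at least `factor` times the historical median.
    """
    baseline = median(history_minutes)
    recent = median(recent_minutes)
    ratio = recent / baseline
    return ratio >= factor, ratio

# Hypothetical runtimes, in minutes, for the same nightly job.
history = [58, 61, 60, 62, 59]   # when it "used to run an hour"
recent = [118, 125, 121]         # the runs being complained about
changed, ratio = runtime_changed(history, recent)
print(changed, round(ratio, 2))
```

If the ratio comes back close to 1, the impression that the job slowed down is not supported by the data and the investigation moves elsewhere.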
Tom Huntington:
But I think it comes down to the key, is you got to have the flight recorder turned on. Right?
Randy Watson:
If you don't have that data, you are hamstrung. In other words, if you don't have, you said yep, it did run, but I can't tell you what's the difference between when it ran an hour and when it ran two hours. I just don't have the data.
Tom Huntington:
Do you run into systems where people aren't running this?
Randy Watson:
Oh sure. Unfortunately, we do a lot of capacity plannings with 30 days or less of data, which is obviously risky. We put all kinds of disclaimers on and say, "Hey." For example, let's say if you are a retailer and your busy season... By the way, for example, every retailer, if you ask them, thinks Black Friday is their busiest day. It never is. I'm going to capitalize that with a number. It is their busiest financial day, it's not their busy CPU day. Their busy day is in August, in September when they're packing and shipping, then doing all this stuff to get ready to sell all this stuff.
Tom Huntington:
Got it.
Randy Watson:
Their machines get really busy when you're doing that. But if you only have data in March to get ready for the peak season, your data in March doesn't look anything like the data in August. So then you start having to guess and saying, "Well, let's just be conservative, and let's take March data and increase it by 50% to simulate August." So we can do that, but obviously it's much better to have that real data than it is just to guess it.
Tom Huntington:
Obviously. So we can help people out with services around this. So Randy, let's maybe show them what we're talking about here. I'm going to make you presenter again here-
Randy Watson:
Okay.
Tom Huntington:
... hopefully, and then you can show how we go about finding out what ran too long.
Randy Watson:
Sure. I just switched systems to my system, just to make it easier. So again, there's a lot of options here, but I'm just going to drill down into historically, and I'll show you here in a second. So for example, right there on the third. You can see there was a little bump in interactive on the third. So now we're talking over a month ago. So I'm going to drill down, and now I'm going to drill down into the 24 hour view of the world. Lo and behold, there it was. Something ran for about 30 minutes, took the machine to 100% busy. Well, I wonder why. So if we use this...
Randy Watson:
So oftentimes when we're doing a capacity planning, the question is, when we're using the baseline, is this an anomaly or is this reality? In this case, this is probably an anomaly. But I'm going to drill down into it, and I'm going to get a list of every job that ran in that interval, and I'm going to tell you how many run seconds, CPU milliseconds, job type. All I got to do is click on one of those, and we know what job it is. So there is Joe. Joe was doing something. There's our HTTP server. So Joe was running some query probably... Who knows what he was doing, but it was two clicks of a button, and I figured out who it was and why.
Randy Watson:
Now, the question is, is this something that happens regularly? Well, by the way, we tracked Joe. We track every user historically so we could know that, is this something unusual or not? So this gets into... When you're doing problem determination, I call it a process of elimination. Is it the CPU? Yes/no. Is it the user? Yes/no. Does this user do this all the time? Yes/no. So you walk through a lot of options as to the profile of the resources consumed by every user and every job to help do that problem determination.
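The drill-down described here, listing every job that ran in the busy interval and ranking it by CPU consumed, amounts to a filter and a sort. The sketch below assumes a simple list of per-interval job samples; the field names, job names and numbers are hypothetical and do not reflect the actual Collection Services file layout.

```python
from collections import defaultdict

def top_cpu_consumers(samples, start_min, end_min, n=3):
    """Rank (user, job) pairs by CPU milliseconds within a time window.

    samples: dicts with 'minute' (minutes since midnight), 'user', 'job',
    and 'cpu_ms'. The window is start_min inclusive, end_min exclusive.
    """
    totals = defaultdict(int)
    for s in samples:
        if start_min <= s["minute"] < end_min:
            totals[(s["user"], s["job"])] += s["cpu_ms"]
    # Highest CPU consumers first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Hypothetical 15-minute interval samples around a 10:00 spike.
samples = [
    {"minute": 600, "user": "JOE", "job": "QUERY01", "cpu_ms": 540000},
    {"minute": 615, "user": "JOE", "job": "QUERY01", "cpu_ms": 610000},
    {"minute": 600, "user": "WEBUSER", "job": "HTTPSVR", "cpu_ms": 42000},
    {"minute": 615, "user": "BATCH", "job": "NIGHTLY", "cpu_ms": 9000},
    {"minute": 720, "user": "JOE", "job": "QUERY01", "cpu_ms": 1000},
]
top = top_cpu_consumers(samples, start_min=600, end_min=630)
for (user, job), cpu_ms in top:
    print(user, job, cpu_ms)
```

Running the same ranking over earlier days answers the follow-up question of whether this user does this all the time.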
Tom Huntington:
Now, can you do that for AIX too? Can you drill into the process on-
Randy Watson:
Sure. Yup. That's a little different.
Tom Huntington:
You don't have to show that but-
Randy Watson:
Yeah, we can. It's a little different, obviously. In the IBM i world, we're talking about jobs and users and that kind of stuff, in AIX it's called a process. They don't metrically do it as well as i does. i has the best metric system, I would say it's on par with Z as far as metrics and tracking applications, but AIX does processes and so we certainly do track that so we can tell you what processes are consuming what. So for example, in the AI world, when we see a CPU get really busy, it's often Python. You drill down and you're going to see Python as the busy process.
Tom Huntington:
Cool. That's true. When you think about AI, what do they want to do? They want to process as many images or whatever data as fast as possible. So having a tool like this to show the impact of, to your point, tweaking the application so it performs better, is a huge deal for them.
Randy Watson:
Absolutely. Yup.
Tom Huntington:
Yup. So Randy, I know time wise, we probably got to move on to our next topic here. So-
Randy Watson:
Sure.
Tom Huntington:
... let me... Let's keep rolling along here. Actually, what we have up next is a polling question for you guys. Let me open up our next poll, and we want to know what OS system you're concerned about. What are some of the flavors? You can choose more than one, I hope. You should be able to select all that matter. We'll see what our audience has. All right. Looks like we're getting a little mix here of AIX, IBM i, Linux, even people with Windows out there. What can we do for Windows? That's interesting. But we figured we'd have a pretty tainted crowd towards IBM. That's not tainted. That's a good thing, isn't it, Randy?
Randy Watson:
Yeah, I was going to say that's okay.
Tom Huntington:
I shouldn't say it that way, but it's an okay thing. We're good with that. So let me close this poll and show you guys the results here. It looks like 92% of you have IBM i, but then another 22% with AIX and almost 20% with Linux and 20% with Windows servers alongside of their IBM i. That's pretty typical. All right. Thank you for answering that polling question. Let's move on to our next topic and talk about competitive situations. Some of you might not be thinking about, well, I'm in competition here. But Randy, what are some things that you've seen or done with customers or business partners that are looking at competitive situations? Why would Performance Navigator help them with these opportunities?
Randy Watson:
So as you mentioned... and this doesn't happen very often because as the marketplace has moved, there's not that many Sun and HP servers around, but-
Tom Huntington:
You see a lot more Unix comparisons. Right?
Randy Watson:
Right, right. But we do do that because we can collect data on Sun and HP, but again, we don't see a lot of that. What we do see is obviously there's a lot of Linux. I'm going to say most of those Linux partitions are running on x86. They run there for a variety of reasons, a lot of it has to do with in-house skills... They already have the infrastructure, the vendor that sold them the software, they can't spell Power, all they know is x86. So it had to do with their background, their knowledge and so forth. However, having done this for years and years, obviously, Linux is going to run better on a Power machine, not only on a single LPAR, but you can stack way more Linux partitions on a given frame than you can on an x86 even with VMware.
Randy Watson:
So by the way, even if you're running an x86 Linux partition under VMware, we actually have a tool to go after the VMware infrastructure to understand how it's configured. So there's that place. Now, a lot of this stuff nowadays is talking about the AC922 and the newly announced IC922, which was announced yesterday. These are all AI and ML servers. So it's the same question with an added complexity. So now you have CPU, so you've got one or two sockets, and then you have GPUs, I got one GPU, two GPUs, three GPUs, and this is all... the question about what I'm using. You mentioned PowerAI Vision, which is a product IBM sells, and analyzes video. Well, how many CPUs do I need, versus how many GPUs do I need?
Tom Huntington:
Sure.
Randy Watson:
These GPUs are not inexpensive. Getting back to that cost performance equation, we can profile your AI/ML workload so that you can properly configure these machines because ultimately, you're going to put these in the test environment and then when you roll this into production, you're going to cluster them. So you're oftentimes going to have many of these things, and the question is, well, do I need 10 AC922s with four GPUs or do I need 10 of them with just two GPUs because I'll never use more than two? Well, we can help you answer that question. Again, it's all about the data. You've got to have the data to do this.
Tom Huntington:
If done right, that's money back in your pocket, right?
Randy Watson:
Absolutely.
Tom Huntington:
Right. Okay. So the next topic is... I used to tell everybody your CIO has two things on their mind: Security, I'm concerned about security and, what can I put in the cloud. So you gave a good example of a bad example of somebody going to the cloud. Randy, what do I need to plan differently as I think about cloud? [crosstalk 00:50:47].
Randy Watson:
When you talk to any cloud vendor... and obviously IBM is pushing, there's a lot of business partners that have cloud and they're good. By the way, today Google has Power in the cloud, Microsoft has Power in the cloud, and then there's a lot of cloud vendors, MSPs that obviously are an option for you. But ultimately, the question is the same, however, it is managed differently. Because most of the time, you are going to go from a single tenant box, if it's an on-prem machine, to a multi-tenant box, which is the first question. So when you're planning differently, one of the things you really have to ask the vendor is you need to know exactly what machine they're planning on putting you on and what the disk configuration is. Because if you don't know, it is... I'm going to call it a crap shoot. Because-
Tom Huntington:
You say I/O matters in the cloud?
Randy Watson:
Well, I/O matters as we saw in that one time big time. Again, if you think about when you go into the cloud and you're in a multi-tenant machine, every partition that's added to that machine makes the disk performance worse. I want everybody to think about that for a second. So if I got 10 LPARs on a machine that's running off an external disk and I put the 11th one on there and that 11th does the query from hell, all the other 10 are going to feel it.
Tom Huntington:
Hmm.
Randy Watson:
So you have to be really careful. In fact, it's almost more important in the cloud because of this multi-tenancy thing. But from our point of view, we can gather the data.
Tom Huntington:
So Collection Services still runs on the cloud, right?
Randy Watson:
Absolutely.
Tom Huntington:
Nmon?
Randy Watson:
And so does Nmon, all that still runs in the cloud, and it should. If it doesn't, for some reason, you should require your cloud vendor to provide you that data.
Tom Huntington:
So what happens if I start to use more data or resources than expected? What should I be thinking about?
Randy Watson:
Well, see, that's the other thing about contractually in the cloud. So we deal with a lot of MSPs, and even if you know you're running out of... One of the good things about the cloud is because you don't have to buy a whole machine, you can only buy what you need. That is true, you can only buy what you need. But if you really need more, then you get into contract discussions. Can you contractually consume more than you've already paid for? How does that work and how is it calculated? Of course, we can help you plan for that, because if we understand your workload, even in the cloud, we can help you plan your budgeting.
Randy Watson:
So let's say every month end, you tend... Let's say you're in a cloud that charges you by the hour. So by the way, that's an example, IBM charges by the hour, or can charge you by the hour. We can help you understand so you can budget that every month end, I'm going to use five hours more than my standard usage in the cloud. So we can help you budget that. So this helps finance people in... But the thing about... You lose control. I think that's a very important thing to note. You can't easily change the configuration without potentially changing the contract or having a discussion with the vendor, depending on which vendor it is and depending on how they're set up.
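The month-end budgeting idea can be sketched as a small calculation. Everything in it is an assumption for illustration: the contract shape (a fixed base charge plus hourly overage), the function name, and the prices are hypothetical, and real cloud contracts may meter usage quite differently.

```python
def monthly_cloud_budget(base_charge, overage_hours, hourly_rate):
    """Estimate a month's bill: fixed base charge plus hourly overage.

    overage_hours is the expected usage above the standard allotment,
    for example the extra hours consumed by month-end processing.
    """
    return base_charge + overage_hours * hourly_rate

# Hypothetical contract: 1000.00 per month base, 12.00 per extra hour,
# with five extra hours expected at every month end.
estimate = monthly_cloud_budget(base_charge=1000.0, overage_hours=5, hourly_rate=12.0)
print(estimate)
```

The useful input is the overage figure, which comes from the measured workload history rather than from a guess.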
Tom Huntington:
Yeah. So I think that that's a really... I thought it was important to talk about cloud, because it has been such a big topic. We've been hearing a ton about it from all different angles. You need to think about what you're doing there when you go to the cloud. So let me open up our last polling question. How can HelpSystems help you? Certainly, you can select all that apply. We do a lot with security. So the number one thing on your CIO's mind is security. Let's help you out, let's help you understand. We do the free security scans. We've been helping customers, walking them through, what are your HA options? Maybe you want to improve your backup and recovery, you want to add automation to close the skills gap. We're finding more and more customers are struggling with... they don't have enough resources. Well, automation certainly is a thing that can help you out. And then increasing programmer productivity; you want to do more there. If you don't know, we work on RDi and can help you out there. So Randy, we did get a few questions. While I leave that polling open, I want to get over here and see what kind of questions we have.
Randy Watson:
Yeah, I'm trying to answer some of these as we go along. I don't know if I answered all of them.
Tom Huntington:
Oh yeah. Okay. Okay, you're multitasking.
Randy Watson:
I am.
Tom Huntington:
If you only have IBM i servers, does the GUI suppress information columns that are specific for AIX and Linux? So if you don't want to see AIX and Linux, what do you think of that one?
Randy Watson:
Obviously, when you go into the GUI itself, let's say you're in a big organization and all you care about is i, each individual user can connect to only the things that they're interested in. So if you're only interested in i, you only connect to the i. If you're concerned with the whole enterprise, then you would connect to everything. The AIX people may only connect to the AIX partitions.
Tom Huntington:
So how does that flight recorder process work? So it's automatically collecting the data with Performance Navigator, but how are you saving or keeping it consolidated so it doesn't take up all that disk space?
Randy Watson:
So on i, we have a data reduction process that runs every night. So basically, all of the host code on i wakes up 30 minutes after midnight, analyzes yesterday's data, and it does reduce the data. So we use probably less than one, maybe 2% of your disk after years of historical data. So this is how we manage the space. For nmon, I'm going to say we keep a year's worth of a lot of data by default. Disk is the exception. By default, we only keep 90 days of disk data. Now, that's a variable. The reason is, 60% of the space nmon consumes has to do with disk. So we only keep, by default, 90 days of that disk to conserve space. And then of course we gzip it and that kind of thing. So we usually use less than three gig of space even after a year or so on the AIX.
Tom Huntington:
Awesome. So while we've been answering questions, just to remind people, HelpSystems: cybersecurity, automation, business intelligence, document management, we can help you with all those things on IBM i, AIX or even Linux, if need be. We help a lot of customers across cybersecurity and automation, for sure, in Unix, Linux and Windows along with IBM i. So keep that in mind as you look at solutions out there. A lot of craze in the industry about RPA, robotic process automation, we do that for customers. Our latest acquisition, we added Clearswift to us in the security space, and now we can do DLP. All right. So moving ahead here. Another great question we have is, our application has historically been very sensitive to disk response. To meet our SLAs, we need to be within sub-micros. Are there any things to look out for or warnings about external storage?
Randy Watson:
I will tell you, the biggest issue I see in configuring external disk is that obviously, from a technology standpoint, if you go from internal to external, you should be at least on par. So in other words, if you have a solid state disk internally, you better have a solid state disk in an external disk. Now, the other issue is internally, it's a one-to-one logical to physical. When you go to external, that's usually not the case, it's one to many. So a lot of people, the biggest mistake they make... Because it's the storage people that usually configure this for the i people, and the storage-
Tom Huntington:
Oh, sure.
Randy Watson:
... people think like Windows, and they think, "Oh, I need four terabytes, so I'm going to give you one four-terabyte LUN." Trust me, I've seen it. That's not going to work on an i. So the number of logicals should be at least equal to or more on an external disk than you currently have in order to maintain some closeness or relativeness to the performance. So that's very important, the size of the LUN and how many...
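The rule of thumb stated here, that the external configuration should present at least as many logical units as the internal one it replaces, can be written as a simple check. The function and parameter names and the example figures are hypothetical, and the check covers only the unit count, not disk technology, cache or replication.

```python
def check_lun_plan(current_unit_count, capacity_gb, planned_lun_count):
    """Check a planned external layout against the current unit count.

    Returns (ok, lun_size_gb): ok is True when the plan has at least as many
    logical units as today; lun_size_gb is the resulting size of each LUN.
    """
    ok = planned_lun_count >= current_unit_count
    lun_size_gb = capacity_gb / planned_lun_count
    return ok, lun_size_gb

# Hypothetical system: 24 internal disk units today, 4096 GB required.
print(check_lun_plan(24, 4096, planned_lun_count=1))    # one big LUN
print(check_lun_plan(24, 4096, planned_lun_count=32))   # many smaller LUNs
```

The first plan fails the check because one large LUN replaces 24 units, which is the mistake described above.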
Tom Huntington:
And then question with storage again is, I have a 924 and I'm using external storage and I'm also doing storage replication with that setup. Does that impact the production LPAR performance?
Randy Watson:
Absolutely. Sure. It is sort of transparent. In other words, from the i or AIX partition, we know what the disk response time is, and it's x. Let's say it's a half a millisecond or 500 microseconds. If you're doing a lot of DR and a lot of writing to that and therefore you're doing a lot of... there's overhead in that controller that's going to slow that guy down. In fact, I've seen where they've turned off replication, and response time has improved. So it's hard to... Now, you may not be able to do that because that's why you bought the external disk, but it does impact it. But the thing about it is, from a logical standpoint, we track that response time, and we can certainly tell you what your current performance situation is. If it's not good enough, then we have to work with the storage people in IBM or business partner to help. What's the possible remedies? Is it more-
Tom Huntington:
Sure.
Randy Watson:
... arms, is it more cache and that kind of thing.
Tom Huntington:
Awesome. Well, hey Randy, you've done a great job on the Q&A. There's a couple of other questions out there. I'm not quite sure how to ask the question of you, so we'll just kind of hold it off and we'll see if we can take that offline with the customer. Awesome job today. Really great hearing all about Performance Navigator and all the capabilities. For our business partners and customers out there, we thank you for joining us today on yet another webinar. We have more coming, look for IBM i 7.4 and what's new in 7.4 in December and just a variety of other topics coming up. We thank you again. Randy, make it a wonderful day out in Colorado. Hopefully you got sun out there. In Minnesota, it's not so good. Take care.
Randy Watson:
We have the sun. All right. Thanks everybody.
Tom Huntington:
Yup. Bye-bye.
Randy Watson:
Thanks, Tom.