Data Center Energy Management
My first UX project: a data center energy management platform I helped design from scratch at HCL, with no design system and, for most of it, no access to the people who would actually use it.
- Role: UI/UX Designer
- Year: 2023-2024
- Client: HCLSoftware (in-house)
- Tools: Figma
Overview
DCEM stands for Data Center Energy Management. HCL's telecom side ran a few of its own data centers and wanted one place to see how much energy they were using, where the problems were, and where they could save. So the product is built for data center operators, the people who look after the racks, servers and VMs every day. It started as an in-house tool for HCL's own data centers.
Starting with almost nothing
This was my first real UX project, and we started with almost nothing. No design system, no component library, just a color palette and an icon set we were told to stay inside. So the UX, the UI and the components all got built at the same time. I would sketch a screen, work out what it needed, and build that component in the same pass.
Two things made it harder. The domain was new to me, so I had to learn what racks, VMs, power profiles and energy KPIs even were before I could design for them. And for most of the project I never spoke to a real operator. Requirements came from management, and I leaned on a persona we wrote ourselves. We only met an actual user late, after the demos had started.
Designing for a user I couldn't reach
Since I couldn't talk to a real operator, I built one to design against. Jerry runs data center operations and has spent fifteen years in telecom. His job is keeping the data centers performing, planning capacity, catching problems early and protecting the equipment. Whenever I wasn't sure a screen earned its place, I checked it against him.
Discovery: inspiration and sketches
Before designing anything, I looked at how other energy and monitoring dashboards handle this kind of data: which KPIs they show, what units they use, and which chart they reach for and when. That turned into an inspiration board. From there I roughly sketched the dashboard and the module layouts before moving to high fidelity.
Building the system in parallel
Because nothing existed yet, every screen meant building new components. Not just inputs and dropdowns, but the harder ones: KPI cards, gauges, heat maps, mixed bar and line charts, maps, and a lot of dense tables. Some were involved, like the PDU and SNMP setup form, which had to map KPIs to OIDs and validate things like IP addresses inline. I documented each one as I built it so the rest of the team could reuse it instead of starting over.
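To make the inline validation on that form concrete, here is a minimal Python sketch of the kind of checks involved. It is an illustration, not the product's code: the function names, the error messages, the KPI keys and the example OID are all hypothetical, and the OID check only tests for the dotted-decimal shape.

```python
import ipaddress
import re
from typing import Dict, Optional

# Dotted-decimal shape only, e.g. "1.3.6.1.4.1.99999.1.1" (a placeholder OID,
# not a real vendor one). This does not check that the OID exists on a device.
OID_PATTERN = re.compile(r"\d+(\.\d+)+")


def validate_ip(value: str) -> Optional[str]:
    """Return an error message for the IP field, or None if it parses."""
    try:
        ipaddress.ip_address(value.strip())
    except ValueError:
        return "Enter a valid IP address."
    return None


def validate_oid(value: str) -> Optional[str]:
    """Return an error message for an OID field, or None if it is well formed."""
    if OID_PATTERN.fullmatch(value.strip()):
        return None
    return "Enter an OID in dotted-decimal form."


def validate_form(ip: str, kpi_to_oid: Dict[str, str]) -> Dict[str, str]:
    """Collect per-field errors so each one can be shown next to its input."""
    errors: Dict[str, str] = {}
    ip_error = validate_ip(ip)
    if ip_error:
        errors["ip"] = ip_error
    for kpi, oid in kpi_to_oid.items():
        oid_error = validate_oid(oid)
        if oid_error:
            errors[kpi] = oid_error
    return errors


# Hypothetical form state: one bad IP, one bad OID mapping.
errors = validate_form(
    "10.0.0.300",
    {"power_kw": "1.3.6.1.4.1.99999.1.1", "temperature_c": "not-an-oid"},
)
```

Returning errors keyed by field is what lets the form flag each input in place instead of failing the whole submission at once.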
The dashboard I owned
The dashboard was mine to lead, and it is the page I iterated on the most. The world map shows every data center at once. I zoomed out to world level on purpose so all the sites sit together, even the ones that are close. The Energy Trend chart is the one I cared about most: it lays projected power over actual power, so you can see consumption drop once servers move onto a Dynamic Power profile. Energy Distribution splits usage across the three profiles: dynamic power saving, balanced and high performance. The savings chart puts daily savings as bars and cumulative savings as a line on the same view. Data center health, with its alarms and power utilization, sits next to it.
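The relationship between the two series on the savings chart can be sketched in a few lines of Python. This assumes a day's saving is simply projected minus actual consumption, which is my reading of the chart rather than a documented formula; the function name, the kWh unit and the numbers are made up for illustration.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def savings_series(
    projected_kwh: Sequence[int], actual_kwh: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Return (daily savings for the bars, running total for the line)."""
    if len(projected_kwh) != len(actual_kwh):
        raise ValueError("Both series need one value per day.")
    # Assumption: saving for a day = projected - actual.
    daily = [p - a for p, a in zip(projected_kwh, actual_kwh)]
    cumulative = list(accumulate(daily))
    return daily, cumulative


# Made-up numbers: actual use drops after a profile change on day 2.
daily, cumulative = savings_series([120, 118, 121, 119], [120, 110, 105, 100])
# daily      -> [0, 8, 16, 19]
# cumulative -> [0, 8, 24, 43]
```

Plotting both on one view works because the bars answer "how much today" while the line answers "how much so far".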
Reports
Reports is really a set of reports behind one page: Utilization Overview, Utilization Trend, DCEM KPIs, ESG, Infra Utilization and Anomaly Overview. Each card opens its own detailed view.
Inside a report
The reports go deep: make and model power insights, the data centers running at low utilization, a scatter of CPU usage against power, rack temperature mapping, and ESG numbers like power density and the VM to host ratio. You can filter them and download them. These pages are tall, so I kept them in a carousel here instead of stacking them into one long scroll.
Utilization Overview: power by make and model, the low-utilization data centers, and rack temperature mapping.
Using color to mean something
We couldn't invent new colors, so the few we had needed to carry meaning. The same rack grid shows up for three different metrics, and each one gets a color scale chosen so the thing that matters stands out.
- Step 01 / 03
Temperature runs cool green to amber to red, so a hot rack is impossible to miss.
- Step 02 / 03
Power consumption uses one light to dark ramp. One number, one color, getting deeper as it climbs.
- Step 03 / 03
Weight uses a neutral scale. Full racks read green and lighter ones red, so an operator can see where there is room to add.
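The three scales above come down to the same operation with different color stops. A minimal Python sketch, assuming values are already normalized to the range 0 to 1; the RGB values and names are placeholders, not the palette we were given.

```python
from typing import Sequence, Tuple

RGB = Tuple[int, int, int]

# Placeholder colors, chosen only for illustration.
GREEN: RGB = (60, 170, 90)
AMBER: RGB = (240, 180, 40)
RED: RGB = (210, 50, 50)
LIGHT: RGB = (220, 236, 250)
DARK: RGB = (20, 60, 120)

TEMPERATURE = [GREEN, AMBER, RED]  # cool to hot, three stops
POWER = [LIGHT, DARK]              # one ramp, deeper as it climbs
WEIGHT = [RED, GREEN]              # light racks red, full racks green


def lerp_color(c1: RGB, c2: RGB, t: float) -> RGB:
    """Blend two colors channel by channel."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))


def scale(stops: Sequence[RGB], t: float) -> RGB:
    """Interpolate across evenly spaced stops for a value t in [0, 1]."""
    t = min(max(t, 0.0), 1.0)  # clamp out-of-range readings
    if t == 1.0:
        return stops[-1]
    position = t * (len(stops) - 1)
    index = int(position)
    return lerp_color(stops[index], stops[index + 1], position - index)


hot_rack = scale(TEMPERATURE, 1.0)   # the red stop
mid_power = scale(POWER, 0.5)        # halfway between LIGHT and DARK
full_rack = scale(WEIGHT, 1.0)       # the green stop
```

Keeping one function and swapping only the stops is what lets the same rack grid serve all three metrics.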
Site Explorer
Site Explorer is a set of cards, one per data center. It covers our own sites and external ones too, like Amazon and Google, because the idea was that outside data centers could run on this as well. Each card shows where the site is, down to latitude and longitude, with its energy, memory and CPU use. Open one and you get the detail: category-wise energy, how much each server uses against its CPU, and consumption broken down by region.
- Step 01 / 02
Cards for internal and external data centers, each with location and live utilization.
- Step 02 / 02
Opening a site: category-wise energy, server energy against CPU, and consumption by region.
Site Provisioning
Provisioning is where the infrastructure itself lives, from the site down through buildings, floors, rows, racks and managed nodes. There were 245 sites in all. You can add a new site or bulk upload them. Open one, say Bangalore, and every building, floor and node underneath it is laid out in tables.
Policy management
This is where an operator actually changes how power behaves. The page groups servers by their profile (dynamic power saving, high performance, remote control) and shows what each group is consuming. Next to it, the system suggests a better profile for each server, with the saving you would expect and an Apply button. Bringing a new server under a policy takes two steps: pick the servers, then define the policy, scheduled or on demand.
Policy management: servers grouped by power profile, with a suggestion for each one.
Alarms
The alarms page keeps the count of active alarms by severity, shows which sites are throwing the most, and lists the recent ones against their site and ticket.
Mobile companion
A companion app for the floor
The web app lives on a desk, but a lot of an operator's job happens on their feet, walking the data center to find the one server throwing an alert. So there is a companion mobile app built around that reality.
Its user is Mike, an operations specialist who is on call and rarely sitting at a screen. He needs to find the faulty server fast, see what is wrong and how to fix it while standing in front of it, and close the alert on the spot. So the app is built around two things: AR navigation to reach the server, and resolving alerts in the field.
Getting in
Because the app uses the camera for AR and scanning, and location to place you on the floor, signing in is followed by a clear permissions step rather than asking for everything silently.
The dashboard in your pocket
A quick read on the way in: how many alerts are open, server utilization, device counts, and how alerts have been trending.
Finding the server with AR
This is the part that actually needed to be mobile. You pick the floor, the app draws a route over the live camera view, and arrows on the floor walk you to the exact rack. When you arrive it tells you, and you scan the server to pull up its alert.
From alert to fix
Alerts can be filtered and sorted by device, impact and status. Open one and you get the server's state, the downtime and cost impact, and a list of mitigation options. You work through the steps, upload proof of each fix, and submit to close it.
Topologies, profile and closed work
The rest rounds out the picture: browse the topology by site, floor and type, drill into a server's history, and see your own closed alerts from the profile.
Mobile wireframes
The screens were built from these wireframes, where the flows and layouts were worked out first.
Outcome and impact
DCEM began as an internal tool and turned into something we could sell. We ran a lot of demos, and two external clients came on board, with roughly 2,500 racks between them to monitor. That was the proof it worked outside HCL's own data centers.
Together the web app and the mobile companion covered both halves of the job: watching and planning at the desk, and finding and fixing on the floor.
What I'd do differently
Because it was my first project, the clearest lessons are the things I would change. I would step back and map the whole user journey first, then design the components and screens, instead of doing all three at once. I would push harder to get in front of a real operator early, since that feedback only reached us near the end. I would design and document proper patterns from the start, because handoff turned into a lot of back and forth explaining decisions that were never written down. And I would build in accessibility from day one rather than leaving it out, which is something we just didn't know to do at the time.