A Novel about IT, DevOps, and Helping Your Business Win
Bill is an IT manager at Parts Unlimited. It’s Tuesday morning and on his drive into the office, Bill gets a call from the CEO. The company’s new IT initiative, code-named Phoenix Project, is critical to the future of Parts Unlimited, but the project is massively over budget and very late. The CEO wants Bill to report directly to him and fix the mess in ninety days, or else Bill’s entire department will be outsourced.
The Phoenix Project is a modern reinterpretation of a book called The Goal. Originally published in 1984, The Goal focused on helping American manufacturers keep up with enlightened foreign manufacturers. Over time, folks began to see The Goal through the lens of IT, so the authors of The Phoenix Project decided to retell it in that context.
Capacity and Constraints
- The whole process can only move as fast as its slowest part: the constraint, or bottleneck
- Optimizing anything other than the bottleneck is an illusion
- Find the bottleneck
- Optimize flow for the bottleneck
- Optimize new bottlenecks
- Release work into the system at a rate that keeps utilization of the constraint as high as possible.
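The steps above can be sketched with a toy serial pipeline in Python. The stage names and capacities (work items per day) are made-up assumptions, not numbers from the book. The sketch only shows that throughput equals the slowest stage's capacity, so speeding up a non-bottleneck changes nothing, while speeding up the bottleneck helps until a new one appears.

```python
from typing import Dict


def throughput(capacities: Dict[str, float]) -> float:
    """A serial pipeline can finish no more work per day than its slowest stage."""
    return min(capacities.values())


def bottleneck(capacities: Dict[str, float]) -> str:
    """Return the name of the stage with the lowest capacity (the constraint)."""
    return min(capacities, key=capacities.get)


# Hypothetical stages and capacities (items per day), for illustration only.
stages = {"dev": 10.0, "test": 6.0, "ops_deploy": 3.0}
print(bottleneck(stages), throughput(stages))  # ops_deploy 3.0

# Optimizing a non-bottleneck stage is an illusion: throughput stays the same.
faster_dev = {**stages, "dev": 20.0}
print(throughput(faster_dev))  # 3.0

# Optimizing the bottleneck helps until a new bottleneck appears (here, "test").
better_ops = {**stages, "ops_deploy": 8.0}
print(bottleneck(better_ops), throughput(better_ops))  # test 6.0
```

Releasing work faster than `throughput(stages)` would only pile up WIP in front of the bottleneck, which is why the last step ties the release rate to the constraint.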
The Three Ways
The First Way emphasizes the performance of the entire system, as opposed to the performance of a specific silo of work or department. A silo can be as large as a division (e.g., Development or IT Operations) or as small as an individual contributor (e.g., a developer or system administrator).
The focus is on all business value streams that are enabled by IT. In other words, the value stream begins when requirements are identified (e.g., by the business or IT), continues as they are built in Development, and then moves into IT Operations, where the value is delivered to the customer as a service.
The outcomes of putting the First Way into practice include never passing a known defect to downstream work centers, never allowing local optimization to create global degradation, always seeking to increase flow, and always seeking to achieve profound understanding of the system (as per Deming).
The Second Way is about creating right-to-left feedback loops. The goal of almost any process improvement initiative is to shorten and amplify feedback loops so necessary corrections can be continually made.
The outcomes of the Second Way include understanding and responding to all customers, internal and external, shortening and amplifying all feedback loops, and embedding knowledge where we need it.
The Third Way is about creating a culture that fosters two things: continual experimentation, taking risks and learning from failure; and understanding that repetition and practice are the prerequisite to mastery.
We need both of these equally. Experimentation and taking risks are what ensure that we keep pushing to improve, even if it means going deeper into the danger zone than we’ve ever gone. And we need mastery of the skills that can help us retreat out of the danger zone when we’ve gone too far.
The outcomes of the Third Way include allocating time for the improvement of daily work, creating rituals that reward the team for taking risks, and introducing faults into the system to increase resilience.
4 kinds of work
- Business projects
- Internal Projects
- Preventative Maintenance
- Unplanned Work
What is it that IT does? What if I said that IT Operations does only 4 things, or, more accurately, 4 types of work? By understanding the types, their relative importance and how work flows through the organisation, you will be better equipped to improve the delivery of projects, manage outages and compliance, and limit work-in-progress.
So what are they? No, one of them is not email.
Well, it should come as no great surprise that the first one is Business Projects. You know, those projects that the business is screaming for, those projects that have a direct link to business processes and tie to the outcomes of the business groups. In essence, this is all the business is really asking IT for.
The second is Internal Projects: installing a fleet of new network devices, decommissioning a data centre, and any number of other internally focused, system-based activities. The problem with many of these projects is that all too often internal teams are left to manage them independently. They progress with little oversight or visibility and consume untold amounts of resources, which often adversely affects progress on Business Projects. We have the PMO managing Business Projects, but who manages Internal Projects?
So projects are generally classed as larger pieces of work, but we all know IT does a lot of little jobs to keep systems functioning: patching, security upgrades, vendor software updates, problem resolutions, etc. All of these, and many more, make up a third category of work, Operational Change. Every day IT Operations will be registering, planning, assessing, building, testing and deploying changes, which may also include managing the process to deploy a change that relates to either of the project types above.
And lastly the killer, the work that keeps Operations Managers, Service Owners and other IT folk awake at night. This type of work has the ability to put everything else on the back burner, which can then affect the delivery of projects, the deployment of changes, and any other work that IT may be attempting to deliver: Unplanned Work. Unplanned work is recovery work, which almost always takes you away from meeting your goals.
Unplanned work is the enemy here.
“Preventing unplanned work is incredibly important because it prevents the under-utilization of the bottleneck” see “Total Productive Maintenance”
“Improving daily work is more important than doing daily work”
“It almost doesn’t matter what you improve, as long as you’re improving something. Why? Because if you are not improving, entropy guarantees that you are actually getting worse, which ensures that there is no path to zero errors, zero work-related accidents, and zero loss.” - Mike Rother
- The machine
- The person/team
- The Method
- The Measures
A work center is the combination of all four: the machine, the person/team, the method, and the measures.
Functional Documentation: Provisioning a server is a set of steps; it’s not magic. As we make changes, why not document the exact set of steps so that the process is repeatable? Better yet, why not use something like Ansible, whose playbooks act as functional documentation of those steps? You can see what’s going to happen. Instead of typing commands into a machine, just change the playbook and run it again. It forces you to document the entire process, which should standardize the flow and prevent unplanned work because there are no surprises.
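“Change the playbook and run it again” works because each step is idempotent: it checks the current state and only acts when that state differs from what is declared. Below is a minimal Python sketch of that idea. It is not Ansible’s API; the `Step` type, the dict standing in for a server, and the package/config names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# A plain dict stands in for a real server's state (hypothetical, for illustration).
Server = Dict[str, str]


@dataclass
class Step:
    """One documented provisioning step: a name, a state key, and its desired value."""
    name: str
    key: str
    desired: str

    def apply(self, server: Server) -> bool:
        """Bring the server to the desired state; return True only if something changed."""
        if server.get(self.key) == self.desired:
            return False  # already in the desired state, nothing to do
        server[self.key] = self.desired
        return True


def run(playbook: List[Step], server: Server) -> List[str]:
    """Apply every step in order and report which ones actually changed the server."""
    return [step.name for step in playbook if step.apply(server)]


playbook = [
    Step("install nginx", "pkg:nginx", "installed"),
    Step("set worker count", "conf:nginx.workers", "4"),
]
server: Server = {}
print(run(playbook, server))  # ['install nginx', 'set worker count']
print(run(playbook, server))  # [] because a second run changes nothing
```

Editing the playbook, say changing the worker count to "8", and re-running it would report only that one step as changed, so the playbook stays an accurate record of the server.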
Improve as you go: Taking the time to improve the system and capturing those improvements should be a priority.
Deriving a KPI, an example
If you are a cross-country delivery company, a primary KPI is probably the percentage of on-time deliveries. An impediment to on-time deliveries is probably vehicle breakdowns. One way to prevent vehicle breakdowns is to change the oil often enough, like every 5k miles. So, a forward-looking KPI is the percentage of vehicles that have had their required oil changes performed.
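The forward-looking KPI above is just a ratio. A minimal Python sketch, assuming a hypothetical fleet record with the current odometer reading and the reading at the last oil change, and treating “required” as “no more than 5,000 miles since the last change”:

```python
from dataclasses import dataclass
from typing import List

OIL_CHANGE_INTERVAL_MILES = 5_000  # the "every 5k miles" rule from the example


@dataclass
class Vehicle:
    """Hypothetical fleet record; the field names are assumptions."""
    vehicle_id: str
    odometer: int
    last_oil_change_at: int  # odometer reading at the most recent oil change


def is_maintained(v: Vehicle) -> bool:
    """Compliant if the vehicle has gone no more than 5k miles since its last change."""
    return v.odometer - v.last_oil_change_at <= OIL_CHANGE_INTERVAL_MILES


def oil_change_compliance(fleet: List[Vehicle]) -> float:
    """Forward-looking KPI: percentage of vehicles with required oil changes done."""
    if not fleet:
        raise ValueError("fleet is empty")
    return 100.0 * sum(is_maintained(v) for v in fleet) / len(fleet)


fleet = [
    Vehicle("truck-1", odometer=42_000, last_oil_change_at=40_000),
    Vehicle("truck-2", odometer=61_500, last_oil_change_at=55_000),
    Vehicle("truck-3", odometer=12_000, last_oil_change_at=8_000),
    Vehicle("truck-4", odometer=30_000, last_oil_change_at=24_000),
]
print(f"{oil_change_compliance(fleet):.1f}%")  # 50.0%
```

The point of the design is that this number moves before the on-time delivery rate does: a falling compliance percentage warns you about future breakdowns.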
Understanding the needs of the business
Interview the different business units to figure out what kind of infrastructure they need to increase growth.
Q: Without data, how do you know what you need to grow?
N: When you interview all the business owners, won’t they see you coming? If you need to assess the state of the company’s IT needs, and business owners see you coming, won’t they attempt to jockey for position with you? In the book everyone seems bewildered that you are even asking about them; in real life there is going to be some politics there.
Work should never go backwards; that’s probably a reason for not cutting release branches.
Continuous Deployment is the inevitable outcome of applying the three ways.
“Until code is in production, no value is actually being generated because it’s merely WIP stuck in the system.”
“You pay back the company faster for its use of capital.”
Overall thoughts
Quality of life
I missed this entirely the first time through. They constantly reiterate how hard this industry can be at times, and how we are literally grinding our people up. It’s easy to think that quality of life is only about how things look. Even in places where you have “good hours”, i.e., no one is held to the grindstone, not being able to spend time on improvement can be a grind.
We need to find ways not only to be nicer, but to support innovation as well. It needs to come from both directions.
Also, clear goals. If your company doesn’t have clear goals, your employees are going to get burnt out from not being able to deliver.
At first I thought this was just about integration and delivery, but I have started to realize that continuous is a theory for everything: always be learning, always be shipping. I think this is where the OODA loop comes in. If your OODA loop is months or years long, you can’t deliver value.
Technical Debt: It’s like trying to shove too much into a suitcase. Sometimes it works, but when it fails, your clothes fall out all over the airport, at the worst possible time.
- Understanding what you as a team are committed to
- Figure out resource utilization
- Optimize for the bottleneck
- Value stream mapping?
- Reduce batch size
- To increase quality
- Lock up less capital
- Make deploys easier
- Understand the vision of the company
- Mapping IT to business value, based on vision
- Prioritize work based on vision
- Know how to take calculated bets
- Business needs to understand itself.
- If we don’t know how the business grows, we can’t get close
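The “Reduce batch size” bullets above can be made concrete with a toy model. Assume, hypothetically, that one change is finished per day and that changes only ship when their whole batch is done. A finished change then sits undeployed, as locked-up capital producing no value, until the rest of its batch is finished. This Python sketch computes the average number of days a finished change waits for deployment at a few batch sizes.

```python
from typing import List


def avg_days_waiting(batch_size: int, total_changes: int) -> float:
    """Average days a finished change waits before its batch ships.

    Deliberately simple model (an assumption): one change is finished per day,
    and each batch deploys on the day its last change is finished.
    """
    waits: List[int] = []
    for day in range(total_changes):
        # The batch containing this change ships when its last member is done.
        ship_day = (day // batch_size + 1) * batch_size - 1
        ship_day = min(ship_day, total_changes - 1)  # a final partial batch ships at the end
        waits.append(ship_day - day)
    return sum(waits) / len(waits)


for size in (1, 5, 20):
    print(size, avg_days_waiting(size, total_changes=60))
# 1 0.0
# 5 2.0
# 20 9.5
```

In this model the average wait grows as (batch size - 1) / 2, so every increase in batch size keeps finished work out of customers’ hands longer.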
Things that I didn’t feel the book explicitly stated, but that were important for the story to work.
- Intrinsic motivation
- Everyone was intrinsically motivated. They loved the company they worked for. They understood exactly what the goals of the company were, and exactly how to improve the company; they just couldn’t get the IT support they needed.
- Everyone got the vision, but they didn’t have the tools
- The impact that working at a disjointed place has on our lives.
- The business knew exactly what its KPIs were
- What do you do if you don’t have a vision, and you don’t have KPIs?
Questions to learn more
- How do you identify work centers in IT operations?
- How do you identify the measures?
- How do you standardize the work?
- How can we attach preventative KPIs to primary business KPIs?
- Also check out W. Edwards Deming, John Allspaw, Paul Hammond, Jez Humble, and Dave Farley
Chapter 1 - He gets the job
- Talks briefly about the state of failure the company is in
- New CEO
- IT Guy gets a promotion
- Has to fix payroll
This is the classic way you see IT. Why can’t it just be like toilets? It’s a misunderstanding of IT as mere infrastructure. It is infrastructure, but it’s also your business process represented as architecture. Your toilets don’t represent your business.
It also starts to bring in the themes of being overworked and being asked to do two things at once: family and work, and not being able to do either very well. Re: the scene in the doctor’s office.
Chapter 2 - payroll issues
- Sets up the insanity of working on large, long-running projects.
- Payroll is hugely important, and it’s broken, but not completely broken.
- With elbow grease it continues to work
- Until one day it doesn’t.
- Even the fix for the fix doesn’t work.
- Cause: no one felt that they had the time to do the right thing.
- They had 3 other things they needed to do yesterday, so they fixed it as fast as they could.
- The introduction of Brent. He always seems to be at the center of everything.
- Are all the issues related, or not? We have no idea.
- Starts to bring up the work queue: who gets priority, and which external business leaders can meddle with IT.
- What’s with the references to the paint and the carpet? Is he just trying to say that things should look nice so that people feel nice?
- Tech can’t fix systemic issues.
“We’ve had so many problems with this particular upload,” she says, obviously frustrated, “that IT gave us a program that we use to do manual corrections, so we don’t have to bother them anymore.”
Let’s fix the fix for the fix. Instead of addressing the issue at hand, everyone dances around it.
For some reason I’ve encountered a large number of people who seem to either not want to track down the issue, or don’t know that they can. What usually ends up happening is that you build an upside-down pyramid.
“To get Finance the data they need, we may have to cobble together some custom reports, which means bringing in the application developers or database people. But that’s like throwing gasoline on the fire. Developers are even worse than networking people. Show me a developer who isn’t crashing production systems, and I’ll show you one who can’t fog a mirror. Or more likely, is on vacation.”
We can’t do this the right way because it seems really hard right now. And I agree. I have seen many projects long past the point where they could have been fixed. Fixing isn’t something you do at the end; it’s baked into the process.
“Building a new server is now a right-click inside of an application. Cabling? It’s now a configuration setting. But despite the promise that virtualization was going to solve all our problems, here we are—still late in delivering a virtual machine to Chris.”
The general depth of brokenness is laid on heavily here. It just keeps getting worse. You can’t get to meetings. You can’t get onto your laptop. You can’t get your calendar to the people who need it. All because everyone is too busy.
- Conflict as an unnatural state
- Preposterous conditions for release
- Because they are so far behind
- Because they aren’t released in pieces
- Because they are so far behind
- Things are so deeply broken because everyone is busy
- How bureaucracy can get in the way
- Sets up completely unrealistic expectations for the project
- No one is listening to IT
- Could this be the Chicken Little problem?
- Dev has used all the time
- Biz still says go
- Later in the changes meeting no one shows up
- Another layer of no situational awareness
- Audit stuff
- Not sure I know what this means for a company that isn’t bound by SOX
- Starting to understand commitments
- You need to be able to see what you are working on
- What are we signed up to do
- Make it easy to know what’s going on
- People won’t do hard things or pointless things
- You need to give people time and space to think
- They start to realize how over committed they are
- By visualizing the work
- Writing docs (Need to be automated, part of the build)
- We can mostly automate the process of writing docs
- Manual testing (Needs to be automated, part of the build)
- We can automatically test like 99% of all changes if we build it into the process
- The design process can be a waste if you try to think too far forward
- “Write it down”, and hopefully we’ll get to it later
- Did you tell me to just go fuck myself?
- The first meeting with aid
- First thing that people try is to grasp tighter
- That may not be what is needed
- What are the kinds of work?
- Introducing the idea that you can map it to a manufacturing process
- Theory of constraints
- How do you control the flow of work?
- WIP is a silent killer
- Optimize for the bottleneck
- Introduces the 3 ways
- Possible Bottlenecks:
- How do we know what to build?
- If our customer feedback cycle is months, or years, how can we effectively compete?
- Human time spent fixing bugs, for sure.
- Not spending time fixing the lower level issues
- Spent discussing the wrong bottleneck
- Re-iterating make work visible
- But, if you don’t have support from management you can’t do much
- Management thinks it’s spending too much on IT, so it won’t green-light new initiatives
- Prioritizing changes
- Understand risk
- Working with other departments to help make changes
- Nothing is special
- Have one system for rolling out
- Change management starts to work
- Think they have identified the third kind of work
- Introduces the idea: The main mode of operations is to understand, to seek information.
- Going and talking to Brent made the problem simple
- But, only someone from outside the system made that possible
- ^ GO SEEK
- “Death by a thousand cuts”
- Brent won’t document things
- Calling out break fix work as in contention with strategic goals
- Sort of out of scope, but documentation is hard
- Better off building systems that create documentation
- Distribute the knowledge
- We are siloed, but we don’t have to be
- Starting to see the 4th kind of work
- IT Needs to accept work, it can’t just let work be assigned to IT
- Seeing how WIP Affects planned work
- When shit is broke don’t run away from it
- No proper release planning
- Fuck this, “Perfection is the enemy of good”
- Lots in here that would be fixed by having a CI/CD pipeline
- Phoenix seriously screwed up
- CEO realizes that they pushed the release too early
- Tech team rallies to fix performance issues before addressing usability issues.
- PCI compliance while you’re treading water is hard
- Looming issues with John
- When he gets a chance to do his job it feels good.
- Have to come close to disasters to make changes sometimes
- Reiterates how soul-crushing this work can be at times
- IT is costing more than it’s making
- Starts to focus on how soul-crushing it can be to work with technology. You are expected to work miracles.
- Fear of keeping up
- Product can feel like whiplash if you don’t have a vision for everything
- Quality spiral: more and more work, longer and longer hours, shorter timelines, lower quality
- Again, with the personal cost that goes with this work
- Starting to see small wins by making the work visible, “One of the problems with prevention is that you rarely know if you averted a disaster”
- Discovers the 4th kind of work, unplanned work
- It steals from your goals
- Introducing the first way: Creating fast flow
- Introduces the second way: Remove sources of unplanned work continually.
- Re-inforcing the theory of constraints
- Subordinating everything else to the constraint
- Use tools to release work at the right pace
- Reduce WIP by reducing unplanned work
- Leading to the third way, reduce needless work
- One way to reduce needless work is to align with business
- Do the best you can but don’t let people put unreasonable demands on you.
- Re-iterating the cost to ones self
- The cost was there even before he got in the hot seat
- Good old fashioned team work
- It’s a process not an event
- Start to build trust again
- When leadership buys in everyone feels better
- Trust/touchy feely
- Dig into Definition of Done
- Without capacity planning, and planned acceptance you end up with lots of technical debt
- Unplanned work makes it really hard to plan for the future
- So, if you can’t plan you can’t capacity plan and you stay blind
- Fix technical debt so you can increase throughput, which will make velocity go up.
- Single-tasking helps, but there is still a lot of work
- Took a lot of institutional rigor to make that happen
- Need to figure out how to schedule work so that things can continue to get done
- Starting to identify the need to understand work centers and how work flows through them
- Figuring out the need to know all the work centers involved and who can man the work centers
- Realize that preventative maintenance can improve work flow and monitoring can make sure they are doing the correct maintenance
- It almost doesn’t matter what you improve, because if you don’t improve anything, by default you are probably moving backwards.
- You’ve got to bake security and compliance into the system; otherwise you destroy throughput attempting to force it in at the end.
- There are all sorts of secondary and tertiary concerns while making software, like security and compliance; the point is that you have to involve yourself early in the planning process. You need to work in the process
- IT work is like factory work. It has work centers.
- By visualizing flow you can make things more efficient
- I am starting to see how user stories can help bundle work together
- They implemented Kanban
- Then they try and figure out how to prioritize internal projects
- They realize they should prioritize projects which increase throughput at the blockage
- Resource utilization impacts wait time
- Touch time vs queue time
- Less stress, happy people probably better work
- Start to figure out the why: how to align with the business
- When you actually figure out whats important for your company, tech may not be that important
- Goes back to the 3 ways
- Once you understand value for the company you can start to develop KPIs that look forward
- Understanding value
- mapping tech projects to business goals can help IT prioritize but it also means that business needs to know what drives business
- Monitoring is important but connect it with outcomes
- Phoenix was supposed to fix everything, but when you try to fix everything you start over from scratch. Maybe you don’t end up fixing the issues.
- R and D can be like WIP that just locks up capital
- Mapping IT to business goals
- Making things visible again makes certain answers easy
- Redefining the work that IT does so it aligns better
- Redefining the schedule on which IT works so they never lock up too much capital
- audit stuff
- lots of 2nd way stuff getting rid of busy work
- Having clear guidelines around outsourcing
- No shadow IT
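The “Resource utilization impacts wait time” and “Touch time vs queue time” notes above refer to the book’s rule of thumb: wait time is the percentage of time a resource is busy divided by the percentage of time it is idle. A minimal Python sketch of that ratio follows; the function name is hypothetical and the result is in relative units, not hours.

```python
def relative_wait_time(percent_busy: float) -> float:
    """Wait time in relative units, using the book's rule of thumb: % busy / % idle."""
    if not 0 <= percent_busy < 100:
        raise ValueError("percent_busy must be in [0, 100)")
    return percent_busy / (100 - percent_busy)


for busy in (50, 90, 99):
    print(f"{busy}% busy -> wait time {relative_wait_time(busy):.0f}x")
# 50% busy -> wait time 1x
# 90% busy -> wait time 9x
# 99% busy -> wait time 99x
```

The curve blows up as utilization approaches 100%, which is why work queues up so badly behind a fully booked resource like Brent.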
They continue to run into problems, probably because it’s so hard to deploy. If they could deploy constantly, no one would feel like they need to manually change production.
- Build CI/CD
- Great anecdotes about reducing a 3 day task to 10 minutes
- Introducing devops
- Introducing Allspaw
- 3 days flipped to 10 minutes: we can fucking do that, it’s all software
- Value stream mapping
- Starting CI/CD