As of Sept 12th, 2012, the new Simulink and Stateflow editors are available to the public: www.mathworks.com/downloads

Different people account for the timeline differently, but to an approximation the rewrite took about 7 years and occupied a number of developers ranging from a low of 4 to about 18 at maximum intensity. The resulting software unifies and replaces the entire front ends of two separate diagram editing platforms: Simulink (22 years old, millions of lines and testpoints, 100K+ customers), and Stateflow (16 years old, many hundreds of thousands of lines, 50K+ customers). So for almost one third of Simulink’s and almost half of Stateflow’s history, their front ends have been under rewrite. It was a massive project internally known as the “Unified Editors.”

We expect them to be a major success with users. They certainly represent a total overhaul both architecturally and interactively. They’re a nice piece of technology. And I don’t say that because I helped lead it—it’s rare for me to be able to see anything but flaws in projects I’m involved in. They have been in the hands of pre-release customers for some time, and are receiving very favorable feedback.

It is unusual for rewrites of this magnitude to succeed. MathWorks management deserves a great deal of credit for allowing the project to converge instead of losing faith and cutting it off midway. That said, I will never in my career undertake another project in the same way I did this one. The Unified Editors have shaped me as much as I’ve shaped them.

Here is a laundry list of what a 7-year rewrite has taught me. I expect to write more on a bunch of these items in follow-up pieces. But for now, here is a pure dump of what I know now that I didn’t know when I started (none of which younger me would have taken on faith). And neither should you. It’s worth noting that other people involved in the project have different views and took different lessons. I was a chief initiator, the technical lead, an individual contributor, and one of three development managers for the project. And, of course, 7 years ago we promised the work in 2 years. So go ahead with your 2-year rewrite and we’ll compare notes in 2019.

Estimation

You can’t estimate anything longer than 6 months
You can’t estimate anything that isn’t broken into specific 1-week tasks
Tasks that say “Implement X” are not understood
Putting large-scale items on a long timeline is fun but useless
Team members have a better assessment of readiness than direct management
Outside observers may have a better assessment of readiness than team members

Big bang vs. staged delivery

Big bang looks good because of early underestimation
Things will end up taking as long as incremental staged delivery anyway
Big bang happens because of doubts about sustained institutional investment in long programs
Staged delivery can be cut off at any point when more important pressures arise, leaving a program half-complete
It’s easy to know you’re converging when you are converging
It’s impossible to tell if you will converge if you are not yet converging

Rewrites & backwards compatibility

A large system has more behavior than anyone thinks it does (maybe 10-100X)
Everything is the way it is for a reason
All the absurdities are that way because they needed to be that way at some point
People write what they can get away with
All quirks are baked in as assumptions to other existing systems
Avoid replacing successful legacy systems
Develop something else instead
Think creatively about how not to do a rewrite
A new product is 10-100X easier than a replacement for a large legacy system
Write something new that can gradually come to eclipse the feature set of the old
Backwards compatibility is a drag on developers, products, and quality (but may be necessary for customer/business reasons) (Look at what Apple gets away with. Nobody loves them for their lack of commitment to backwards compatibility. People love Apple for the products and technology that ditching old standards permits them to produce.)
In a system designed as a new framework plus a port of a legacy system to that framework, producing the framework is 10-100X easier than the port
It’s hard not to consider them 50/50 in planning, but they’re not

Architecture

Sitting on top of legacy systems instead of cleaning them up has several characteristics

PROS

You don’t disturb anybody working in that codebase
You don’t regress existing functionality
You decouple shipping schedules
You rely on no other teams for deliverables and they don’t rely on you

CONS

You are subject to all the vagaries of the existing codebase
Rather than smooth out rough patches, you make new code rough to conform to them
At the end, you still have all the cleanup to do
You do not have to communicate with other teams, so you have to force yourself to (we didn’t)

Team

The team has a more realistic assessment of readiness than management
Unrealistic targets are really demoralizing and demotivating
Missing targets, realistic or not, is demoralizing
Protracted stabilization is soul-crushing
Customer exposure is a big morale boost

Scope

Scope should be aggressively minimized
Features that management believes in more than developers do are demoralizing
Cutting features is great, the more the better
Minimum viable product considerations are very hard to evaluate when replacing an existing system

Performance

Modernizing an old codebase will require more memory
Dedicated performance engineers really help
Performance, especially of interactions, is very hard to lock down

Testing

Test coverage of the existing system will not be good enough
Passing the old tests is essential, but doesn’t indicate anything about the quality of the new work
The failures of the new system will be very different and the existing tests cover mostly the old failures

Refactoring

It really makes a difference
People don’t want to work in a dirty environment
The team knows what isn’t working and needs support to be allowed to fix it properly
Done properly it does make remaining work go faster (can make the difference between converging and diverging)

Full-stack Iterations

You must eventually stop rejecting and throwing out iterations and settle on one to prepare for shipment
Key to utility of iterations is quantity and speed
Anything that impedes speed or increases cost of production or throwing away is getting in the way
Never ship features based on an iteration that may not be the final one
Such features are a distraction
They are never excellent features because they aren’t what the team really means to do
The work required to bring them to shipping state and maintaining them during the main effort is a huge distraction
There is a little bit learned about existing systems and what bringing them to production quality entails
They remain a huge drag on attention and resources even after the primary shipment because they need to be ported
It requires organizational support not to demand them from a long program

Clients

Having clients too early is deadly
Mismatched schedules and requirements will warp growing systems
The integrity of the framework is compromised as shortcuts are taken to satisfy immediate needs of clients out of the appropriate construction sequence
You will never feel ready for clients even when you are
At the point the work is ready, turning away clients is destructive
It takes three clients to sufficiently drive generalization of a framework

Stabilization

You have to turn it on before it’s ready in order to get it ready
The issues involved in really running a new system in production cannot be simulated
You should not plan to turn on for the first time and ship in the same release

Requirements

Shifting requirements are a reality
But in-flight design changes must be minimized
Off-the-cuff choices need to be weighed for their cost against leaving things the way they are
Complex systems need a design document for developers, testers, doc, and usability to work off jointly
It must be written at a fairly low level; a high-level one doesn’t specify anything sufficiently

Prototypes, walkabouts, demo nights

Never ship features based on early iterations (did I already say that?)
Prototypes always appear closer to ship-readiness than they are in reality
Prototype code must be kept out of the production stream
That requires development procedures that enable it
Prototype code that leaks into production will cause problems for a very long time
Customer exposure is a big morale boost and mitigates risk
Walkabouts have to be well-managed and infrequent so as not to leave people waiting and uncertain
Some people’s work shows more easily than others’
Some people like this kind of exposure more than others
It’s good for upper management to meet the team and talk to them
There is a danger of pressure to change design on-the-fly at these events
Pressure to show work for a deadline leads to shortcuts
That’s OK in prototype code, not in production
The deadline for something to be shown in demo/walkabout must be decoupled from the deadline for submission to a production stream (they may have very little to do with one another)
But it’s nearly impossible for a viewer of the work to understand that it is nowhere near complete

That’s all I can think of at the moment. There’s a lot I want to write more about. I can’t wait to apply what I’ve learned to making more software, better, faster.