Collected works and current thoughts

Threading Story

A time when I used threading to speed up a program, and made things harder for myself.

Overview

I was on an engagement around 1999 where someone had written a program to transfer a mainframe-based DB2 data set into a different SQL database.

It needed to run daily.

It took 26 hours to run once.

The program did 3 things: extract, translate, and load.

The program was single-threaded, and the first step took between 30 and 90 seconds. So the program was waiting for I/O around 97.5% of the time.

Threading to the rescue

I took the existing code and broke it up horizontally:

Note that this was how the original code was structured. I followed that design and put threads on top of it. I set up queues between the layers and implemented the producer/consumer algorithm across each one.

The program eventually worked. With three stages, an input queue, and queues between the stages, I ended up creating thread pools:

We ran parallel transactions because the keys were guaranteed unique given the source data.

Results

This was early days in Java. The JVM didn’t even have native threading. Even so, the time went from around 26 hours to 16 minutes (that’s where the 97.5% came from above).

It took me just over a week to do that. 3.5 days of that week went to getting the program to shut down properly when it was done. The problem was figuring out when each queue was done versus merely empty. How does the consumer of a queue know that the producer is never going to publish again, that is, that it is done?

I’m not going to answer that other than to say semaphores, and to point at the algorithm mentioned above. It took 3.5 days to figure all of that out and avoid deadlock on shutdown. The program was running and translating for those days, and we used Ctrl-C to kill it until I figured out how to get it to terminate cleanly. (It ran in a test mode only; we didn’t switch to it until it shut down properly.)
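For a sense of the done-versus-empty problem, here is a minimal sketch in today's Java. It is not the 1999 code: it swaps the semaphores for a `BlockingQueue` plus a sentinel ("poison pill") value, runs one thread per stage, and every name in it (`StagedPipeline`, `DONE`, the `row-` strings, the upper-casing "translation") is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class StagedPipeline {
    // Sentinel meaning "the producer is finished", which is not the same as
    // "the queue happens to be empty right now". Compared by identity.
    private static final String DONE = new String("DONE");

    static List<String> run(List<String> keys) {
        BlockingQueue<String> extracted = new LinkedBlockingQueue<>();
        BlockingQueue<String> translated = new LinkedBlockingQueue<>();

        // Stage 1 (hypothetical extract): publish every row, then the sentinel.
        Thread extract = new Thread(() -> {
            for (String key : keys) {
                extracted.add("row-" + key);
            }
            extracted.add(DONE);
        });

        // Stage 2 (hypothetical translate): consume until the sentinel arrives,
        // then pass the sentinel on so the next stage can stop as well.
        Thread translate = new Thread(() -> {
            try {
                for (String row = extracted.take(); row != DONE; row = extracted.take()) {
                    translated.add(row.toUpperCase());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                translated.add(DONE);
            }
        });

        extract.start();
        translate.start();

        // Stage 3 (hypothetical load) runs on the calling thread.
        List<String> loaded = new ArrayList<>();
        try {
            for (String row = translated.take(); row != DONE; row = translated.take()) {
                loaded.add(row);
            }
            extract.join();
            translate.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining the pipeline", e);
        }
        return loaded;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a", "b", "c")));
    }
}
```

With a pool of consumers per stage instead of a single thread, each consumer needs its own sentinel (or some other way to count producers down), and that bookkeeping is where the shutdown logic starts to get hard.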

In retrospect, the three steps (extract, translate, load - ETL) are parts of a single unit of work. No one part is complete on its own. So breaking the work up by technical step might have seemed logical, but had I instead simply had one queue, with multiple threads each handling a whole unit of work, it would have removed most of the development time.

That is, 1 thread pool:

Splitting the problem by logical step (or layer) did not make the solution cleaner. It made it much worse. The logical break led to 2 additional queues and a total of six sets of semaphores (I’m guessing; it was over 20 years ago). With a single queue, those 3.5 days would probably have taken a few hours.

In modern Java, if you want something similar, use an ExecutorService (for example, a fixed thread pool from Executors.newFixedThreadPool). If you split the problem by complete unit of work, then you can pre-load the queue, give it a number of threads, and it will complete the work and then shut down easily. I did this work last century, around the time Google was founded, so we’ve gotten better.
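A minimal sketch of that single-pool shape, assuming a fixed pool from `Executors.newFixedThreadPool`. The `extract`, `translate`, and `load` methods are hypothetical stand-ins for the real steps, and `UnitOfWorkPool` is just a name for the example. Each task carries one key through all three steps, so there are no queues between stages to coordinate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class UnitOfWorkPool {
    // Hypothetical stand-ins for the real extract, translate, and load steps.
    static String extract(String key) { return "row-" + key; }
    static String translate(String row) { return row.toUpperCase(); }
    static String load(String row) { return "loaded:" + row; }

    static List<String> run(List<String> keys, int threads) {
        // Pre-load the work: one task per complete unit of work.
        List<Callable<String>> tasks = new ArrayList<>();
        for (String key : keys) {
            tasks.add(() -> load(translate(extract(key))));
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task has finished and returns the
            // futures in the same order as the task list.
            for (Future<String> done : pool.invokeAll(tasks)) {
                results.add(done.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pool", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a unit of work failed", e.getCause());
        } finally {
            // No sentinels and no semaphores: the pool's worker threads exit
            // once shutdown has been requested and the work queue is empty.
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a", "b", "c"), 2));
    }
}
```

The "done versus empty" question goes away because the full task list exists before any thread starts, so the pool is done when the list is.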

If you are using Spring, you can put @EnableAsync on your @Configuration and @Async on methods, so that calls to them return immediately and the work runs on another thread.

Conclusion

Don’t use threading.

If you do, keep it simple.

Use modern approaches over hand rolling solutions.

Published 22 February 2021

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.