<center>

Learning COBOL: A Journey for the Modern Programmer
===

*Written by [JDC](https://twitter.com/jdcaballerov). Published 2021-04-06 on the [Monadical blog](https://monadical.com/blog.html).*

</center>

Last year, as a result of the pandemic, many legacy systems went over their capacity. [Several](https://josephsteinberg.com/covid-19-response-new-jersey-urgently-needs-cobol-programmers-yes-you-read-that-correctly/) [news](https://www.techrepublic.com/article/cobol-programmers-are-in-demand-to-fight-the-coronavirus-pandemic/) [outlets](https://www.theverge.com/2020/4/14/21219561/coronavirus-pandemic-unemployment-systems-cobol-legacy-software-infrastructure) reported that those legacy systems, programmed in COBOL, required urgent action. However, COBOL programmers are scarce and it doesn’t look like younger programmers are taking up the language (the average age of a COBOL programmer is around 60 years old). The news coverage included a plea for more COBOL programmers.

I first remember hearing about COBOL from my father when he proudly told me that he had been taught languages for scientific programming (FORTRAN IV) and business programming (COBOL) at university. After speaking with him, I'd searched for resources out of curiosity, but I never managed to get anywhere. This time around, I decided to stick it out and try to learn COBOL more systematically.

In this post, I’ll share some resources and ideas that I've found useful while learning COBOL. Following these resources might not be enough to make you into a COBOL programmer, but it will give you enough of an understanding of the language to begin your own journey.

Disclaimer: I am not an expert COBOL programmer and welcome any corrections and useful additions to this post.

<center>

![COBOL report](https://docs.monadical.com/uploads/upload_55affe03315621140f87f9cb2ebdb18b.jpg)

</center>

## Where to start?
This is one of the hardest things about learning COBOL: most of the resources are either historical accounts or technical reference manuals. It feels like expressing a desire to learn Latin and having someone throw you an English-Latin dictionary, or recommend that you read the history of Ancient Rome. Strangely enough, as I’ll explain, you do need to know a bit of history to understand COBOL.

### COmmon Business Oriented Language (COBOL)

The idea of having a business language sounds pretty odd to most of us now--isn't business software just programmed in whatever language you want? This wasn’t so in the early days. Back then, languages worked like templates for accomplishing concrete domain tasks, operating at a level just a bit higher than the machine instructions. There were basically two domains: scientific and business. COBOL is a [domain specific language](https://martinfowler.com/bliki/DomainSpecificLanguage.html) for business.

Business domain tasks usually require managing and manipulating heterogeneous data (with several sources, formats, and field types) in record structures, using true [fixed point decimal arithmetic](https://en.wikipedia.org/wiki/Fixed-point_arithmetic) for accurate financial accounting to the last digit and fast access to externally stored record structures. These are the kind of tasks where COBOL still excels.

Before you invest your time in learning COBOL, it’s important to understand that it’s a different beast to what most programmers are used to. Being a business domain specific language means that the kind of tasks handled by COBOL are closer to those of SQL than Java. You will not be programming kernels or webapps in COBOL--you’ll mostly be using it to operate on records, as with SQL. Understanding this should give you an idea of the kind of tasks that you can do with COBOL.
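To make "operating on records" a bit more concrete, here's a rough Python sketch of the kind of job I have in mind: slicing fixed-width records into typed fields and adding up money amounts exactly. The record layout and the `make_record` / `parse_record` helpers are invented for this illustration; they are not from any real system, and COBOL gives you the equivalent without an import.

```python
from decimal import Decimal

# Hypothetical fixed-width layout, invented for this sketch:
# 3-digit id, 20-character name, 9-digit amount in cents.


def make_record(ident, name, cents):
    """Build one 32-character record in the made-up layout."""
    return f"{ident:03d}{name:<20}{cents:09d}"


def parse_record(line):
    """Slice a record back into typed fields."""
    return {
        "ident": int(line[0:3]),
        "name": line[3:23].rstrip(),
        # Shift the decimal point two places rather than dividing as a float.
        "amount": Decimal(line[23:32]).scaleb(-2),
    }


records = [
    make_record(1, "Ada", 1010),
    make_record(2, "Grace", 2020),
    make_record(3, "Jean", 30),
]

# Sum the amounts with decimal arithmetic, so no binary rounding creeps in.
total = sum((parse_record(r)["amount"] for r in records), Decimal("0"))
print(total)
```

The point is less the code than the shape of the task: read a record, pick fields out by position, do exact arithmetic, move on to the next record.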
![mainframes](https://docs.monadical.com/uploads/upload_314c28c4a35d8e8489b17e0bf7bb24bc.jpg)

COBOL is a powerhouse for processing records and usually runs on [mainframes](https://www.youtube.com/watch?v=ximv-PwAKnc), machines optimized for reading and writing to external storage. Maybe you’ve heard of supercomputers simulating the Big Bang by crunching millions of numbers in [CPU-bound tasks](https://en.wikipedia.org/wiki/CPU-bound). Well, you can think of mainframes as being the equivalent of that for processing and writing records in [I/O-bound tasks](https://en.wikipedia.org/wiki/I/O_bound).

COBOL was also designed with readability in mind, since non-technical users may need to read it too. For example, an employee from the IRS might need to read code that implements a new tax law. Because of this, COBOL uses plain English words.

One final caveat: learning COBOL is just the first step. It’s not like learning Python or Ruby--if you really want to work in COBOL, you’ll likely need to learn how to operate a mainframe.

### Program Structure

Most COBOL resources will use an encyclopedic approach. The first thing that you’ll encounter will likely be a detailed presentation of program structure and columns. The importance of this concept tends to be overemphasized and, since it’s usually presented in a needlessly confusing way, it can be enough to put you off before you’ve even started! The usual phrasing is something like "COBOL programs are organized into divisions, sections, paragraphs, sentences, statements and characters with strict rules about columns" _(and this is when you usually close the browser tab)_.

Here’s what you need to know: COBOL cares about what column your code is in. Most compilers expect your code to follow strict rules for indentation, but modern COBOL also accepts a free format. (If a modern editor can format LISP code, why not enforce simple indentation rules for a COBOL program?!).
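If it helps to see the column rule as code, here's a minimal Python sketch that cuts a fixed-format source line into the areas the traditional layout uses: columns 1-6 for the sequence number, column 7 for the indicator, columns 8-11 for Area A, columns 12-72 for Area B, and columns 73-80 for identification. The function name `split_fixed_format` is made up for this post, and this only illustrates the layout; it is not how any particular compiler reads source.

```python
def split_fixed_format(line):
    """Split one fixed-format COBOL source line into its column areas."""
    padded = line.ljust(80)  # pad short lines so every slice exists
    return {
        "sequence": padded[0:6],           # columns 1-6
        "indicator": padded[6],            # column 7 ('*' marks a comment)
        "area_a": padded[7:11],            # columns 8-11
        "area_b": padded[11:72],           # columns 12-72
        "identification": padded[72:80],   # columns 73-80
    }


# A division header starts in Area A and runs on into Area B.
header = split_fixed_format("000100 IDENTIFICATION DIVISION.")

# A comment line has '*' in the indicator column.
comment = split_fixed_format("000200* Say hello")

# An ordinary statement leaves Area A blank and starts in Area B.
statement = split_fixed_format("000300" + " " + "    " + 'DISPLAY "Hello, world".')

for parts in (header, comment, statement):
    print(repr(parts["indicator"]), repr(parts["area_a"]), parts["area_b"].rstrip())
```

Seen this way, the "strict rules" boil down to a handful of fixed slices, which is exactly the kind of thing an editor can handle for you.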
I found this image very useful:

![COBOL](https://docs.monadical.com/uploads/upload_fe83b6e5383060db9886723bccce9ba0.png)

*From https://devops.com/the-beauty-of-the-cobol-programming-language-v2/*

Does this mean that you have to learn all those rules before you can go on? Do you really have to enumerate all of the lines? Well, let’s see what this looks like in a modern editor.

![COBOL](https://docs.monadical.com/uploads/upload_4792a382fe16ece63202a836f091c3e4.png)

That's it! If the resources just showed you this, you’d happily skip over it. Certain code follows indentation rules and a modern editor will save you the hassle.

“So why have it in the first place?”, you may ask. Very few resources explain this, but having the code in strict positions was what allowed old computers to read the instructions on punched cards. On punched cards, a certain COBOL statement, let's say an IF statement, is mapped to a pattern of holes. If you know that a statement starts at a certain column (12-72 in COBOL), it is easier for the hardware to read (this was done by passing a light through the card).

![COBOL source card](https://docs.monadical.com/uploads/upload_1bd5f6fd7323da8e6f2de0ecb656f1c8.png)

Don’t worry if this is still confusing--the next section should clear things up!

### Divisions, sections, paragraphs, sentences, statements and characters. WUT?

COBOL programs are organized into divisions, sections, paragraphs, sentences, statements and characters. So, what does it all mean?

The first thing you need to know is that COBOL started as an effort to create a standard common business language and was sponsored by the US Department of Defense. This means that descriptions are aimed at creating standards (yes, those fat books). Since the 50s, COBOL has been supervised by CODASYL, ANSI, and ISO, and all this bureaucracy has made for a ton of documented details.

To understand what it all means and why it was needed, we need to look at the requirements of the old hardware.
Before your computer could jump from task to task, it had to follow an execution cycle at the hardware level. An old computer was similar to a mechanical loom: once it was started up, it followed strict steps. For example, let’s say one of these machines is started up; it then turns on a light bulb to detect if a card is present. If there’s a card present, it turns on lights from column 8-11, reads the program's name, allocates space for the variables needed, finds out if it needs to read from an external storage, and only then allocates resources and executes what we would today call the code.

This is what most divisions and sections are all about: declaring to the machine what resources (variables, record structures, external files, mechanisms to pass information to a caller, etc.) will be needed to execute the program.

These days, we take it for granted that we can request resources anywhere in the code. Doing it at the start not only allowed older hardware to follow the execution cycle, but brought benefits to scheduling and software readability and maintainability. These benefits were originally born out of necessity but the same principle is still in use today.[^1]

Requesting resources at the start of execution is like knowing how many people will come to your party ahead of time: you can set aside enough food for all your guests without having to run to the store as more people turn up.

## Running Hello World

COBOL code generally runs on mainframes, but it can also run on your laptop. To compile it, first install a COBOL compiler. [GnuCOBOL](https://en.wikipedia.org/wiki/GnuCOBOL) is a GPL licensed compiler.

To install on Debian or RPM based Linux, run:

```bash
sudo apt install open-cobol
# or
sudo yum install open-cobol
```

Now create a new file: `helloworld.cbl`

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. HELLO.
PROCEDURE DIVISION.
    DISPLAY "Hello, world".
END PROGRAM HELLO.
```

As you can see, we are not coding with strict columns.
We will instruct the compiler to use a free mode. Notice, however, that we have used an IDENTIFICATION `DIVISION` and a PROCEDURE `DIVISION`.

To compile it:

```bash
cobc -free -x -o helloworld helloworld.cbl
# -free  free format mode
# -x     produce an executable
# -o     name the output helloworld
```

And to run it:

```bash
./helloworld
Hello, world
```

That's it, you've just run your first COBOL program!

## The Aha Moment

If you have reached this point, you can start learning COBOL just like you would any other language. Unfortunately, most of the content is either encyclopedic, a video with crappy sound and a dark minuscule screen, or marketing content from IBM (the largest mainframe vendor). IBM's websites are a mess: you’ll be redirected all over the place and asked to log in for something only to meet a dead end or find that the login doesn’t work.

Fortunately, [Derek Banas](http://www.newthinktank.com/2020/04/learn-cobol-one-video/) has made [this video](https://www.youtube.com/watch?v=TBs7HXI76yU). I think it’s the single best resource on COBOL accessible to the general public (it can now be found on some of IBM's websites). If you only watch one video, let it be this one:

<center>
<iframe width="560" height="315" src="https://www.youtube.com/embed/TBs7HXI76yU" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</center>

## COBOL Shines

### Defining Records

COBOL can read and write data to external files. Complex nested, typed records can be declared for read/write or processing using built-in language features.

For example, we can declare a customer record that contains an identification with three numbers, a customer name alphanumeric field of twenty characters, and a date of birth composed of two numbers for the month (MOB), two numbers for the day (DOB), and four numbers for the year (YOB) as follows:

```
01 Customer.
   02 Ident        PIC 9(3).
   02 CustName     PIC X(20).
   02 DateOfBirth.
      03 MOB       PIC 99.
      03 DOB       PIC 99.
      03 YOB       PIC 9(4).
```

In modern languages this might require importing packages and using validators.

### Fixed Point Arithmetic

To appreciate one of COBOL's features that still beats modern languages out-of-the-box, let's use a [neat example from Marianne Bellotti](https://medium.com/the-technical-archaeologist/is-cobol-holding-you-hostage-with-math-5498c0eb428b). Marianne makes use of [Muller's recurrence](https://latkin.org/blog/2014/11/22/mullers-recurrence-roundoff-gone-wrong/), a recurrence designed to exploit rounding errors in floating point arithmetic. This simple recurrence makes floating point arithmetic go bonkers after a few iterations:

<center>

![function](https://docs.monadical.com/uploads/upload_6c6d8e8dd00d652c1d3d878e88afe2bf.png)

</center>

This is what happens when you execute a normal floating point implementation (floating_point) and compare it to a careful implementation using fixed point arithmetic (fixed_point):

```
i  | floating point | fixed point
-- | -------------- | ---------------------------
0  | 4              | 4
1  | 4.25           | 4.25
2  | 4.47058823529  | 4.4705882352941176470588235
3  | 4.64473684211  | 4.6447368421052631578947362
4  | 4.77053824363  | 4.7705382436260623229461618
5  | 4.85570071257  | 4.8557007125890736342039857
6  | 4.91084749866  | 4.9108474990827932004342938
7  | 4.94553739553  | 4.9455374041239167246519529
8  | 4.96696240804  | 4.9669625817627005962571288
9  | 4.98004220429  | 4.9800457013556311118526582
10 | 4.9879092328   | 4.9879794484783912679439415
11 | 4.99136264131  | 4.9927702880620482067468253
===| ==== FAIL ==== | ===========================
12 | 4.96745509555  | 4.9956558915062356478184985
13 | 4.42969049831  | 4.9973912683733697540253088
14 | -7.81723657846 | 4.9984339437852482376781601
15 | 168.939167671  | 4.9990600687785413938424188
16 | 102.039963152  | 4.9994358732880376990501184
17 | 100.099947516  | 4.9996602467866575821700634
18 | 100.004992041  | 4.9997713526716167817979714
19 | 100.000249579  | 4.9993671517118171375788238
```

With a normal Python floating point implementation, the rounding error becomes noticeable at the twelfth iteration (imagine this being a money operation). A better implementation will require careful use of the `decimal` module in Python. Marianne's implementations are as follows:

```python
from decimal import Decimal


def rec(y, z):
    # One step of Muller's recurrence: y is x[i-1], z is x[i-2]
    return 108 - ((815 - 1500 / z) / y)


def floating_point(N):
    # Plain binary floating point
    x = [4, 4.25]
    for i in range(2, N + 1):
        x.append(rec(x[i - 1], x[i - 2]))
    return x


def fixed_point(N):
    # Same recurrence, but with Decimal operands
    x = [Decimal(4), Decimal(17) / Decimal(4)]
    for i in range(2, N + 1):
        x.append(rec(x[i - 1], x[i - 2]))
    return x


N = 20
flt = floating_point(N)
fxd = fixed_point(N)

for i in range(N):
    print(str(i) + ' | ' + str(flt[i]) + ' | ' + str(fxd[i]))
```

As you can see, implementing fixed point arithmetic requires importing a module: `decimal`. For this kind of thing, [Martin Fowler argues](https://martinfowler.com/eaaCatalog/money.html) that money should be a first class data type in computer languages.

So how is this different in COBOL?
```
01|004.47058823529411764705882352941176471
02|004.64473684210526315789473684210526326
03|004.77053824362606232294617563739376991
04|004.85570071258907363420427553444185141
05|004.91084749908279320044025926378962615
06|004.94553740412391672477338380318699621
07|004.96696258176270059871193848764903330
08|004.98004570135563116126860800216481031
09|004.98797944847839226014013305208354003
10|004.99277028806206809748959571792087271
11|004.99565589150663402662409174328465676
12|004.99739126838134411289513973682802581
13|004.99843394394481691903932446956681264
14|004.99906007197089386832554141324799260
15|004.99943593714683915817375969717687204
16|004.99966152410376774132754287247512323
17|004.99979690071342198374588253855244026
18|004.99987813547801267412736521353260176
19|004.99992687950622844123571454299609864
20|004.99995612709372893980120392028918176
21|004.99997367665714207391027589257889309
22|004.99998421854893244207517083666149226
23|004.99999078385622746075942176917258424
24|004.99999952544795364009475753214537754
25|005.00010081816653406051543167698525302
26|005.00208250686479366406101602817180647
27|005.04167249116039245891463937773442947
28|005.82658447276990738127550632450594642
29|019.18644465241775221538311090096032912
30|078.93993662847693666623797354287682536
```

COBOL's implementation will also break, but it allows us to have control over the decimal places, since fixed point arithmetic is built in (I used 35 places here). This was also executed without an import. For a deeper discussion of this point, I recommend [Marianne Bellotti's article](https://medium.com/the-technical-archaeologist/is-cobol-holding-you-hostage-with-math-5498c0eb428b).

`muller.cbl`

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. muller.
AUTHOR. Marianne Bellotti.

DATA DIVISION.
WORKING-STORAGE SECTION.
*> 3 integer digits and 35 decimal places
01 X1 PIC 9(3)V9(35) VALUE 4.25.
01 X2 PIC 9(3)V9(35) VALUE 4.
01 N  PIC 9(2)       VALUE 20.
01 Y  PIC 9(3)V9(35) VALUE ZEROS.
01 I  PIC 9(2)       VALUE ZEROS.

PROCEDURE DIVISION.
    PERFORM N TIMES
        ADD 1 TO I
        *> Y = (815 - 1500 / X2) / X1
        DIVIDE X2 INTO 1500 GIVING Y
        SUBTRACT Y FROM 815 GIVING Y
        DIVIDE X1 INTO Y
        MOVE X1 TO X2
        *> X1 = 108 - Y
        SUBTRACT Y FROM 108 GIVING X1
        DISPLAY I'|'X1
    END-PERFORM.
    STOP RUN.
```

To compile and run:

```
cobc -free -x -o muller muller.cbl
./muller
```

## Suggested Personal Project

If you’ve made it this far, it’s a good idea to find a personal project to practice your new language. For this, I suggest implementing a [personal accounting system using plain text](https://plaintextaccounting.org/). [This Rust implementation](https://github.com/ebcrowder/rust_ledger/blob/main/src/ledger.rs) is a good candidate to port, with clearly defined records, and is the kind of task where COBOL will excel.

## Conclusion

COBOL is infamous for being difficult to learn, but I’ve found that the real obstacle here is not the language itself but finding the right resources and getting a proper understanding of its use case.

While COBOL is not as inaccessible as its reputation would suggest, it’s important to understand that learning COBOL alone won’t get you one of those sweet gigs working at a big bank. For that, you’d also need to learn a bunch of stuff about mainframes (e.g., how to run a job, interface with current technologies, and access external storage). You’d also need to learn how to navigate legacy code and how to structure your program so that it doesn’t become spaghetti code.

I had a lot of fun learning COBOL and feel like I can finally understand how the language works and why it’s useful. I won’t be able to use it to rewrite my kernel, but my accounting will be accurate to 35 decimal places!
## Resources

### Basic (I recommend reading/viewing these in this order)

- https://www.youtube.com/watch?v=ximv-PwAKnc
- https://stackoverflow.blog/2020/04/20/brush-up-your-cobol-why-is-a-60-year-old-language-suddenly-in-demand/
- https://www.youtube.com/watch?v=ZOLB4KqHmBs
- https://devops.com/the-beauty-of-the-cobol-programming-language-v2/
- http://www.newthinktank.com/2020/04/learn-cobol-one-video/

### Tutorial Language References

- https://www.mainframestechhelp.com/tutorials/cobol/cobol-divisions.htm
- http://www.academictutorials.com/cobol/
- https://www.tutorialspoint.com/cobol/index.htm

### Courses

- https://www.coursera.org/learn/cobol-programming-vscode#instructors
- https://github.com/openmainframeproject/cobol-programming-course

### Mainframe Emulator

- http://www.hercules-390.eu/

### YouTube Channels and Videos

- https://www.youtube.com/watch?v=RdMAEdGvtLA&t=3802s
- https://www.youtube.com/channel/UC116jmlZqU3oib4_da75QEg
- https://www.youtube.com/channel/UCR1ajTWGiUtiAv8X-hpBY7w

[^1]: For example, in C, in many compilers you are required to declare your variables at the start of each function--this allows the compiler to allocate resources effectively.

---

<center>
<img src="https://monadical.com/static/logo-black.png" style="height: 80px"/><br/>
Monadical.com | Full-Stack Consultancy

*We build software that outlasts us*
</center>