Excel: The Crutch That Broke Our Legs

Businesses everywhere rely on Microsoft Excel.  Excel is versatile to a fault, and it is very approachable.  In a way, it is the original low-code platform (if you ignore the fact that it is a ripoff of Lotus 1-2-3).  Thanks to that versatility and approachability, Excel is relied on heavily for everything it does well, and just as heavily for an ocean of things it was never intended to do, or does poorly.  When the only tool in your toolbox is Excel, every problem looks like a spreadsheet.  As a result, businesses everywhere have hobbled along, carefully protecting a quivering mass of crappy solutions implemented as spreadsheets more fragile than a house of cards, never realizing that the crutch they relied on to move forward was the very thing holding them back.

As a disclaimer, I do not intend to criticize Excel.  I do not believe Excel is to blame for most of the problems I will present; the businesses that created these problems should hold the blame.  My hope is that my own experiences will show others how Excel can be misused, so they can consider alternatives with less painful outcomes.

Requirements Gathering and Desktop Publishing with Excel

Actual requirements gathering spreadsheet from nyc.gov.

Early in my professional life, I was tasked with managing product change requests for a particular client.  The change requests were documented in Excel.  Let that sink in.  Text documents in Excel.  As I sat at my desk entering our response to their change request, I could not help but wonder why anyone would ever use Excel to create a document like this.  I asked the client.  It turned out my company had asked them to submit their change requests this way, because Excel was being used as a text layout tool.

How did this hold us back?  It took a surprising amount of time to edit these documents in Excel.  When the enter key is pressed in a text editor, the cursor moves to the next line.  When the enter key is pressed in Excel, the cursor moves to the next cell, which is not what you want when you mean to add another paragraph to the same cell.  After typing in a cell, the cell has to be resized to display all of the text.  I could go on and on.

What would have been better?  A few emails and a well-crafted statement of work written in Word would have been much more manageable, which is what we did, eventually.  There are also a slew of dedicated requirements gathering and management applications to choose from.

ExcelDB™

Not to be confused with Excel-DB, an actual application for database-enabled spreadsheets, ExcelDB is the most common misuse of Excel: using Excel as a database.  Excel is a database client; newer versions can connect to SQL Server.  I suppose you could say that Excel is a database that can hold about one million rows and around sixteen thousand columns.  Setting aside how much data each tuple can actually hold, that technically computes to more individual tuples (cells) in a single worksheet than Cassandra supports in a single partition.  But both applications are probably the wrong choice for the type of database I am referring to.
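For what it is worth, the arithmetic behind that comparison is easy to check.  The sketch below assumes the commonly published limits: 1,048,576 rows by 16,384 columns per worksheet, and Cassandra’s documented ceiling of roughly two billion cells per partition.

// Back-of-the-envelope check of the comparison above.
// Assumes Excel's published worksheet limits and Cassandra's documented
// limit of roughly two billion cells per partition.
var EXCEL_ROWS = 1048576        // 2^20
var EXCEL_COLS = 16384          // 2^14
var CASSANDRA_CELLS = 2e9       // cells per partition

var excelCells = EXCEL_ROWS * EXCEL_COLS   // 17,179,869,184
console.log('One worksheet: ' + excelCells.toLocaleString() + ' cells')
console.log('Roughly ' + (excelCells / CASSANDRA_CELLS).toFixed(1) + ' times one Cassandra partition')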

My first real job was in a machine shop as a CNC lathe operator, making portable drill string components for an ex-convict who also happened to be an ex-geologist.  Ok, so I didn’t actually work for the ex-con, but our shop did quite a bit of business with him.  Our shop also had a large quantity of tools that needed to be kept track of: metal cutting inserts and bits that were small and application specific.  All of these were tracked in an Excel file served from a Windows Server share.  Actually, we had two copies of the file, one for viewing and one for editing, to prevent loss of data.  We were using an antiquated version of Excel that didn’t support shared workbooks; if someone viewed the file while the shop owner was making changes, the changes were lost.  The file contained tool names, part numbers, quantities, storage locations, and a number of other data points I can no longer recall.

How did this hold us back?  The information was difficult to maintain; it was one person’s job to keep the tool list up to date, and it took quite a bit of effort to put the data together.  The data also only captured part of the picture.  There were lost opportunities to create well-defined relationships between machines, jobs, and the tooling they required, relationships that could have been used to avoid remaking tooling or reinventing job setups.

What would have been better?  FileMaker or Access would have been better for ensuring data integrity alone.  But why reinvent the wheel?  There are plenty of off-the-shelf tool and inventory management systems, like Cribware, to choose from.

Transforming Data in Excel

I am going to make the claim that Excel is not a great choice for data transformation.  Excel makes some choices for the user that don’t always make sense.  For example, leading zeros will often be dropped without prompting the user.  Dates are often reformatted as numbers, or converted to a date format different from the one you started with, again without any prompt.  There is a good chance you won’t notice until later, when your data is messed up and your clients are asking why.

I experienced this recently with zip codes in a CSV file that I inspected, and created test import files from, using Excel.  A few states have zip codes with leading zeros.  Unbeknownst to me, Excel chose to drop those leading zeros when I opened the file to create a one-line test file prior to importing all of the data.  Excel also prompted me to save the CSV file before I closed it, which it does by default any time you close a CSV.  I was on autopilot and pressed enter, which saved the mutated values over my existing data.
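If the damage is already done, the zeros can sometimes be put back mechanically.  Here is a minimal sketch, assuming every value should be a five-digit US zip code (no ZIP+4); the function name is made up:

// Restore leading zeros that Excel stripped from five-digit zip codes.
// Assumes every value should be exactly five digits (no ZIP+4).
var restoreZip = function (value) {
  return String(value).padStart(5, '0')
}

console.log(restoreZip(7030))    // '07030'
console.log(restoreZip('2139'))  // '02139'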

How did this hold us back?  The error cost me credibility.  If the zip codes are messed up, a reasonable client is going to ask, or at least wonder, what else is messed up.

What would have been better?  You might be thinking that I am transitioning into a rant about why Excel sucks.  I am not.  In fact, I blame myself.  I could have opened the CSV file with a text editor to snag the headers and the first line for my test file.
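That text editor step is also easy to script.  A rough sketch in Node.js, with placeholder file names, that copies the header and the first data row into a test file without Excel ever opening the data:

// Grab the header and the first data row for a test import file
// without ever opening Excel. File names are placeholders.
var fs = require('fs')

var lines = fs.readFileSync('customers.csv', 'utf8').split(/\r?\n/)
fs.writeFileSync('customers-test.csv', lines[0] + '\n' + lines[1] + '\n')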

ExcelCRM™ Powered by ExcelDB™

Tabs in a typical ExcelCRM worksheet.

If you are someone’s customer, that someone is keeping at least some data about you.  If that someone is smart, they are keeping a large amount of well organized, reportable data about you.  If that someone is jumping on the big data hype train, they are clinging to every scrap of data they can get about you in the hopes that someday the data will reveal something insightful about you that they can use to make money.  This might sound a little invasive, but businesses use this data to make decisions that make them more effective.  This is also known as Customer Relationship Management (CRM).

Back to early in my professional life.  I wore a couple of hats, like business analyst and sales engineer.  We had customer information.  All kinds of customer information.  Some of it was stored in a home-brew database.  A lot of it, like our sales pipeline, was stored in Excel files.  The pipeline was created and owned by one person, and ceased to enjoy regular updates when that person departed from the organization.

How did it hold us back?  Deals that could have been, never were.  Data that could have been used to identify trends were entombed in dusty .xls volumes stored in forgotten directories.  Our CRM was a mausoleum of good intentions trapped in painstakingly formatted worksheets.

What would have been better?  There are plenty of purpose-built CRM solutions that are better alternatives to Excel spreadsheets and enthusiasm.  The opportunity cost of failing to capture and act on client data is far greater than whatever you save by storing that data in Excel instead of purchasing a CRM solution.  Likewise, Excel is a fine choice for analyzing your CRM data, but if that analysis needs to be done more than once, it should be automated.  Many CRM solutions offer customizable reports and dashboards for exactly that.

Data Transformation, Again

I am going to make another claim: Excel is not a great choice for data transformation.  When saving a new document as a CSV, Excel uses the code page for your operating system and locale, either Windows-1252 or Mac Roman if you are in North America.  If you open a CSV file that is encoded in another format, like UTF-8, you end up with something like this:

Apostropheâ€™s, â€œDouble quotesâ€, â€˜single quotesâ€™, enâ€“dash, emâ€”dash, hyphen (â€¢)â€¦

Instead of:

Apostrophe's, "Double quotes", 'single quotes', en-dash, em-dash, hyphen (-)…

If you inadvertently save your data while closing the file in Excel, you will be stuck with the corrupted text shown in the first example.  I suppose that if this did happen, you could try to undo the damage by finding and replacing the corrupted characters one by one.  You might even be able to automate the process with a function like this in JavaScript:

String.prototype.replaceAll = function(search, replacement) {
  // Simple replace-all helper (newer JavaScript runtimes ship
  // String.prototype.replaceAll natively).
  var target = this
  return target.split(search).join(replacement)
}

var uncorrupt = function (string) {
  // Replace the mangled byte sequences with the characters they were meant to be.
  // Order matters: 'â€' is a prefix of every other sequence, so it must come last.
  return String(string)
    .replaceAll('â€˜', '\'')  // left single quote
    .replaceAll('â€™', '\'')  // right single quote
    .replaceAll('â€“', '-')   // en dash
    .replaceAll('â€”', '-')   // em dash
    .replaceAll('â€¢', '-')   // bullet
    .replaceAll('â€¦', '…')   // ellipsis
    .replaceAll('â€œ', '"')   // left double quote
    .replaceAll('â€', '"')    // right double quote
}
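For illustration, feeding the corrupted sample from above through the function (with the helper and uncorrupt defined as shown) produces the cleaned-up text:

// Running the corrupted sample from earlier through the function.
console.log(uncorrupt('Apostropheâ€™s, â€œDouble quotesâ€'))
// Apostrophe's, "Double quotes"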

But it would be better to avoid the problem in the first place.  I encountered this recently when I needed to transform data that had been exported from a legacy system prior to importing it into a new system.  The data had a mess of problems that needed to be dealt with.  There were columns of data that needed to be added.  There were tuples with escaped strings.  There were tuples with HTML artifacts, and there were tuples with XML artifacts.  Worst of all, after fixing everything else, there were still mis-encoded characters everywhere.  The code above was the solution I came up with to deal with the encoding problem.

I managed to write some unit tests to prove out my solution, and the unit tests passed.  My bad strings were coming back good as new while I observed the output of my code on the command line.  But as soon as I opened the CSV data to view it in Excel, I discovered that the encoding problems remained.  As it turned out, the encoding problem was limited to Excel, and had been the entire time.  However, I was falling back on old habits and transforming the data in Excel, which in turn persisted the mis-encoded data to my file when I saved my changes.  Once I realized this, I found that importing the data into our new system without viewing or transforming it in Excel first eliminated the encoding problems that had plagued us for ages.
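In hindsight, the tests were proving the function rather than the data.  A sketch of the kind of test that passes here, using Node’s built-in assert module and assuming the uncorrupt function above is in scope; it succeeds because the mojibake is typed in by hand, which says nothing about whether those byte sequences ever existed in the file:

// A passing test like this only proves the function, not the data:
// the mojibake is typed in by hand, so it says nothing about whether
// those byte sequences ever existed in the file.
var assert = require('assert')

assert.strictEqual(uncorrupt('Whatâ€™s new'), "What's new")
assert.strictEqual(uncorrupt('â€œQuotedâ€'), '"Quoted"')
console.log('tests passed')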

How did this hold us back?  The character encoding problem we faced had been occurring for months.  Many had resorted to copying and pasting from the old system to the new in their browsers.  That should have been a clue right there, but we were all in a hurry to move forward.  Many hours went to copying and pasting data.

What would have been better?  Wanting a better solution, I created a small command line application with Node.js.  If it weren’t for taking this step, I might not have discovered that the character encoding problem existed.  Unfortunately, I burned through six hours writing a function and tests to find and replace a problem that never existed.  You could say that I wasted my time by implementing my own solution instead of searching for an off-the-shelf solution as well.  However, my solution only took a few hours to produce, and it was only a portion of the overall solution.  We also wrote better queries to produce data in the format we needed rather than relying on Excel or other tools to fix our data.  The real loss was the time spent copying and pasting data from one system to another.
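The command line tool itself does not need to be elaborate.  A stripped-down sketch of the idea, with placeholder file names and a stand-in for the real clean-up logic, that reads the export as UTF-8, transforms it, and writes a new file without routing the data through Excel:

#!/usr/bin/env node
// Sketch of a tiny transform script: read a UTF-8 CSV export, apply
// whatever row-level clean-up the import needs, and write a new UTF-8
// file. Usage: node transform.js input.csv output.csv
var fs = require('fs')

var inputPath = process.argv[2]
var outputPath = process.argv[3]

var fixRow = function (row) {
  return row.trim()  // stand-in for the real clean-up logic
}

var rows = fs.readFileSync(inputPath, 'utf8').split(/\r?\n/)
fs.writeFileSync(outputPath, rows.map(fixRow).join('\n'), 'utf8')
console.log('Wrote ' + rows.length + ' rows to ' + outputPath)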

So, What is Excel Good For?

Simple data analysis, quick calculations, and mock-ups of data visualizations, reports, and dashboards.  If the task only needs to be done once or twice, Excel is an excellent choice.  It is easy to use, easy to share, and has some great features like pivot tables and pivot charts.  If you find yourself tempted to use Excel for anything else (formatting documents, for example), please consider alternative solutions.