IMPORTING DATA

10 Advanced CSV Import Features You (Probably) Won’t Launch Yourself

CSV import may on the surface seem simple to build, but launching an advanced importer is a huge undertaking requiring substantial product and engineering resources.

Christina Gilbert

Christina is the co-founder & CEO at OneSchema.

September 27, 2022

Introduction

While CSV import may on the surface seem simple to build, there’s a long list of features your customers will request when they get stuck on imports. We frequently see startups think CSV import will be launched after a quick sprint, but get bogged down by the never-ending critical feature requests.

“Two years after we started building CSV import, we prioritized our 4th engineering project to add improvements” said Rohan Sahai, Director of Engineering at Affinity. “When we brought the proposal to leadership, they insisted this be the last time they ever see CSV import on the roadmap. If there are two features we regret homerolling, the first is subscription billing and the second is CSV import.”

For rapidly growing startups with many priorities, the never-ending CSV import project takes away valuable engineering time that could be spent focusing on the core product. From CSV parsing bugs to intuitive error messaging, here are the top 10 features that will keep your engineering team working on CSV import for months (or even years) longer than expected.

1. Data Validation & Autofixes

From enforcing basic type validations to complex conditional logic, there are endless ways to validate the data you allow into your system. You’ll likely implement the basics, but rogue whitespace, emojis, or missing leading zeros create bugs and customer complaints down the line. 

An example of OneSchema's LAST_NAME data validation error message

At OneSchema, we’ve compiled the largest library of data validations available, designed for all industries. To implement CSV import, your team will simply need to leverage our no-code data types library to get started. The alternative is to write custom validations for each data type you’ll support now, and continue to add to the list indefinitely in the future for all new types.

Even after implementing robust validations for each data type you need, an additional layer of logic is needed to convert between formats. At OneSchema, we’ve built the platform to automatically convert between any date, number, country code, phone number format, and more. OneSchema can even intelligently infer the original format of a column of data when ambiguous (such as MM/DD/YYYY dates vs DD/MM/YYYY dates). 

We’re continuously testing our validators against the world’s messiest CSV data across our entire customer base. There are constantly new edge cases we catch as part of this maintenance process – trust us when we say the possible list of errors to validate against is never-ending! 

2. Advanced Parsing

CSV parsing will be an endless source of bugs, even if you leverage a robust parsing library. Most CSV parsers do not handle different encodings gracefully out of the box and in most cases, your customer will not be familiar with the encoding of their exported CSV file. 

To ensure that every CSV file imports properly, OneSchema has byte order mark (or BOM detection) to intelligently switch between parsing strategies for UTF-8, UTF-16, and other Unicode encodings. Also, thinking about supporting Excel? Get ready to gracefully handle legacy Excel formats and implement different parsing strategies for .xls and .xlsx. 

In addition, most open-source libraries have incomplete support for handling edge-cases like:

  • Jagged CSVs
  • Empty columns
  • Empty row
  • Non-comma delimiters (tab, pipe)
  • Commas inside cells
  • Newlines inside cells
  • Empty space inside cells
  • Missing header rows or data above the header row

Failures in the parsing step to handle these edge cases almost always make data import impossible, as all future steps rely on a successful parse.

OneSchema gracefully handles these edge cases and more. For example, the platform intelligently identifies the row most likely to be the header row to remove as much friction as possible throughout the mapping step.  

3. Exportable Excel Error Summaries

OneSchema's Excel export maintains annotations to allow for easy collaboration

When your customer’s upload inevitably has errors that need to be corrected, it’s likely they’ll need to collaborate with other members of their team to resolve the issues. A naive importer will display a list of cryptic error messages. A more advanced implementation might provide clean and helpful error messages, but with an inadequate error summary, the back-and-forth to complete the import still becomes incredibly difficult.

To provide an exceptionally smooth import experience, OneSchema goes a step further and allows users to export an Excel file with highlights and annotations that explain the issues to your customer (video).

4. In-line error resolution

The #1 challenge we see with custom-built CSV importers is the lack of an error resolution step. Users trying to map their data to the schema are left with cryptic error messages (or in many cases, no communication at all) as to why their file didn’t upload. 

Example of a confusing error message for customers to parse

This can create an incredibly frustrating experience for customers, as they’ll either have to iterate on getting the data correct in Excel (which will often reformat their data and create more errors) or escalate a support ticket to your team.

We’ve found that adding the below features drastically reduces customer frustration:

  • Displaying clear errors 
  • Filtered views for errors by row
  • Letting the customer correct errors in-line
  • Allowing for bulk data correction options (such as find-and-replace, deleting rows, CTRL+Z for undo)
  • Automated format fixes (like date, phone number, and country-code conversions)
  • Exporting highlighted error files to Excel (see #5 below)
  • Deleting rows with errors

While building a robust error resolution process can be a daunting task from both an engineering and design perspective, you’ll save your customer the round-trip to Excel to clean up their data and drastically improve your onboarding conversion rates. If your CSV import flow is part of customer onboarding, launching with an understandable error resolution step is absolutely critical or you’ll see a substantial drop-off in activation rates. Across a survey of our customer base, we’ve seen a 50+% increase in completed activations with the addition of the error resolution step. 

5. Performant Imports for 1M+ row files

Performance is a huge concern with data imports. Customer spreadsheets can vary drastically in the number of rows and columns, and every additional second of load time can hugely impact conversion rates. Naive approaches will start with loading, validating, and visualizing all of the spreadsheet data at once on the browser. 

As spreadsheets get to thousands or millions of rows, however, the complexity scales drastically:

  • Validations need to be done in parallel batches, especially if the results are to be displayed in a responsive UI. 
  • Bulk edits like find-and-replace are tricky to perform, especially if they trigger downstream validations that need a round trip to the database or a third-party data provider like Google Maps. 
  • Visualizing all the spreadsheet data on the front-end also necessitates caching and sampling in order to not keep renders fast and instantaneous, especially while scrolling.

OneSchema is the only available product that supports 1M+ row files in a performant way. Our backend uses an in-memory Rust data service that runs on high-memory servers to load each uploaded spreadsheet into memory. This in-memory approach ensures that all of our data type validations and transformations happen in real-time, regardless of the spreadsheet size. 

When the spreadsheet data is ready to import, OneSchema is resilient to outages, as our asynchronous jobs send the spreadsheet data in batches of JSON records to your importer webhook endpoint with exponential backoff on retries. Finally, our embedded spreadsheet frontend virtualizes rows as the user scrolls so the interface remains fast and snappy whether the spreadsheet has hundreds of rows or millions of rows.

6. Intelligent Mapping

An example of OneSchema's column mapping preview pane

Column mapping is another common stage where CSV imports go awry. If your customer maps their columns incorrectly, they’ll either need to contact your team to help undo imports (resulting in a time-consuming task for your engineering team) or get frustrated and abandon the process. Implementing a preview pane to let your customers see the mappings (and errors) will help your customers get their data right the first time. 

If your imports have a large number of columns, it’s also slow and tedious for your customer to map the columns correctly. Supporting intelligent fuzzy matching or remembering mappings will give your customer a far smoother import. Our intelligent mapping algorithm has been trained on millions of rows of mapping data to recommend the correct match, every time.

OneSchema also allows you to surface helpful column descriptions and sample data to your customers. Allowing your customers to split and merge columns like first name, last name, and address can save your customers valuable time during the import process. 

7. Custom Columns

For highly customizable systems (for example, Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) systems), you’ll often allow your users to add custom columns within the platform. To support this, you’ll need to be able to initialize a new importer schema for each customer with dynamic validations based on the customer’s configuration.

You may even want to allow customers to create new columns in your system during import. Implementing a UI to empower your customers to add custom columns in a straightforward way during import can be a complex design and engineering project. The feature becomes even more complex if you want to validate data in the newly created columns during import (for example, validate that the column is a NUMBER or COUNTRY CODE type). 

OneSchema is the only platform that allows you to validate custom columns by leveraging our large data type validation library

8. Upload Report Dashboard

Report dashboards can help internal teams troubleshoot CSV import issues more efficiently

Every customer onboarding flow needs good reporting and analytics. Given the error-prone nature of doing a data import, many companies point to their data import workflow as the largest source of onboarding support tickets and bug reports.

In order to properly support these customers, customer success and support teams will frequently ask for dashboards to empower the team in seeing what went wrong with an upload. An upload report dashboard enables a support team member to quickly look at the historical list of imports, download the original CSV file, and compare it against the uploaded data to diagnose the data issue at the root of any customer ticket.

Even worse than a support ticket is when bad data makes it into the system. In those cases, customers may request a data import “undo”. In the most severe cases, this can force your team to create a fresh instance with all previously stored data deleted to solve the issue. Showing upload reports greatly improves your internal teams’ ability to help customers understand their own data and enables better scalability of user onboarding.

9. Errors vs warnings

Depending on your use case, you may want to treat errors and warnings differently during the upload experience. This will require your team to implement custom logic for handling “error” cases vs “warnings” cases. For example, you may want to allow files that actively contain warnings to successfully upload, but not errors.

Flagging only true errors as errors will save your customer a lot of headaches of getting their data to be perfect before being able to upload any data.

10. Self-Validating Excel Templates

Even with a state-of-the-art importer, it’s often helpful for your customer to initially configure their in an Excel spreadsheet. In Excel, however, your customer won’t have any context on your requirements for their imported data (and the errors that will be triggered when they try to import data into your system). 

OneSchema enables you to create a self-validating Excel template based on your data model in OneSchema (3 minute video). You can keep it synced with your importer template and allow your customers to download the template from your product’s UI. This makes it even more likely your customer will have a smooth and successful import.  

Conclusion

In an ideal world, time and cost aren’t an issue. Product managers, designers, and developers would be able to prioritize every advanced CSV import feature. However, launching an advanced CSV importer is a huge undertaking requiring substantial product and engineering resources. The unfortunate reality is that product teams skip key features, only to be forced to add them back to the roadmap after an onslaught of customer feedback. Months or even years later, they’re still making improvements to CSV import. 

If you’re ready to make data imports effortless and efficient, we’d love to talk! Request a demo of OneSchema here or email us at sales@oneschema.co.

Continue reading

Make data imports effortless and efficient

Ready to get started?

Request a Demo