Natural Language Processing and Scalable Architecture

Many companies claim to perform natural language processing (NLP) on electronic medical records (EMRs). However, I see two major problems with this claim.

Most companies aren’t doing NLP

This is only a problem if the company isn't delivering on the promise of NLP. NLP is used to analyze free text written by people, such as clinician notes or this blog post; an Excel spreadsheet is not free-text content. Using NLP, a data scientist should be able to derive meaningful information based on user requirements. For example: find all patients with a history of mental illness on the paternal side of the family. This information can only be found by looking at free text, because most family histories are captured only in clinician notes. With NLP, this problem is trivial.
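To make the family-history example concrete, here is a deliberately simplified, rule-based sketch in the spirit of NegEx, a well-known clinical negation-detection algorithm. It is not the author's system: the lexicons, the five-token negation window, and all function names are hypothetical. Unlike bare keyword matching, it splits text into clauses and checks whether a mention is negated before counting it. A production pipeline would rely on a clinical NLP toolkit and a curated vocabulary instead.

```python
import re
from typing import List, Tuple

# Hypothetical toy lexicons for illustration only; a real system would map
# text to a curated clinical vocabulary rather than hand-written lists.
PATERNAL_RELATIVES = ["father", "paternal grandfather", "paternal grandmother",
                      "paternal uncle", "paternal aunt"]
MENTAL_ILLNESSES = ["depression", "schizophrenia", "bipolar disorder", "anxiety"]
NEGATION_CUES = ["no", "not", "denies", "without", "negative for"]
NEGATION_WINDOW = 5  # hypothetical: tokens before a concept a cue can reach


def clauses(text: str) -> List[str]:
    """Naive segmentation into clauses on sentence punctuation and semicolons."""
    return [c for c in re.split(r"[.!?;]", text) if c.strip()]


def tokenize(clause: str) -> List[str]:
    """Lower-case word tokens (so 'grandfather' never matches 'father')."""
    return re.findall(r"[a-z]+", clause.lower())


def find_phrase(tokens: List[str], phrase: str) -> int:
    """Index where a (possibly multi-word) phrase starts, or -1 if absent."""
    target = phrase.split()
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            return i
    return -1


def is_negated(tokens: List[str], concept_start: int) -> bool:
    """NegEx-style check: is there a negation cue shortly before the concept?"""
    window = tokens[max(0, concept_start - NEGATION_WINDOW):concept_start]
    return any(find_phrase(window, cue) != -1 for cue in NEGATION_CUES)


def paternal_mental_illness(note: str) -> List[Tuple[str, str]]:
    """Return (relative, condition) pairs asserted, not negated, in a note.

    Simplification: every paternal relative in a clause is linked to every
    condition in that clause; real systems resolve attribution more carefully.
    """
    findings = []
    for clause in clauses(note):
        tokens = tokenize(clause)
        relatives = [r for r in PATERNAL_RELATIVES if find_phrase(tokens, r) != -1]
        if not relatives:
            continue
        for condition in MENTAL_ILLNESSES:
            start = find_phrase(tokens, condition)
            if start != -1 and not is_negated(tokens, start):
                findings.extend((r, condition) for r in relatives)
    return findings
```

For example, `paternal_mental_illness("Father has no history of depression; mother has bipolar disorder.")` returns an empty list: the depression mention is negated, and the bipolar disorder mention sits in a clause about the mother.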

Other techniques can also be used to solve the problem above. Text hunting with regular expressions is a common one, but it is not NLP.
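As a hedged illustration of why text hunting falls short, the hypothetical regular expression below looks for a paternal relative followed closely by a mental-health term. It flags a note that explicitly negates the condition and misses one where the word order is reversed.

```python
import re

# Hypothetical pattern: a paternal relative, then within ~40 characters a
# mental-health term. It has no notion of negation or sentence structure.
PATTERN = re.compile(
    r"\b(father|paternal \w+)\b.{0,40}?"
    r"\b(depression|schizophrenia|bipolar disorder|anxiety)\b",
    re.IGNORECASE,
)


def regex_hunt(note: str) -> bool:
    """Return True if the pattern appears anywhere in the note."""
    return PATTERN.search(note) is not None


print(regex_hunt("Father has no history of depression."))  # True: a false positive
print(regex_hunt("Depression in father."))                 # False: a false negative
```

Patching each failure with another pattern quickly becomes unmanageable, which is the gap that linguistic processing (clause boundaries, negation, attribution) is meant to close.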

Most companies can’t handle NLP for large datasets

Large NLP tasks require serious processing power. I have developed a rapidly scalable platform that allows NLP processing on terabytes of data. This enables analysis of patient populations spanning several years quickly (i.e., within hours). From this analysis, one can assess patient populations and drill down to the component contributions.
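The platform itself is not described here, so the following is only a generic map/reduce sketch of how this kind of workload is commonly parallelized. The record layout, the extractor interface, the sharding scheme, and every function name are hypothetical. Notes are partitioned by patient so that each patient is counted once, each shard is processed independently (map), and partial counts are summed (reduce). The result supports both a population view per year and a drill-down into each concept's contribution.

```python
import zlib
from collections import Counter
from concurrent.futures import Executor
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical record layout: (patient_id, year, note_text).
Record = Tuple[str, int, str]
# Hypothetical extractor interface: note text -> concepts found in it.
Extractor = Callable[[str], List[str]]


def shard_by_patient(records: Iterable[Record], n_shards: int) -> List[List[Record]]:
    """Partition records so all notes for a patient land in the same shard."""
    shards: List[List[Record]] = [[] for _ in range(n_shards)]
    for record in records:
        # crc32 is deterministic across processes, unlike the built-in hash().
        shards[zlib.crc32(record[0].encode("utf-8")) % n_shards].append(record)
    return shards


def map_shard(shard: List[Record], extract: Extractor) -> Counter:
    """Map step: count distinct patients per (year, concept) within one shard."""
    seen = set()
    for patient_id, year, note in shard:
        for concept in extract(note):
            seen.add((patient_id, year, concept))
    return Counter((year, concept) for _, year, concept in seen)


def run(records: Iterable[Record], extract: Extractor,
        executor: Executor, n_shards: int = 4) -> Counter:
    """Fan shards out to workers (map), then sum the partial counts (reduce)."""
    shards = shard_by_patient(records, n_shards)
    total: Counter = Counter()
    for partial in executor.map(map_shard, shards, [extract] * n_shards):
        total.update(partial)
    return total


def year_totals(counts: Counter) -> Counter:
    """Population view: patient-concept pairs per year, summed over concepts."""
    totals: Counter = Counter()
    for (year, _concept), n in counts.items():
        totals[year] += n
    return totals


def drill_down(counts: Counter, year: int) -> Dict[str, int]:
    """Drill-down view: how much each concept contributes to one year's total."""
    return {concept: n for (y, concept), n in counts.items() if y == year}
```

For a quick local test, `run(records, extract, ThreadPoolExecutor())` is enough. CPU-bound NLP would more likely use `ProcessPoolExecutor`, which requires a module-level (picklable) extractor, and at terabyte scale the same map and reduce steps would typically run on a distributed engine such as Apache Spark.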

Combining NLP technology with a scalable architecture provides an excellent tool for big data analysis.