Data sources such as the recently released linked open European patent data from the European Patent Office are rich sources of data that are becoming available for machine learning and AI applications. This type of patent data is a good example of labelled training data, which is data that can be used to train an algorithm to carry out a particular task so that it performs well even when presented with an example it has not seen before. Because the patent data is linked, it carries more information that can serve as potential "labels".
Suppose the task is to predict whether a patent application will receive a particular type of objection from the patent office. In this case the examples are patent applications and the labels are the types of objections raised against those applications by the patent office. An algorithm could be trained using examples of patent applications, the documents cited against them by the patent office, and information about what types of objections were raised. All this data is currently available on public databases. Once the algorithm has been trained, it can be used to predict whether a new patent application is likely to receive a certain type of objection from the patent office.
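To make the idea of training on labelled examples concrete, here is a minimal sketch in Python. It is not how any patent office or existing tool works: it assumes a simple Naive Bayes text classifier, and the four training texts and the two objection labels ("novelty" and "clarity") are invented purely for illustration. A real system would be trained on many thousands of applications and would also use the cited documents.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesObjectionClassifier:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        # Count how many examples carry each label, and how often each
        # word appears in the examples for that label.
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        # Words never seen during training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            # Log prior for the label plus log likelihood of each word.
            score = math.log(doc_count / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-written training data: NOT real patent office records.
TRAIN_TEXTS = [
    "A bicycle frame made of known steel tubing welded in a known way",
    "A bottle cap with a conventional screw thread as already sold",
    "A device substantially improved so as to work better as desired",
    "An apparatus configured in a suitable manner to achieve good results",
]
TRAIN_LABELS = ["novelty", "novelty", "clarity", "clarity"]

model = NaiveBayesObjectionClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)

# An application the model has not seen before.
prediction = model.predict("A known steel bottle with a conventional thread")
print(prediction)
```

The important point is the split between the training step (fit), which only ever sees labelled examples, and the prediction step (predict), which is applied to a new application that was not in the training data.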
To make a prediction for a new application, the algorithm would need to search the public database for prior art documents itself. This could be done using a rule-based algorithm which extracts keywords from the patent specification and uses them to search for documents in the database. This type of tool would give applicants a good idea of whether a given invention is likely to be patentable. An example of a machine learning classifier that has already been created in this field is one which predicts whether a US patent application will receive an Alice rejection: http://illinoisjltp.com/journal/wp-content/uploads/2017/09/Dugan.pdf
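The rule-based keyword search mentioned above could look something like the following Python sketch. Everything here is an assumption made for illustration: the stopword list, the rule of picking the most frequent words as keywords, the function names extract_keywords and search_prior_art, and the tiny in-memory "database" with made-up document identifiers. A real tool would query an actual patent database and would likely use more refined rules.

```python
import re
from collections import Counter

# A deliberately small, hypothetical stopword list.
STOPWORDS = {
    "a", "an", "and", "are", "for", "from", "has", "in", "is",
    "of", "on", "the", "to", "with",
}


def extract_keywords(specification, top_n=5):
    """Return the top_n most frequent non-stopword terms in the text."""
    words = re.findall(r"[a-z]+", specification.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    # Sort by frequency (highest first), then alphabetically to break ties.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:top_n]


def search_prior_art(keywords, database):
    """Rank documents by how many of the keywords each one contains."""
    results = []
    for doc_id, text in database.items():
        doc_words = set(re.findall(r"[a-z]+", text.lower()))
        hits = sum(1 for keyword in keywords if keyword in doc_words)
        if hits:
            results.append((hits, doc_id))
    # Most keyword hits first; document id breaks ties.
    results.sort(key=lambda r: (-r[0], r[1]))
    return [doc_id for _, doc_id in results]


# Hypothetical stand-in for a public patent database.
DATABASE = {
    "DOC-A": "A folding bicycle with a hinge in the frame and a locking latch",
    "DOC-B": "A coffee machine with a removable water tank",
    "DOC-C": "A bicycle frame made from carbon fibre tubes",
}

spec = (
    "A folding bicycle frame. The frame has a hinge, and the hinge "
    "locks the folding frame in place."
)
keywords = extract_keywords(spec, top_n=4)
matches = search_prior_art(keywords, DATABASE)
print(keywords)
print(matches)
```

The documents returned by a search of this kind could then be passed, together with the application itself, to the trained classifier.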
Finding available sources of labelled training data is not easy. The linked open EP data is a relatively rare source of labelled training data which is readily available in large quantities.
Linked open EP data
"Artificial Intelligence (AI) is likely to have a profound impact in the healthcare sector in coming years. The Academy of Medical Science has indicated that “the impact of artificial intelligence on…the healthcare system is likely to be profound” because new methods of healthcare delivery will become possible, clinical decision-making will be more informed, research and development will become more efficient, and patients will be more informed in managing their health. In the life sciences sector there are similar claims, such as that AI may soon find a cure for cancer, that it can be used to speed up drug discovery, and that it can be used to find cures for rare diseases."
The content above was originally posted on CMS DigitalBytes - CMS lawyers sharing comment and commentary on all things tech.