Perform machine learning on JSON in S3 with DataRobot

DataRobot's user interface showing a bar and line graph and asking questions such as "What would you like to predict" and "What is the primary date/time feature". In the center of the interface is a large "Start" button.

With REFORM and DataRobot, you can start training machine learning models on JSON stored in S3 in minutes.

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. This means customers of all sizes and industries can use it to store and protect any amount of data for a range of use cases, such as websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics.

JSON is not a tabular data format. Unlike tabular data, where every row shares the same columns, each piece of JSON has a structure tailored to a specific purpose. For example, a piece of JSON for a form about you and your pets would have a very different structure from a piece of JSON for a manufacturing dashboard. Machine learning models expect standard tabular data, just as spreadsheets and charts do. To use JSON to train our models and make predictions, we need to transform it into meaningful tables.
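To make the idea concrete, here is a minimal Python sketch of turning one piece of nested JSON, a hypothetical form about a person and their pets, into table rows. The data, the field names and the `flatten_pets` helper are illustrative assumptions, not REFORM's method. The sketch shows one common way to flatten a nested array: emit one row per pet and repeat the owner's fields on each row.

```python
import json

# Hypothetical JSON from a form about a person and their pets (illustrative data).
raw = """
{
  "owner": {"name": "Alex", "city": "Leeds"},
  "pets": [
    {"name": "Rex", "species": "dog", "age": 4},
    {"name": "Tom", "species": "cat", "age": 2}
  ]
}
"""

def flatten_pets(doc: dict) -> list:
    """Produce one tabular row per pet, repeating the owner's fields on each row."""
    owner = doc["owner"]
    rows = []
    for pet in doc.get("pets", []):
        rows.append({
            "owner_name": owner["name"],
            "owner_city": owner["city"],
            "pet_name": pet["name"],
            "pet_species": pet["species"],
            "pet_age": pet["age"],
        })
    return rows

rows = flatten_pets(json.loads(raw))
for row in rows:
    print(row)
```

Every row now has the same columns, which is the shape a model (or a spreadsheet) expects, while each pet still carries its owner's attributes.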

A screenshot of REFORM. The structure of a JSON dataset is presented as a file system, which is being browsed and from which relational columns are being picked.

REFORM lets you access JSON in S3 as tables in DataRobot for training and prediction. Simply provide the details of your buckets, browse even the most complex data as if it were a file system, and pick what you're interested in. Then paste the table's access link into DataRobot. REFORM magically transforms the latest data into a mathematically correct, analytics-ready table and feeds it into your models for training and prediction.
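Browsing JSON "as if it were a file system" can be pictured as listing every path to a value and then picking paths to become columns. The Python sketch below is a hypothetical illustration only: `list_paths` and `pick` are made-up helpers, `*` is an assumed wildcard standing for "every item in a list", and none of this is REFORM's actual interface.

```python
import json

# Hypothetical JSON document (illustrative only).
doc = json.loads("""
{
  "owner": {"name": "Alex", "address": {"city": "Leeds"}},
  "pets": [{"name": "Rex"}, {"name": "Tom"}]
}
""")

def list_paths(node, prefix=""):
    """Return the set of file-system-style paths to every leaf value.

    List items are collapsed into a single '*' segment (an assumed convention),
    so all pets share the path '/pets/*/name'.
    """
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            paths |= list_paths(value, f"{prefix}/{key}")
    elif isinstance(node, list):
        for item in node:
            paths |= list_paths(item, f"{prefix}/*")
    else:
        paths.add(prefix)
    return paths

def pick(node, path):
    """Return every value found at a path, fanning out over list items at '*'."""
    values = [node]
    for part in (p for p in path.split("/") if p):
        next_values = []
        for value in values:
            if part == "*" and isinstance(value, list):
                next_values.extend(value)
            elif isinstance(value, dict) and part in value:
                next_values.append(value[part])
        values = next_values
    return values

# "Browse" the structure like a directory tree, then "pick" a column.
for path in sorted(list_paths(doc)):
    print(path)
print(pick(doc, "/pets/*/name"))
```

Collapsing list items into `*` keeps the browsable tree small even when a document holds thousands of array entries; picking a `*` path is what fans one document out into several rows.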

DataRobot's "Import from URL" form showing the URL

REFORM also supports use cases where data needs additional JOINs or GROUP BYs before it can be used for machine learning. REFORM will transform your data into tables in AWS Redshift, Snowflake, AWS Athena and MS SQL Server, all of which support JOINs and GROUP BYs and can be used as data sources by DataRobot.
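As a hedged illustration of the kind of JOIN and GROUP BY that might run in the target database before training, the Python sketch below uses an in-memory SQLite database as a stand-in for Redshift, Snowflake, Athena or SQL Server. The `owners` and `pets` tables and their columns are hypothetical; the query builds one row per owner with aggregated pet features.

```python
import sqlite3

# In-memory SQLite stands in for a warehouse such as Redshift or Snowflake.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE owners (owner_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE pets (pet_id INTEGER PRIMARY KEY, owner_id INTEGER, species TEXT, age INTEGER);
INSERT INTO owners VALUES (1, 'Alex', 'Leeds'), (2, 'Sam', 'York');
INSERT INTO pets VALUES (1, 1, 'dog', 4), (2, 1, 'cat', 2), (3, 2, 'dog', 7);
""")

# JOIN the two tables, then GROUP BY owner to get one training row per owner.
query = """
SELECT o.owner_id,
       o.city,
       COUNT(p.pet_id) AS pet_count,
       AVG(p.age)      AS avg_pet_age
FROM owners AS o
LEFT JOIN pets AS p ON p.owner_id = o.owner_id
GROUP BY o.owner_id, o.city
ORDER BY o.owner_id
"""
features = conn.execute(query).fetchall()
for row in features:
    print(row)
conn.close()
```

The LEFT JOIN keeps owners who have no pets (their `pet_count` would be 0 and `avg_pet_age` NULL), which matters when every entity needs a row in the training table.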