
Jumbune

Accelerated Hadoop based Analytics

Analyze Data

Data quality, completeness and accuracy are important factors for an analytical system. Analytics over erroneous data can lead to flawed business decisions, which may result in monetary loss to the organisation. Data quality is a critical factor, and organisations must be able to trust the data residing in their data lake. Data scientists usually spend 60% of their effort cleaning data so that they can build effective data science models over it.

The Jumbune Data Analysis framework gives users much-needed visibility into the quality and profile of the data present on the Hadoop Distributed File System (HDFS). Users can assess the quality of a dataset over a period of time for consistency and logic, and can profile it to quickly categorise it based on the data present. The user need not write any code, and the entire functionality is carried out without any data movement.

Data Validation and Quality Timelines

Gain deep insights into the quality of the data present in your data lake. HDFS data validation is a generic framework that checks data for anomalies and reports them with details such as line number and file name. It validates the data on the DFS against a custom set of rules defined by the business. The rules can take the form of null checks, data type checks and/or regular expressions describing the expected business format of the data. Data validation tasks can be scheduled, and Data Quality Timelines are used to infer the health of the data over a period of time.

It supports various data formats, including plain text, JSON and XML documents.
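
To make the rule types above concrete, here is a minimal Python sketch of rule-based validation over delimited text. It is not Jumbune's implementation or API (Jumbune requires no code from the user); the names `Violation`, `RULES` and `validate_lines`, the sample file name and the sample rows are all hypothetical, chosen only to illustrate null checks, data type checks and regular expression checks that report a file name and line number.

```python
import re
from dataclasses import dataclass


@dataclass
class Violation:
    file_name: str
    line_number: int
    field_index: int
    rule: str


def is_int(value):
    """Data type check: True if the value parses as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def not_null(value):
    """Null check: an empty field counts as null."""
    return value != ""


def matches_email(value):
    """Regular expression check for a simple, illustrative email shape."""
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[a-z]+", value) is not None


# Hypothetical rule set: field index -> ordered list of (rule name, check).
RULES = {
    0: [("not_null", not_null), ("int_type", is_int)],
    1: [("not_null", not_null), ("email_regex", matches_email)],
}


def validate_lines(file_name, lines, rules, delimiter=","):
    """Return one Violation per field that fails, naming the first failed rule."""
    violations = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        for index, checks in rules.items():
            value = fields[index].strip() if index < len(fields) else ""
            for name, check in checks:
                if not check(value):
                    violations.append(Violation(file_name, line_number, index, name))
                    break  # later rules for this field are skipped
    return violations


sample = ["1,alice@example.com", ",bob@example.com", "x3,not-an-email"]
violations = validate_lines("users.csv", sample, RULES)
for v in violations:
    print(f"{v.file_name}:{v.line_number} field {v.field_index} failed {v.rule}")
```

Counting the violations returned by each scheduled run, and plotting those counts by run date, is one simple way to picture a quality timeline.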

Data Profiling

HDFS data can also be profiled using the Data Profiling tool, with or without a set of rules, to obtain quick insights into the ingested data. The profiles can be used later to get a high-level view of the data values.
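
As a rough illustration of rule-free profiling, the Python sketch below counts how often each value occurs in one column of delimited text. It is an assumption about what a simple profile might contain, not a description of Jumbune's Data Profiling tool; `profile_column` and the sample rows are hypothetical.

```python
from collections import Counter


def profile_column(lines, column, delimiter=","):
    """Count occurrences of each value in the given column.

    Missing or empty fields are grouped under the label "<empty>".
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        value = fields[column].strip() if column < len(fields) else ""
        counts[value if value else "<empty>"] += 1
    return counts


sample = ["1,IN", "2,US", "3,IN", "4,"]
profile = profile_column(sample, 1)
for value, count in profile.most_common():
    print(value, count)
```

A value distribution like this gives a quick, high-level view of a column, for example how many records have an empty field.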

About Us

Jumbune, a product from Impetus Technologies, helps optimise and analyse Big Data applications running on enterprise clusters. It is built on open source, is highly scalable, and provides deep insights into the performance of Hadoop applications and clusters.
