New H3C Technology Co., Ltd
  • E-mail

    service@h3c.com

  • Address

    H3C Company, 466 Changhe Road, Binjiang District, Hangzhou City

H3C SeaSQL MPP Distributed Database

Price: Negotiable (updated on 12/28)
Overview
Over the past decade, the Internet industry has accumulated vast amounts of information and data as its growth accelerated. This explosive growth calls for new ways of computing over massive data sets. Traditional approaches can no longer keep up: they are costly, and they struggle to meet data-processing performance targets. The scale-up model of traditional hosts has hit a bottleneck, and the SMP (Symmetric Multi-Processing) architecture is difficult to scale and cannot deliver the CPU capacity and I/O throughput that massive data computing requires. Against this background, H3C SeaSQL MPP emerged: a distributed database with high-performance analysis capabilities for PB-level data. H3C SeaSQL MPP is based on a shared-nothing MPP (Massively Parallel Processing) architecture with good elasticity and linear scalability. It has built-in parallel storage, parallel communication, parallel computing, and optimization technologies, is compatible with SQL standards, and provides powerful, efficient, and secure storage, processing, and real-time analysis of PB-level structured, semi-structured, and unstructured data. It also supports mixed workloads that include OLTP-type business, giving customers a closed loop from business data to insight and back to the business. It can be deployed on enterprise bare metal or private clouds, and supports a large number of core production systems in industries including finance, securities, telecommunications, government, manufacturing, and transportation.
Product Details
System Architecture

To help customers in various industries cope with the challenges of the big data era, H3C has created a high-performance big data processing platform. It uses a computing framework that integrates Hadoop with MPP distributed databases to provide a complete big data platform solution covering data collection and conversion, storage and computation, analysis and mining, sharing and exchange, BI display, and operation and maintenance management. It helps users build massive data processing systems, discover the intrinsic value of their data, and find new market opportunities.

SeaSQL MPP can be deployed on local physical machines or on private clouds. Data sources include local storage, HDFS, cloud storage, and other relational databases such as Oracle and MySQL; their data is integrated into the SeaSQL MPP database through ETL tools or Kafka. The SeaSQL MPP distributed database is built on a shared-nothing massively parallel architecture with PB-level data processing capabilities. Its kernel is based on PostgreSQL and has been extensively optimized, providing a powerful query optimizer and full SQL compatibility. At the interface layer, SeaSQL MPP provides standard JDBC/ODBC interfaces, interfaces for commonly used programming languages such as Python, R, Java, Perl, and C, and interfaces to the MADlib machine learning library, full-text search, and PostGIS, all available to the application layer.

SeaSQL MPP is a distributed architecture with strong horizontal scalability, capable of interconnecting thousands of nodes and supporting thousands of CPUs. Its powerful data processing and computing capabilities are suitable for the following scenarios:

Suitable for analysis-oriented applications, such as building enterprise-level ODS/EDW systems and data marts.

Suitable for applications that need to process massive amounts of data, such as data warehouses and business intelligence.

Suitable for offline data analysis and data mining, such as customer behavior analysis, user profiling, and behavior prediction modeling.

Features

H3C SeaSQL MPP is a PostgreSQL-based distributed database that adopts a shared-nothing architecture: each node has its own host, operating system, memory, and storage, and nothing is shared between nodes. The main features of the H3C SeaSQL MPP database are as follows:

·Large scale parallel processing architecture.

·Supports both row and column storage. The administrator can specify the storage and compression method for each table or table partition according to application needs.

·Supports multiple compression methods, including Zlib, RLE, etc.

·Supports multi-level partitioned tables with multiple partitioning modes, including range and list.

·Supports indexes such as B-tree, bitmap, and GiST.

·The authentication mechanism supports multiple methods, including LDAP and Kerberos.

·Extension language support: SeaSQL MPP supports the implementation of user-defined functions using various popular languages, including Python, R, Java, Perl, C/C++, etc.

·Geographic information processing: Through its PostGIS integration, SeaSQL MPP supports the storage and analysis of geographic information.

·Built-in data mining algorithm library: Through the MADlib algorithm library (an Apache project), dozens of common data analysis and mining algorithms are available inside the SeaSQL MPP database, including logistic regression, decision trees, and random forests. No algorithm code needs to be written; all algorithms can be used through SQL.

·Text retrieval: SeaSQL MPP supports efficient, flexible, and rich full-text retrieval. Used together with MADlib, it can perform parallel text analysis and mining.

·High-performance loading: MPP technology provides loading performance for petabyte-scale data volumes.

·Query optimization for big data workloads.

·Polymorphic data storage and execution.

·Advanced machine learning capabilities based on Apache MADlib.

·Supports the ANSI/ISO SQL 92, SQL 99, SQL 2003, and SQL 2006 standards and international interface specifications such as C API, ODBC, and JDBC. Supports DDL, DML, and DCL syntax, basic data types, basic integrity constraints, basic table management, search conditions, table joins, subqueries, inserts, updates, deletes, and transaction control.

Advantages and Characteristics

Cloud deployment

SeaSQL MPP can be deployed and installed on the CloudOS 5.0 cloud platform. Users can centrally manage and schedule all resources through the CloudOS interface, managing and allocating them flexibly to improve overall resource utilization.

Expansion without interruption

The SeaSQL MPP database can be expanded in two ways according to customer needs: host expansion and instance expansion. Expansion only requires updating the system table metadata, so the database does not need to be stopped. After expansion, the Jump Consistent Hash algorithm is used to greatly reduce the amount of data moved during redistribution.
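
To illustrate why Jump Consistent Hash keeps data movement small, here is a minimal Python sketch of the publicly described algorithm (Lamping and Veach). How SeaSQL MPP actually maps rows to segments is not described here, so the function names and the idea of using an integer key as the distribution key are assumptions made for illustration only.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # emulate unsigned 64-bit arithmetic


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map an integer key to a bucket in [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    key &= MASK64
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step, then a pseudo-random jump forward
        key = (key * 2862933555777941757 + 1) & MASK64
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose bucket changes when growing old_n -> new_n."""
    moved = sum(
        1 for k in keys
        if jump_consistent_hash(k, old_n) != jump_consistent_hash(k, new_n)
    )
    return moved / len(keys)


# Hypothetical expansion from 8 to 9 segments
print(moved_fraction(range(10000), 8, 9))
```

When the bucket count grows from n to n+1, a key either stays where it was or moves to the new bucket, so on average only about 1/(n+1) of the keys are relocated. A plain `hash(key) % n` scheme would relocate most keys in the same situation.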

Row column hybrid storage

The SeaSQL MPP database provides several storage models: row storage, column storage, and hybrid row-column storage. The storage model can be chosen according to business needs.
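
The difference between the two layouts can be sketched in a few lines of Python. This is a conceptual illustration only, not SeaSQL MPP's on-disk format: the sample rows and helper names are made up, and run-length encoding (RLE, one of the compression methods listed under Features) stands in for column compression in general.

```python
from itertools import groupby

# Row layout: each record is stored together (made-up sample data)
rows = [
    (1, "east", 120),
    (2, "east", 80),
    (3, "east", 95),
    (4, "west", 60),
    (5, "west", 75),
]


def to_columns(rows, names):
    """Pivot row tuples into a column layout: one list per column."""
    return {name: [r[i] for r in rows] for i, name in enumerate(names)}


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [v for v, n in runs for _ in range(n)]


columns = to_columns(rows, ["id", "region", "amount"])

# An aggregate over one column touches only that column's values
total_amount = sum(columns["amount"])

# Repeated values in a column compress into short runs
region_runs = rle_encode(columns["region"])
print(total_amount, region_runs)
```

Column storage tends to suit analytical scans that read a few columns of many rows, while row storage tends to suit workloads that read or update whole records.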

OLAP function

SeaSQL MPP provides rich OLAP functions, including ROLLUP, CUBE, window functions, and recursive operations, to support complex analysis aimed at decision support for decision-makers and senior management. It can process complex queries over large amounts of data quickly and flexibly according to analysts' requirements, so that they can accurately grasp the state of the business, understand the needs of those they serve, and make sound plans.
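
As a rough illustration of what ROLLUP computes, the Python sketch below emulates the standard SQL semantics of `GROUP BY ROLLUP (region, product)` on an in-memory list. The table, column names, and the `rollup` helper are hypothetical; in the database the same result would come from a single SQL statement.

```python
from collections import defaultdict


def rollup(rows, keys, measure):
    """Emulate SUM(measure) ... GROUP BY ROLLUP(keys) on a list of dicts.

    Produces one total per grouping set: (k1, k2), (k1), and the grand
    total, with None marking a rolled-up column as SQL does with NULL.
    """
    totals = defaultdict(int)
    for row in rows:
        for level in range(len(keys), -1, -1):
            group = tuple(row[k] for k in keys[:level])
            group += (None,) * (len(keys) - level)
            totals[group] += row[measure]
    return dict(totals)


# Made-up sample data
sales = [
    {"region": "east", "product": "A", "amount": 10},
    {"region": "east", "product": "B", "amount": 20},
    {"region": "west", "product": "A", "amount": 5},
]

# Comparable SQL:
#   SELECT region, product, SUM(amount)
#   FROM sales GROUP BY ROLLUP (region, product);
result = rollup(sales, ["region", "product"], "amount")
for group, total in result.items():
    print(group, total)
```

CUBE differs only in that it produces totals for every combination of the grouping columns rather than just the left-to-right prefixes.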

Multi-tenancy

The multi-tenant feature of the SeaSQL MPP database divides one database among multiple tenants, with the physical resources of different tenants isolated from each other. Its main advantages are as follows:

Different tenants have different physical resource requirements. Each tenant's resources are isolated and do not interfere with the others, which prevents a tenant from seizing resources from other tenants during business peaks.

Data is isolated between tenants, which improves data security.

When system resources are idle, CPU and memory can be used flexibly. When resources are busy, each tenant is limited according to its configured parameters, which improves the overall resource utilization of the system.
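
The idle and busy behavior described above can be sketched as a simple allocation rule. This is a hypothetical policy written for illustration, not SeaSQL MPP's actual scheduler: the function name, the percentage units, and the rule of capping every tenant at its configured limit once total demand exceeds capacity are all assumptions.

```python
def allocate_cpu(demand, limit, capacity=100.0):
    """Grant CPU share (percent) to each tenant.

    demand: tenant -> requested share
    limit:  tenant -> configured share (assumed to sum to <= capacity)
    """
    if sum(demand.values()) <= capacity:
        # System is idle enough: tenants may exceed their configured limit
        return dict(demand)
    # System is busy: every tenant is held to its configured limit
    return {t: min(demand[t], limit[t]) for t in demand}


limits = {"tenant_a": 60.0, "tenant_b": 40.0}

idle = allocate_cpu({"tenant_a": 70.0, "tenant_b": 10.0}, limits)
busy = allocate_cpu({"tenant_a": 90.0, "tenant_b": 50.0}, limits)
print(idle)
print(busy)
```

In the idle case tenant_a borrows capacity that tenant_b is not using; in the busy case both fall back to their configured shares.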

Powerful parallel loading capability

The performance indicators of data import play an important role in the user experience of data warehouses. SeaSQL MPP database can use the resources of all nodes in the entire cluster when loading data, and the loading performance increases linearly with the increase of the number of nodes. In large-scale clusters, the data loading rate can reach 20TB/hour.

Integrated MADlib machine learning library

MADlib is designed not for general programmers but for database developers and DBAs. It combines the simplicity and ease of use of SQL with the sophistication of data mining algorithms, drawing on the strengths of both to greatly improve development efficiency.

For users, MADlib provides functions that can be called from SQL query statements. These include basic linear algebra operations and statistical functions as well as common, ready-made machine learning and data mining models. Users do not need a deep understanding of how an algorithm is implemented; they only need to know how to call the functions, which greatly improves development efficiency and reduces development costs.

Integrated PostGIS Geospatial Data Processing

PostGIS is an extension of the object relational database system PostgreSQL, which converts the PostgreSQL database management system into a spatial database by adding support for spatial data types, spatial indexes, and spatial functions to PostgreSQL.

SeaSQL MPP integrates PostGIS spatial database to fully integrate spatial data and object relational database, achieving a transformation from GIS centric to database centric. In this way, users do not need specialized GIS data engines to process and manipulate spatial data, and applications can easily manipulate spatial data through SQL language.

Transparent encryption

The SeaSQL MPP transparent encryption module encrypts the entire database in a way that is completely imperceptible to the client. Data is encrypted when data blocks are written to disk and decrypted when they are read back. This ensures that data stored on disk is always encrypted, so even if the disk contents are obtained, the plaintext cannot be read. The business layer is unaware of the encryption and requires no adaptation or modification. Encryption uses AES in XTS mode to secure the data.

Data desensitization

Data desensitization, also known as data masking, data bleaching, data depersonalization, or data distortion, refers to transforming sensitive information according to desensitization rules so that sensitive private data is reliably protected. Where customer security data or commercially sensitive data is involved, real data is transformed before being provided for testing, without violating system rules. Personal information such as ID numbers, mobile phone numbers, card numbers, and customer numbers should be desensitized.

The SeaSQL MPP database provides a variety of desensitization methods. After defining desensitization rules, users can access desensitized data in the following two ways:

Static desensitization: Sensitive information is removed from the database itself. The sensitive data in the database is overwritten and cannot be recovered.

Dynamic desensitization: Sensitive information is hidden from designated users, while users who are not subject to desensitization still access the original data.
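
The difference between the two modes can be sketched in Python. This is a conceptual illustration, not SeaSQL MPP's rule syntax: the masking rule (keep the first three and last four digits of a phone number), the `RULES` table, the user names, and the sample record are all made up.

```python
def mask_phone(phone: str) -> str:
    """Keep the first 3 and last 4 characters, replace the rest with '*'."""
    if len(phone) <= 7:
        return "*" * len(phone)
    return phone[:3] + "*" * (len(phone) - 7) + phone[-4:]


# Hypothetical rule set: column name -> masking function
RULES = {"phone": mask_phone}


def static_mask(table):
    """Static mode: overwrite sensitive columns in place; originals are lost."""
    for row in table:
        for column, rule in RULES.items():
            row[column] = rule(row[column])


def dynamic_read(table, user, masked_users):
    """Dynamic mode: designated users get masked copies, others see raw rows."""
    if user not in masked_users:
        return table
    return [
        {c: RULES[c](v) if c in RULES else v for c, v in row.items()}
        for row in table
    ]


customers = [{"name": "Li Lei", "phone": "13812345678"}]  # made-up record

print(dynamic_read(customers, "tester", {"tester"}))  # masked view
print(dynamic_read(customers, "dba", {"tester"}))     # original data
```

With dynamic reads the stored value is untouched, so an authorized user still sees the real number. Calling `static_mask(customers)` would instead rewrite the stored value itself, which is why static desensitization is irreversible.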

FDW Data Federation

FDW (Foreign Data Wrapper) is an implementation of SQL/MED (SQL Management of External Data), which is part of the SQL standard. FDW provides a set of unified public interfaces that let extensions integrate with SeaSQL in core areas such as optimization, execution, scanning, updating, and statistics, so that external data sources can be queried and manipulated directly with SQL statements. For example, the FDW for MySQL allows users to query, sort, group, filter, join, and even insert and update data in MySQL databases as they would with local tables.

Depending on the data source, the FDW modules implemented by SeaSQL include postgres_fdw, file_fdw, oracle_fdw, mysql_fdw, json_fdw, redis_fdw, and hdfs_fdw.

[Figure: FDW modules implemented by SeaSQL]

RoaringBitmap compressed bitmap

RoaringBitmap is an efficient compressed bitmap format that makes bitmap memory usage much more efficient and solves the problem that plain bitmaps are poorly suited to storing sparse data. Bitmap bit operations are well suited to cardinality calculation on big data and are commonly used for deduplication, label filtering, and time series analysis. The gpd_roaringbitmap plugin integrates RoaringBitmap into the SeaSQL MPP database, providing native database functions, operators, aggregates, and other support for RoaringBitmap as a data type.
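
The core idea behind Roaring bitmaps can be sketched in Python. This is a simplified teaching model, not the gpd_roaringbitmap plugin or its API: the class name is made up, values are assumed to be non-negative 32-bit integers, and a Python set stands in for the sorted 16-bit array that real implementations use for sparse chunks.

```python
ARRAY_LIMIT = 4096  # Roaring keeps a chunk as an array up to this many values


class TinyRoaring:
    """Split each value into high 16 bits (chunk key) and low 16 bits.

    Sparse chunks hold their low bits in a set; dense chunks switch to a
    65536-bit bitmap stored as a Python int.
    """

    def __init__(self, values=()):
        self.chunks = {}  # high 16 bits -> set (sparse) or int bitmask (dense)
        for v in values:
            self.add(v)

    def add(self, value: int) -> None:
        high, low = value >> 16, value & 0xFFFF
        chunk = self.chunks.get(high, set())
        if isinstance(chunk, set):
            chunk.add(low)
            if len(chunk) > ARRAY_LIMIT:
                # Too dense for an array: convert the chunk to a bitmap
                chunk = sum(1 << x for x in chunk)
        else:
            chunk |= 1 << low
        self.chunks[high] = chunk

    def __contains__(self, value: int) -> bool:
        chunk = self.chunks.get(value >> 16)
        if chunk is None:
            return False
        low = value & 0xFFFF
        if isinstance(chunk, set):
            return low in chunk
        return bool((chunk >> low) & 1)

    def __len__(self) -> int:
        """Cardinality: number of distinct values stored."""
        return sum(
            len(c) if isinstance(c, set) else bin(c).count("1")
            for c in self.chunks.values()
        )


# Deduplicated cardinality of a sparse set of IDs
sparse = TinyRoaring([1, 1, 5, 70000])
print(len(sparse), 70000 in sparse)
```

A plain bitmap covering the value 70000 would need tens of thousands of bits for just three members, whereas the chunked layout stores only the values that are present until a chunk becomes dense enough to justify a bitmap.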
