Next Previous Contents

9. PostgreSQL Supports Extremely Large Databases greater than 200 Gig

PostgreSQL is already used by many companies supporting large databases. The following techniques are suggested :

9.1 CPU types - 32-bit or 64-bit

Performance of 32-bit cpu machines will decline rapidly when the database size exceeds 5 GigaByte. You can run 30 gig database on 32-bit cpu but the performance will be degraded. Machines with 32-bit cpu imposes a limitation of 2 GB on RAM, 2 GB on file system sizes and other limitations on the operating system. Use the special filesystems for linux made by SGI, IBM or HP or ext3-fs to support file-sizes greater than 2 GB on 32-bit linux machines.

For extremely large databases, it is strongly advised to use 64-bit machines like Digital Alpha cpu, Sun Ultra-sparc 64-bit cpu, Silicon graphics 64-bit cpu, Intel Merced IA-64 cpu, HPUX 64bit machines or IBM 64-bit machines. Compile PostgreSQL under 64-bit cpu and it can support huge databases and large queries. Performance of PostgreSQL for queries on large tables and databases will be several times faster than PostgreSQL on 32-bit cpu machines. Advantage of 64-bit machines are that you get very large memory addressing space and the operating system can support very large file-systems, provide better performance with large databases, support much larger memory (RAM), have more capabilities etc..

9.2 Multiple CPUs

For large databases it is recommended that you use SMP boxes which have 4, 16 or 32 CPUs. Alternatively, you can use 4 or 5 single CPU boxes and you can partition the database into 4 or 5 seperate databases and each database running on a seperate box. Each CPU will be connected with fast NIC (100MBits) ethernet card. For example - if you have 200 tables in a database, you can distribute 200 tables to 4 database each having 50 tables. In this way, you are distributing the load evenly among 4 seperate machines. This is a cheaper alternative to 4-way CPU box. You would use 'Queries across multiple databases', NFS mounts in LAN, 'CREATE VIEW' for read-only tables to accomplish this task. And each CPU "can see" all the databases i.e all the 200 tables. In future PostgreSQL may provide support for 'Queries across multiple databases' (already in the TODO list), which may appear in upcoming version 7.1. For example, queries across multiple databases using aliases a, b for table names can be like -


select a.col1, a.col2, b.col4, b.col7 
from
        database1.my_tablea a, database2.my_tableb b
where
        a.col1 = b.col3 and
        a.col4 = b.col9;

update my_tablea 
set
        col1 =  b.col2
from 
        database1.my_tablea a, database2.my_tableb b
where
        a.col4 = b.col9;

9.3 Replication Server

Replication server for large enterprises/businesses is available at http://www.erserver.com and from http://www.pgsql.com. The support is sold ($$$$s) commercially by PostgreSQL Inc. You use replication server to provide redundancy and high availability. Replication server is a complex, sophisticated product.


Next Previous Contents