HOME arrow DATABASE INTERVIEW QUESTIONS arrow MY SQL INTERVIEW QUESTIONS arrow MYSQL INTERVIEW QUESTIONS - SET9



PLACEMENT PAPERS

MYSQL INTERVIEW QUESTIONS - SET9 PDF Print E-mail

 

MySQL - System/Compile Time and Startup Parameter Tuning
We start with the system level things since some of these decisions have to be made very early. In other cases a fast look at this part may suffice because it not that important for the big gains. However, it is always nice to have a feeling about how much one could gain by changing things at this level.

The default OS to use is really important! To get the most use of multiple CPU machines one should use Solaris (because the threads works really nice) or Linux (because the 2.2 kernel has really good SMP support). Also on 32-bit machines Linux has a 2G file size limit by default. Hopefully this will be fixed soon when new filesystems are released (XFS/Reiserfs). If you have a desperate need for files bigger than 2G on Linux-Intel 32 bit, you should get the LFS patch for the ext2 file system.

Because we have not run MySQL in production on that many platforms, we advice you to test your intended platform before choosing it, if possible.

Other tips:

If you have enough RAM, you could remove all swap devices. Some operating systems will use a swap device in some contexts even if you have free memory.
Use the --skip-locking MySQL option to avoid external locking. Note that this will not impact MySQL's functionality as long as you only run one server. Just remember to take down the server (or lock relevant parts) before you run myisamchk. On some system this switch is mandatory because the external locking does not work in any case. The --skip-locking option is on by default when compiling with MIT-pthreads, because flock() isn't fully supported by MIT-pthreads on all platforms. It's also on default for Linux as Linux file locking are not yet safe. The only case when you can't use --skip-locking is if you run multiple MySQL servers (not clients) on the same data, or run myisamchk on the table without first flushing and locking the mysqld server tables first. You can still use LOCK TABLES/UNLOCK TABLES even if you are using --skip-locking 12.2.1 How Compiling and Linking Affects the Speed of MySQL
Most of the following tests are done on Linux with the MySQL benchmarks, but they should give some indication for other operating systems and workloads.

You get the fastest executable when you link with -static.

On Linux, you will get the fastest code when compiling with pgcc and -O6. To compile `sql_yacc.cc' with these options, you need about 200M memory because gcc/pgcc needs a lot of memory to make all functions inline. You should also set CXX=gcc when configuring MySQL to avoid inclusion of the libstdc++ library (it is not needed). Note that with some versions of pgcc, the resulting code will only run on true Pentium processors, even if you use the compiler option that you want the resulting code to be working on all x586 type processors (like AMD).

By just using a better compiler and/or better compiler options you can get a 10-30 % speed increase in your application. This is particularly important if you compile the SQL server yourself!

We have tested both the Cygnus CodeFusion and Fujitsu compilers, but when we tested them, neither was sufficiently bug free to allow MySQL to be compiled with optimizations on.

When you compile MySQL you should only include support for the character sets that you are going to use. (Option --with-charset=xxx). The standard MySQL binary distributions are compiled with support for all character sets.

Here is a list of some measurements that we have done:

If you use pgcc and compile everything with -O6, the mysqld server is 1% faster than with gcc 2.95.2. If you link dynamically (without -static), the result is 13% slower on Linux. Note that you still can use a dynamic linked MySQL library. It is only the server that is critical for performance.
If you connect using TCP/IP rather than Unix sockets, the result is 7.5% slower on the same computer. (If you are connection to localhost, MySQL will, by default, use sockets).
If you compile with --with-debug=full, then you will loose 20 % for most queries, but some queries may take substantially longer (The MySQL benchmarks ran 35 % slower) If you use --with-debug, then you will only loose 15 %. On a Sun SPARCstation 20, SunPro C++ 4.2 is 5 % faster than gcc 2.95.2.
Compiling with gcc 2.95.2 for ultrasparc with the option -mcpu=v8 -Wa,-xarch=v8plusa gives 4 % more performance. On Solaris 2.5.1, MIT-pthreads is 8-12% slower than Solaris native threads on a single processor. With more load/CPUs the difference should get bigger.
Running with --log-bin makes [MySQL 1 % slower.
Compiling without frame pointers -fomit-frame-pointer with gcc makes MySQL 1 % faster.
The MySQL-Linux distribution provided by MySQL AB used to be compiled with pgcc, but we had to go back to regular gcc because of a bug in pgcc that would generate the code that does not run on AMD. We will continue using gcc until that bug is resolved. In the meantime, if you have a non-AMD machine, you can get a faster binary by compiling with pgcc. The standard MySqL Linux binary is linked statically to get it faster and more portable.

MySQL - Disk Issues
As mentioned before, disks seeks are a big performance bottleneck. This problems gets more and more apparent when the data starts to grow so large that effective caching becomes impossible. For large databases, where you access data more or less randomly, you can be sure that you will need at least one disk seek to read and a couple of disk seeks to write things. To minimize this problem, use disks with low seek times.
Increase the number of available disk spindles (and thereby reduce the seek overhead) by either symlink files to different disks or striping the disks.
Using symbolic links
This means that you symlink the index and/or data file(s) from the normal data directory to another disk (that may also be striped). This makes both the seek and read times better (if the disks are not used for other things).

Striping
Striping means that you have many disks and put the first block on the first disk, the second block on the second disk, and the Nth on the (N mod number_of_disks) disk, and so on. This means if your normal data size is less than the stripe size (or perfectly aligned) you will get much better performance. Note that striping is very dependent on the OS and stripe-size. So benchmark your application with different stripe-sizes.

Note that the speed
difference for striping is very dependent on the parameters. Depending on how you set the striping parameters and number of disks you may get a difference in orders of magnitude. Note that you have to choose to optimize for random or sequential access.
For reliability you may want to use RAID 0+1 (striping + mirroring), but in this case you will need 2*N drives to hold N drives of data. This is probably the best option if you have the money for it! You may, however, also have to invest in some volume-management software to handle it efficiently.
A good option is to have semi-important data (that can be regenerated) on RAID 0 disk while storing really important data (like host information and logs) on a RAID 0+1 or RAID N disk. RAID N can be a problem if you have many writes because of the time to update the parity bits.
You may also set the parameters for the file system that the database uses. One easy change is to mount the file system with the noatime option. That makes it skip the updating of the last access time in the inode and by this will avoid some disk seeks.
On Linux, you can get much more performance (up to 100 % under load is not uncommon) by using hdpram to configure your disk's interface! The following should be quite good hdparm options for MySQL (and probably many other applications): hdparm -m 16 -d 1

Note that the performance/reliability when using the above depends on your hardware, so we strongly suggest that you test your system throughly after using hdparm! Please consult the hdparm man page for more information! If hdparm is not used wisely, filesystem corruption may result. Backup everything before experimenting!
On many operating systems you can mount the disks with the 'async' flag to set the file system to be updated asynchronously. If your computer is reasonable stable, this should give you more performance without sacrificing too much reliability. (This flag is on by default on Linux.)
If you don't need to know when a file was last accessed (which is not really useful on a database server), you can mount your file systems with the noatime flag.

 

MySQL - Using Symbolic Links for Databases and Tables
You can move tables and databases from the database directory to other locations and replace them with symbolic links to the new locations. You might want to do this, for example, to move a database to a file system with more free space.

If MySQL notices that a table is symbolically linked, it will resolve the symlink and use the table it points to instead. This works on all systems that support the realpath() call (at least Linux and Solaris support realpath())! On systems that don't support realpath(), you should not access the table through the real path and through the symlink at the same time! If you do, the table will be inconsistent after any update.

MySQL doesn't that you link one directory to multiple databases. Replacing a database directory with a symbolic link will work fine as long as you don't make a symbolic link between databases. Suppose you have a database db1 under the MySQL data directory, and then make a symlink db2 that points to db1:

shell> cd /path/to/datadir
shell> ln -s db1 db2

Now, for any table tbl_a in db1, there also appears to be a table tbl_a in db2. If one thread updates db1.tbl_a and another thread updates db2.tbl_a, there will be problems.

If you really need this, you must change the following code in `mysys/mf_format.c':

if (flag & 32 || (!lstat(to,&stat_buff) && S_ISLNK(stat_buff.st_mode)))

to

if (1)

On Windows you can use internal symbolic links to directories by compiling MySQL with -DUSE_SYMDIR. This allows you to put different databases on different disks.

MySQL - Tuning Server Parameters
You can get the default buffer sizes used by the mysqld server with this command:

shell> mysqld --help

This command produces a list of all mysqld options and configurable variables. The output includes the default values and looks something like this:

Possible variables for option --set-variable (-O) are:
back_log current value: 5
bdb_cache_size current value: 1048540
binlog_cache_size current_value: 32768
connect_timeout current value: 5
delayed_insert_timeout current value: 300
delayed_insert_limit current value: 100
delayed_queue_size current value: 1000
flush_time current value: 0
interactive_timeout current value: 28800
join_buffer_size current value: 131072
key_buffer_size current value: 1048540
lower_case_table_names current value: 0
long_query_time current value: 10
max_allowed_packet current value: 1048576
max_binlog_cache_size current_value: 4294967295
max_connections current value: 100
max_connect_errors current value: 10
max_delayed_threads current value: 20
max_heap_table_size current value: 16777216
max_join_size current value: 4294967295
max_sort_length current value: 1024
max_tmp_tables current value: 32
max_write_lock_count current value: 4294967295
myisam_sort_buffer_size current value: 8388608
net_buffer_length current value: 16384
net_retry_count current value: 10
net_read_timeout current value: 30
net_write_timeout current value: 60
query_buffer_size current value: 0
record_buffer current value: 131072
slow_launch_time current value: 2
sort_buffer current value: 2097116
table_cache current value: 64
thread_concurrency current value: 10
tmp_table_size current value: 1048576
thread_stack current value: 131072
wait_timeout current value: 28800

If there is a mysqld server currently running, you can see what values it actually is using for the variables by executing this command:

shell> mysqladmin variables

You can find a full description for all variables in the SHOW VARIABLES section in this manual.

You can also see some statistics from a running server by issuing the command SHOW STATUS.

MySQL uses algorithms that are very scalable, so you can usually run with very little memory. If you, however, give MySQL more memory, you will normally also get better performance.

When tuning a MySQL server, the two most important variables to use are key_buffer_size and table_cache. You should first feel confident that you have these right before trying to change any of the other variables.

If you have much memory (>=256M) and many tables and want maximum performance with a moderate number of clients, you should use something like this:

shell> safe_mysqld -O key_buffer=64M -O table_cache=256 \
-O sort_buffer=4M -O record_buffer=1M &

If you have only 128M and only a few tables, but you still do a lot of sorting, you can use something like:

shell> safe_mysqld -O key_buffer=16M -O sort_buffer=1M

If you have little memory and lots of connections, use something like this:

shell> safe_mysqld -O key_buffer=512k -O sort_buffer=100k \
-O record_buffer=100k &

or even:

shell> safe_mysqld -O key_buffer=512k -O sort_buffer=16k \
-O table_cache=32 -O record_buffer=8k -O net_buffer=1K &

When you have installed MySQL, the `support-files' directory will contain some different my.cnf example files, `my-huge.cnf', `my-large.cnf', `my-medium.cnf', and `my-small.cnf', you can use as a base to optimize your system.
If there are very many connections, ``swapping problems'' may occur unless mysqld has been configured to use very little memory for each connection. mysqld performs better if you have enough memory for all connections, of course.
Note that if you change an option to mysqld, it remains in effect only for that instance of the server.
To see the effects of a parameter change, do something like this:

shell> mysqld -O key_buffer=32m --help

Make sure that the --help option is last; otherwise, the effect of any options listed after it on the command line will not be reflected in the output.

How MySQL Opens and Closes Tables ?
table_cache, max_connections, and max_tmp_tables affect the maximum number of files the server keeps open. If you increase one or both of these values, you may run up against a limit imposed by your operating system on the per-process number of open file descriptors. However, you can increase the limit on many systems. Consult your OS documentation to find out how to do this, because the method for changing the limit varies widely from system to system.

table_cache is related to max_connections. For example, for 200 concurrent running connections, you should have a table cache of at least 200 * n, where n is the maximum number of tables in a join.

The cache of open tables can grow to a maximum of table_cache (default 64; this can be changed with the -O table_cache=# option to mysqld). A table is never closed, except when the cache is full and another thread tries to open a table or if you use mysqladmin refresh or mysqladmin flush-tables.

When the table cache fills up, the server uses the following procedure to locate a cache entry to use:

Tables that are not currently in use are released, in least-recently-used order.
If the cache is full and no tables can be released, but a new table needs to be opened, the cache is temporarily extended as necessary.
If the cache is in a temporarily-extended state and a table goes from in-use to not-in-use state, the table is closed and released from the cache.
A table is opened for each concurrent access. This means that if you have two threads accessing the same table or access the table twice in the same query (with AS) the table needs to be opened twice. The first open of any table takes two file descriptors; each additional use of the table takes only one file descriptor. The extra descriptor for the first open is used for the index file; this descriptor is shared among all threads.

You can check if your table cache is too small by checking the mysqld variable opened_tables. If this is quite big, even if you haven't done a lot of FLUSH TABLES, you should increase your table cache.

MySQL - Drawbacks to Creating Large Numbers of Tables in the Same Database
If you have many files in a directory, open, close, and create operations will be slow. If you execute SELECT statements on many different tables, there will be a little overhead when the table cache is full, because for every table that has to be opened, another must be closed. You can reduce this overhead by making the table cache larger.

MySQL - Why So Many Open tables?
When you run mysqladmin status, you'll see something like this:

Uptime: 426 Running threads: 1 Questions: 11082 Reloads: 1 Open tables: 12

This can be somewhat perplexing if you only have 6 tables.

MySQL is multithreaded, so it may have many queries on the same table simultaneously. To minimize the problem with two threads having different states on the same file, the table is opened independently by each concurrent thread. This takes some memory and one extra file descriptor for the data file. The index file descriptor is shared between all threads.

How MySQL Uses Memory ?
The list below indicates some of the ways that the mysqld server uses memory. Where applicable, the name of the server variable relevant to the memory use is given:

The key buffer (variable key_buffer_size) is shared by all threads; Other buffers used by the server are allocated as needed.

Each connection uses some thread-specific space: A stack (default 64K, variable thread_stack), a connection buffer (variable net_buffer_length), and a result buffer (variable net_buffer_length). The connection buffer and result buffer are dynamically enlarged up to max_allowed_packet when needed. When a query is running, a copy of the current query string is also allocated.
All threads share the same base memory.
Only the compressed ISAM / MyISAM tables are memory mapped. This is because the 32-bit memory space of 4GB is not large enough for most big tables. When systems with a 64-bit address space become more common we may add general support for memory mapping.
Each request doing a sequential scan over a table allocates a read buffer (variable record_buffer).
All joins are done in one pass, and most joins can be done without even using a temporary table. Most temporary tables are memory-based (HEAP) tables. Temporary tables with a big record length (calculated as the sum of all column lengths) or that contain BLOB columns are stored on disk. One problem in MySQL versions before Version 3.23.2 is that if a HEAP table exceeds the size of tmp_table_size, you get the error The table tbl_name is full. In newer versions this is handled by automatically changing the in-memory (HEAP) table to a disk-based (MyISAM) table as necessary. To work around this problem, you can increase the temporary table size by setting the tmp_table_size option to mysqld, or by setting the SQL option SQL_BIG_TABLES in the client program.

In MySQL Version 3.20, the maximum size of the temporary table was record_buffer*16, so if you are using this version, you have to increase the value of record_buffer. You can also start mysqld with the --big-tables option to always store temporary tables on disk. However, this will affect the speed of many complicated queries.
Most requests doing a sort allocates a sort buffer and 0-2 temporary files depending on the result set size.

Almost all parsing and calculating is done in a local memory store. No memory overhead is needed for small items and the normal slow memory allocation and freeing is avoided. Memory is allocated only for unexpectedly large strings (this is done with malloc() and free()).
Each index file is opened once and the data file is opened once for each concurrently running thread. For each concurrent thread, a table structure, column structures for each column, and a buffer of size 3 * n is allocated (where n is the maximum row length, not counting BLOB columns). A BLOB uses 5 to 8 bytes plus the length of the BLOB data. The ISAM/MyISAM table handlers will use one extra row buffer for internal usage.
For each table having BLOB columns, a buffer is enlarged dynamically to read in larger BLOB values. If you scan a table, a buffer as large as the largest BLOB value is allocated.
Table handlers for all in-use tables are saved in a cache and managed as a FIFO. Normally the cache has 64 entries. If a table has been used by two running threads at the same time, the cache contains two entries for the table.

A mysqladmin flush-tables command closes all tables that are not in use and marks all in-use tables to be closed when the currently executing thread finishes. This will effectively free most in-use memory. ps and other system status programs may report that mysqld uses a lot of memory. This may be caused by thread-stacks on different memory addresses. For example, the Solaris version of ps counts the unused memory between stacks as used memory. You can verify this by checking available swap with swap -s. We have tested mysqld with commercial memory-leakage detectors, so there should be no memory leaks.

 

 

 

 
< Prev   Next >

Polls

MOST USEFUL SECTION THIS SITE