
40/sec to 500/sec



Surprised by the title? Well, this is a tour of how we cracked the scalability jinx, going from handling a meagre 40 records per second to 500 records per second. Beware, most of the problems we faced were straightforward, so experienced people might find this superfluous.

* 1.0 Where were we?

1.1 Memory hits the sky
1.2 Low processing rate
1.3 Data loss :-(
1.4 Mysql pulls us down
1.5 Slow Web Client

* 2.0 Road to Nirvana

2.1 Controlling memory!
2.2 Streamlining processing rate
2.3 What data loss uh-uh?
2.4 Tuning SQL Queries
2.5 Tuning database schema
2.6 Mysql helps us forge ahead!
2.7 Faster... faster Web Client

* 3.0 Bottom line

Where were we?

Initially we had a system which could scale only up to 40 records/sec. I can even recall the discussion about "what should be the ideal rate of records?". Finally we decided that 40/sec was the ideal rate for a single firewall. So when we had to go out, we at least needed to support 3 firewalls. Hence we decided that 120/sec would be the ideal rate. Based on the data from our competitor(s) we came to the conclusion that they could support about 240/sec. We thought that was OK, as it was our first release, and all the competitors talked about the number of firewalls they supported but not about the rate.

Memory hits the sky

Our memory was always hitting the sky even at 512MB! (OutOfMemory exception) We blamed cewolf's in-memory caching of the generated images. But we could not escape for long! No matter whether we connected the client or not, we used to hit the sky in a couple of days, 3-4 days at most! Interestingly, this was reproducible when we sent data at very high rates (then) of about 50/sec. You guessed it right: an unlimited buffer which grows until it hits the roof.

Low processing rate

We were processing records at the rate of 40/sec. We were using bulk update of dataobject(s), but it did not give the expected speed! Because of this we started to hoard data in memory, which resulted in hogging memory!

Data Loss :-(

At very high speeds we used to miss many packets. We seemed to have little data loss, but that came at the cost of a memory hog. After some tweaking to limit the buffer size, we started having a steady data loss of about 20% at very high rates.

Mysql pulls us down

We were facing a tough time when we imported a log file of about 140MB. Mysql started to hog resources, the machine started crawling and sometimes it even stopped responding. Above all, we started getting deadlock(s) and transaction timeout(s), which eventually reduced the responsiveness of the system.

Slow Web Client

Here again we blamed the number of graphs we showed in a page as the bottleneck, ignoring the fact that there were many other factors that were pulling the system down. A page with 6-8 graphs and tables used to take 30 seconds to load after 4 days at the Internet Data Center.

Road To Nirvana

Controlling Memory!

We tried to put a limit of 10,000 on the buffer size, but it did not last for long. The major flaw in the design was that we assumed a buffer of about 10,000 would suffice, i.e. that we would be processing records before the buffer reached 10,000. In line with the principle "If something can go wrong, it will go wrong!", it went wrong. We started losing data. Subsequently we decided to go with flat-file-based caching, in which the data was dumped into a flat file and then loaded into the database using "load data infile". This was many times faster than a bulk insert via the database driver. You might also want to check out some possible optimizations with load data infile. This fixed our problem of the ever-growing buffer of raw records.
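The flat-file approach above can be sketched roughly as follows. This is a minimal illustration, not the product's actual code: the tab-separated layout, the `raw_records` table name and both helper functions are assumptions made for the example. Only the "load data infile" statement itself is standard mysql syntax.

```python
import csv
import os
import tempfile


def spool_records(records, spool_path):
    """Append parsed records to a tab-separated flat file and return the count."""
    with open(spool_path, "a", newline="") as spool:
        writer = csv.writer(spool, delimiter="\t", lineterminator="\n")
        writer.writerows(records)
    return len(records)


def load_statement(spool_path, table):
    """Build the bulk-load statement for one spool file.

    `table` is a hypothetical table name; the path is inlined only for
    illustration and must come from trusted code, never from user input.
    """
    # MySQL treats backslashes in the file name as escape characters,
    # so use forward slashes even for Windows paths.
    path = spool_path.replace("\\", "/")
    return (
        f"LOAD DATA INFILE '{path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'"
    )


# Example: spool two records, then produce the statement that loads them.
spool_file = os.path.join(tempfile.mkdtemp(), "raw_records.tsv")
spool_records([("10.0.0.1", "80", "allow"), ("10.0.0.2", "22", "deny")], spool_file)
statement = load_statement(spool_file, "raw_records")
```

The receiving thread only ever appends to a file, so memory use stays flat no matter how far the database falls behind. The statement is executed later through whatever database driver the application uses.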

The second problem we faced was the growth of cewolf's in-memory caching mechanism. By default it used "TransientSessionStorage", which caches the image objects in memory, and there seemed to be some problem in cleaning up the objects, even after the references were lost! So we wrote a small "FileStorage" implementation which stores the image objects in a local file and serves them as and when a request comes in. Moreover, we also implemented a cleanup mechanism to remove stale images (images older than 10 mins).
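The idea behind such a file-backed store with a cleanup pass can be sketched as below. This is not cewolf's storage interface or our actual "FileStorage" class: the `FileImageStore` name, its methods and the `.img` suffix are all assumptions chosen for illustration. Only the 10-minute limit comes from the text.

```python
import os
import tempfile
import time


class FileImageStore:
    """Keep rendered chart images on disk instead of in memory."""

    def __init__(self, directory, max_age_seconds=600):
        self.directory = directory
        self.max_age_seconds = max_age_seconds  # 10 minutes, as in the text
        os.makedirs(directory, exist_ok=True)

    def _path(self, image_id):
        # image_id is assumed to be a safe file name (e.g. a generated key).
        return os.path.join(self.directory, f"{image_id}.img")

    def store(self, image_id, data):
        with open(self._path(image_id), "wb") as out:
            out.write(data)

    def load(self, image_id):
        """Return the image bytes, or None if never stored or already purged."""
        try:
            with open(self._path(image_id), "rb") as src:
                return src.read()
        except FileNotFoundError:
            return None

    def purge_stale(self, now=None):
        """Delete images last modified more than max_age_seconds ago."""
        now = time.time() if now is None else now
        removed = 0
        for name in os.listdir(self.directory):
            path = os.path.join(self.directory, name)
            if now - os.path.getmtime(path) > self.max_age_seconds:
                os.remove(path)
                removed += 1
        return removed
```

Calling `purge_stale()` from a periodic timer keeps the directory bounded, and memory no longer depends on how many charts were generated.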

Another interesting aspect we found here was that the garbage collector had a lower priority, so the objects created for each record were hardly ever cleaned up. Here is a little math to explain the magnitude of the problem. Each time we received a log record we created ~20 objects (hashmap, tokenized strings etc.), so at the rate of 500/sec for 1 second, the number of objects was 10,000 (20*500*1). Due to the heavy processing, the garbage collector never had a chance to clean up the objects. So all we had to do was a minor tweak: we just assigned "null" to the object references. Voila! The garbage collector was never tortured again, I guess ;-)

Streamlining processing rate

The processing rate was at a meagre 40/sec, which means that we could hardly withstand even a small burst of log records! The memory control gave us some solace, but the actual problem was with the application of the alert filters over the records. We had about 20 properties for each record, and we used to search through all of them. We changed the implementation to match only those properties we had criteria for! We also used to search for a match on each of the properties even when we had no alert criteria configured. Moreover, we had a memory leak in the alert filter processing: we maintained a queue which grew forever. So we had to maintain a flat-file object dump to avoid re-parsing records to form objects!
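The filter change can be illustrated with a short sketch. It assumes records and criteria are plain dictionaries and that a filter matches on simple equality; the real alert filters were certainly richer, and the function names here are hypothetical.

```python
def matches(record, criteria):
    """Return True if the record satisfies every configured criterion.

    Only the properties named in `criteria` are inspected, so a record with
    20 properties and a 2-key filter costs 2 lookups, not 20.
    An empty criteria dict matches every record.
    """
    return all(record.get(prop) == wanted for prop, wanted in criteria.items())


def alerts_for(record, filters):
    """Return the names of the alert filters that the record triggers."""
    if not filters:  # nothing configured: skip matching altogether
        return []
    return [name for name, criteria in filters.items() if matches(record, criteria)]
```

For example, with `filters = {"ssh-denied": {"port": 22, "action": "deny"}}`, a record is checked on just those two properties, and with no filters configured the record is not inspected at all.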

What data loss uh-uh?

Once we fixed the memory issues in receiving data, i.e. by dumping into a flat file, we never lost data! In addition to that, we had to remove a couple of unwanted indexes from the raw table to avoid the overhead while dumping data. We had indexes on columns which could have a maximum of 3 possible values, which actually made the insert slower and was not useful.

Tuning SQL Queries

Your queries are your keys to performance. Once you start nailing the issues, you will see that you might even have to de-normalize the tables. We did it! Here are some of the key learnings:

* Use "EXPLAIN" to see how mysql executes a query (and "ANALYZE TABLE" to refresh the index statistics the optimizer relies on). This will give you insight into why the query is slow, i.e. whether it is using the correct indexes, whether it is doing a full table scan etc.

* Never delete rows when you deal with huge data, in the order of 50,000 records in a single table. Always try to do a "drop table" as much as possible. If that is not possible, redesign your schema; that is your only way out!

* Avoid unwanted join(s), and don't be afraid to de-normalize (i.e. duplicate the column values). Avoid join(s) as much as possible, as they tend to pull your query down. One hidden advantage is that avoiding them keeps your queries simple.

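* If you are dealing with bulk data, always use "load data infile". There are two options here: plain "load data infile", where the mysql server reads the file from its own host, and "load data local infile", where the client reads the file and sends it to the server. Use the plain form if mysql and the application are on the same machine; otherwise use local.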

* Try to split your complex queries into two or three simpler queries. The advantage of this approach is that the mysql resources are not hogged for the entire process. Tend to use temporary tables instead of a single query which spans 5-6 tables.

* When you deal with a huge amount of data, i.e. you want to process say 50,000 records or more in a single query, try using limit to batch process the records. This will help you scale the system to new heights.

* Always use smaller transaction(s) instead of large ones, i.e. ones spanning "n" tables. Large transactions lock up the mysql resources, which might cause slowness of the system even for simple queries.

* Use join(s) on columns with indexes or foreign keys

* Ensure that the queries from the user interface have criteria or a limit.

* Also ensure that the criteria column is indexed

* Do not enclose numeric values in sql criteria within quotes, as mysql then does a type cast

* Use temporary tables as much as possible, and drop them...

* An insert fed by a select (insert ... select), or a delete, is a double table lock... be aware...

* Take care that you do not pain the mysql database with the frequency of your updates. We had a typical case: we used to dump to the database after every 300 records. So when we started testing for 500/sec, we started seeing that mysql was literally dragging us down. That is when we realized that at the rate of 500/sec there is typically a "load data infile" request every second to the mysql database. So we had to change to dumping the records after 3 minutes rather than after 300 records.
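To put numbers on the last point: at 500 records/sec, flushing every 300 records means more than one bulk load per second, while flushing every 3 minutes means one load of about 90,000 records (500 * 180). A time-based flush policy might look like the sketch below; the `SpoolFlusher` class and its method names are assumptions for illustration, not the original code.

```python
import time


class SpoolFlusher:
    """Decide when spooled records should be bulk-loaded into the database."""

    def __init__(self, interval_seconds=180, clock=time.monotonic):
        self.interval_seconds = interval_seconds  # 3 minutes, as in the text
        self._clock = clock
        self._last_flush = clock()
        self.pending = 0

    def add(self, count=1):
        """Record that `count` more records were written to the spool file."""
        self.pending += count

    def should_flush(self):
        """True once the interval has elapsed and there is something to load."""
        elapsed = self._clock() - self._last_flush
        return self.pending > 0 and elapsed >= self.interval_seconds

    def mark_flushed(self):
        """Reset the counters; returns how many records the load covered."""
        flushed, self.pending = self.pending, 0
        self._last_flush = self._clock()
        return flushed
```

Because the trigger is elapsed time rather than record count, the number of database requests stays constant when the incoming rate climbs; only the size of each load grows.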

Tuning database schema

When you deal with a huge amount of data, always ensure that you partition your data. That is your road to scalability. A single table with say 10 lakh (1 million) rows can never scale when you intend to execute queries for reports. Always have two levels of tables: raw tables for the actual data and another set for the report tables (the tables which the user interfaces query on!). Always ensure that the data in your report tables never grows beyond a limit. In case you are planning to use Oracle, you can try out partitioning based on criteria. But unfortunately mysql did not support that when we built this, so we had to do it ourselves. Maintain a meta table in which you keep the header information, i.e. which table to look in for a given set of criteria, normally time.
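A meta-table lookup of this kind can be sketched as follows. The monthly split, the table names and the `tables_for` helper are assumptions made for the example. In practice the meta rows would live in a database table, not a Python list.

```python
from datetime import datetime

# Hypothetical meta table: one row per report table with the time span it holds.
# Each span is half-open: start <= t < end.
META = [
    ("report_2005_07", datetime(2005, 7, 1), datetime(2005, 8, 1)),
    ("report_2005_08", datetime(2005, 8, 1), datetime(2005, 9, 1)),
    ("report_2005_09", datetime(2005, 9, 1), datetime(2005, 10, 1)),
]


def tables_for(start, end, meta=META):
    """Return the tables whose time span overlaps the half-open range [start, end)."""
    return [name for name, t_start, t_end in meta if t_start < end and start < t_end]
```

A report covering 25 July to 5 August would then query only `report_2005_07` and `report_2005_08`, and dropping old data becomes a "drop table" plus the removal of one meta row.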

* We had to walk through our database schema, and we had to add some indexes, delete some and even duplicate column(s) to remove costly join(s).

* Going forward we realized that having the raw tables as InnoDB was actually an overhead to the system, so we changed them to MyISAM

* We also went to the extent of reducing the number of rows in static tables involved in joins

* NULL in database tables seems to cause some performance hit, so avoid them

* Don't have indexes on columns which have only 2-3 allowed values

* Cross check the need for each index in your table; they are costly. If the tables are InnoDB then double check their need, as InnoDB tables seem to take about 10-15 times the size of the MyISAM tables.

* Use MyISAM whenever a table sees mostly one kind of query (either selects or inserts). If both inserts and selects are going to be heavy, then it is better to have it as InnoDB

Mysql helps us forge ahead!

Tune your mysql server ONLY after you fine tune your queries/schemas and your code. Only then can you see a perceivable improvement in performance. Here are some of the parameters that come in handy:

* Use a buffer pool size which will enable your queries to execute faster: --innodb_buffer_pool_size=64M for InnoDB and --key_buffer_size=32M for MyISAM

* Even simple queries started taking more time than expected. We were actually puzzled! We realized that mysql seems to load the index of any table it starts inserting into. So what typically happened was that any simple query on a table with 5-10 rows took about 1-2 secs. On further analysis we found that just before the simple query, a "load data infile" had happened. This disappeared when we changed the raw tables to MyISAM, since the buffer sizes for InnoDB and MyISAM are two different configurations.

For more configurable parameters, see the mysql server documentation.

Tip: start mysql with the option --log-error; this will enable error logging

Faster. . . faster Web Client

The user interface is the key to any product, and the perceived speed of the page matters most! Here is a list of solutions and learnings that might come in handy:

* If your data is not going to change for say 3-5 minutes, it is better to cache your client side pages

* Tend to use Iframe(s) for inner graphs etc.; they give a perceived fastness to your pages. Better still, use a javascript based content loading mechanism. This is something you might want to do when you have say 3+ graphs in the same page.

* Internet Explorer displays the whole page only when all the contents are received from the server. So it is advisable to use iframes or javascript for content loading.

* Never use multiple/duplicate entries of the CSS file in the html page. Internet Explorer tends to load each CSS file as a separate entry and applies it to the complete page!
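One way to get the client-side caching from the first tip is through the standard HTTP `Cache-Control` response header. The helper below is only a sketch: the function name and the choice of `private` are assumptions, and how the headers are attached depends on the web framework in use.

```python
def cache_headers(max_age_seconds=300):
    """HTTP response headers that let the browser reuse a page for max_age_seconds."""
    if max_age_seconds <= 0:
        # Data changes too often to cache: force revalidation on every request.
        return {"Cache-Control": "no-cache"}
    # "private" keeps per-user report pages out of shared proxy caches.
    return {"Cache-Control": f"private, max-age={max_age_seconds}"}
```

With the default of 300 seconds, a report page reloaded within 5 minutes is served from the browser cache without touching the server or the database.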

Bottom line

Your queries and schema make the system slower! Fix them first and then blame the database!

See Also

* High Performance Mysql

* Query Performance

* Explain Query

* Optimizing Queries

* InnoDB Tuning

* Tuning Mysql

Categories: Firewall Analyzer | Performance Tips

This page was last modified 18:00, 31 August 2005.

