chronicle of a curious mind: 2009

Monday, May 18, 2009

software quality-Joel Test

http://www.joelonsoftware.com/articles/fog0000000043.html

Wednesday, May 13, 2009

Yet another RESTful API that's not RESTful at all

I just ran across another supposedly RESTful API, published by xiaonei.com. However, IMHO, it is not RESTful at all. That must frustrate Fielding yet again. It is actually POX (plain old XML) over HTTP. Obviously REST is being used as a brand that means "buzzword compatible", which is exactly what Fielding doesn't want to see. Please see my previous post about what REST really means and how to do it.

what does RESTful web service really mean and how?


Friday, May 8, 2009

SOAP is boring, we need more REST

Since SOA became a buzzword, web services have been touted by vendors as the holy grail for EAI, or even for restructuring the existing IT architecture to earn an SOA brand. However, if SOA is not driven by real business needs, it is doomed. So if we have done an extensive cost-benefit analysis of SOA and concluded that we want one, how do we get there? Basically SOA is a top-down approach, because implementing it needs a holistic view of the entire IT infrastructure. Maybe there are incremental ways to do SOA that I don't know of. As a top-down approach, it is more about governance than about devising a fancy cutting-edge technology to implement it. However, the technological aspect does influence the adoption rate of SOA.

For now, web services are the mainstream implementation technology for SOA because the big vendors are driving them. But recently there is a new buzzword: REST. REST has been a hot topic at many technology conferences, like http://enterprisewebconf.com/sessions.html, the state of REST vs. SOAP, intro of REST, and a QCon presentation about combining REST & WS-*. The most interesting one I have watched is by Steve Vinoski, who has been in the trenches for decades. When a CORBA guy is talking about distributed systems, we should be listening. So what is this guy really talking about? Well, RPC is fundamentally flawed, and REST is a better way to go. That's what he is advocating. However, some people don't buy it. Hot debates happened here, here, and elsewhere. One point that I think makes sense is that it depends on how much control you have over the system being built. If you have total control of all of the endpoints of the system, RPC can be used for optimized performance; on the other hand, if some of the endpoints are outside of your control, REST is the better alternative. By this reasoning, SOAP just doesn't fit anywhere in this space. Here is an extensive comparison between WS-* and REST.

UPDATE: I just ran into this post about what Gartner coined as WOA (Web Oriented Architecture). Actually WOA is just Gartner's attempt to build a brand of its own out of REST. Nothing new. On the other hand, Gartner proposed WOA as a set of constraints on the WS-* stack. How can this be done in the real world? I suspect vendors have motives to do it.

Monday, April 27, 2009

A note on software architecture style classification

The architecture styles of software systems have evolved for decades. We can classify these styles as below.

1. No Architecture
no unified principle, thus no architecture
an integration task is needed to plug each application into the whole enterprise after it is developed
applications interact in a point-to-point way
each application has its own data store
the number of interfaces bloats as O(n*n)
also referred to as "post integration"
drawbacks: lack of semantic consistency
uncontrolled data replication
result: tight coupling, ripple effect

2. The Integrated Database Architecture
unified data model with clearly defined semantics
applications interact through a single data store
the single data store is also a giant "global variable"
still results in tight coupling and ripple effect

3. The Distributed Object Architecture
OO model ensures consistent semantics
still results in tight coupling, vendor lock-in
examples: EJB, DCOM, CORBA

4. Message Broker(Hub and Spoke)
Star-like topology
applications interact through the central broker
adds an intermediary between applications, so an application can be removed or replaced without affecting the others
drawbacks: single point of failure
limited scalability
example: webMethods

5. The Message Bus Architecture
Flexibility is one of the most crucial qualities of a modern organization
Imagine the main board bus architecture in a computer
resembles the Integrated Database Architecture, with one difference:
applications interact by sending messages conforming to a message schema
drawbacks: proprietary messaging protocol, vendor lock-in
security risks including network flooding
message format adaptation
example: TIBCO Rendezvous

6. Hybrid Architecture
virtual group
each group contains nodes acting as broker and bus
example: Microsoft BizTalk

7. Service Oriented Architecture
services everywhere
each application exports its own functions as services which can be consumed by other applications
each application can also import services provided by other applications to implement its own functions
Put simply, each application can be both a service provider and a service consumer.
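
A quick way to see why the point-to-point style bloats while the hub-and-spoke style does not is to count interfaces. The sketch below is only an illustration: it assumes the worst case where every pair of applications talks to each other, and the function names are mine, not from any product.

```python
def point_to_point_links(n: int) -> int:
    """Worst case for style 1: every pair of applications needs its own interface."""
    return n * (n - 1) // 2


def hub_and_spoke_links(n: int) -> int:
    """Style 4: each application only needs one connection, to the central broker."""
    return n


# Compare how the two styles grow as more applications are added.
for n in (5, 20, 100):
    print(n, point_to_point_links(n), hub_and_spoke_links(n))
```

The pairwise count grows quadratically while the broker count grows linearly, which is the O(n*n) bloat mentioned under style 1. The price of the linear growth is that the broker becomes the single point of failure.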

Conclusion
1. No silver bullet, no one-size-fits-all solution.
2. No perfect architecture, only appropriate architecture
3. Big upfront design is less feasible than incremental iterative design

Friday, April 24, 2009

some new stuff worth a look

I came across InfoWorld's 2008 best open source software awards yesterday. Today InformationWeek's Top 50 startups list popped up. Some of them are definitely worth a look.

1. Git: a distributed version control system that is used for the Linux kernel, Fedora and other important open source projects with geographically distributed contributors.

2. Intel Threaded Building Blocks: an open source, cross-OS x86 C++ template library for parallel programming. The essence of this library is a work-stealing scheduler. There is an equivalent API in Java, called the fork-join framework, that is under development.

3. Alfresco: an open source Enterprise Content Management alternative to MS SharePoint. Most Java projects use the Confluence wiki for a similar purpose, but an ECM solution provides a richer feature set.

4. Hyperic HQ: comprehensive open source application and system monitoring solution

5. Pentaho: an open source Business Intelligence suite whose data mining component is built on Weka, a comprehensive machine learning algorithm package. Note: I have tried Weka for web page classification. It is more lightweight and developer-friendly than other open source alternatives such as RapidMiner.

6. Vyatta: an open source router, firewall & VPN solution, claimed to be a Cisco alternative. Ambitious! Here are some intro webcasts. And here is a comprehensive review. Another similar but more academic project is XORP. Oops! It seems Vyatta was actually derived from XORP. Anyway, we can consider using it as a replacement for Cisco low-end products. More importantly, students can download it and build a virtual network lab with VMware-like virtual machine software. Thanks to these guys for the hard work!

7. Metasploit Framework: an open source penetration testing toolkit that can be used to hammer applications in search of potential security vulnerabilities. It can also be used for malicious attacks.

8. Splunk: a log analysis framework (a commercial product with a free edition, rather than open source) that can analyze logs from various sources to uncover security threats.

9. Amanda: perhaps the most widely used open source backup solution.

10. Abiquo: open source cloud computing solution provider, ambitious too!

11. Eucalyptus: yet another open source cloud computing solution, but more academic.

12. openQRM: open source data center management software, not touted as cloud stuff yet, but it could be.

I will add more details as I try any of them.

Wednesday, April 22, 2009

architecture principles notes

While watching a presentation by an eBay architect about eBay's architecture principles, I was thinking about how we could figure out which architecture principles to use in my specific projects. After all, architecture principles vary from company to company and from project to project. So where do they derive from? After reading some resources, here are my notes.

1. what?

Before we go further, we'd better make clear what architecture principles are. Here is a definition from TOGAF's enterprise architecture framework:

Architecture principles are a subset of IT principles that relate to architecture work. They reflect a level of consensus across the enterprise, and embody the spirit and thinking of the enterprise architecture.
......
Architecture principles define the underlying general rules and guidelines for the use and deployment of all IT resources and assets across the enterprise. They reflect a level of consensus among the various elements of the enterprise, and form the basis for making future IT decisions.

Each architecture principle should be clearly related back to the business objectives and key architecture drivers.

It seems way too dogmatic. Here is the gist:

They are IT principles.
They are general guidelines and rules of utilizing IT resources.
They should be well aligned with business objectives.

Here are the components an architecture principle usually contains:

Name: representative name with clear meaning
Statement: description of unambiguous fundamental rule
Rationale: highlights the business benefits, points out the relations to business principles and to other architecture principles, and explains how to weigh them in context
Implications: requirements on both IT and the business for carrying out the principle, in terms of resources, cost and activities. It is about impact and consequences.

Here is an example set of architecture principles from TOGAF's enterprise architecture framework. Another example is the NIH enterprise architecture, which is perhaps more technology oriented.

2. how?

According to the above interpretation of the "what", we can only derive these architecture principles from business objectives. Here is a method for running a workshop to draw them up. The key points are:

Identify Strategic Objectives
Record Strategic Objectives
Identify Architecture Principles
Explain Architecture Principles
Prioritize Architecture Principles
Show Prioritization Results

reference:

1. http://it.toolbox.com/blogs/enterprise-solutions/running-an-architecture-principles-workshop-12581
2. http://it.toolbox.com/blogs/enterprise-solutions/sample-architecture-principles-workshop-agenda-12598
3. http://www.opengroup.org/architecture/togaf9-doc/arch/
4. http://www.bredemeyer.com/HotSpot/20040428EASoapBox.htm
5. http://enterprisearchitecture.nih.gov/ArchLib/Guide/EnterprisePrinciples.htm
6. http://enterprisearchitecture.nih.gov/About/Approach/Framework.htm
7. http://blogs.msdn.com/architectsrule/archive/2008/04/22/reference-architecture-principles.aspx

handy system administration and monitoring tools

Just a memo:

maybe the most extensive list on the net about system administration:
http://www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html

Among the list, here are those I have used:

Ntop/Nmon: network traffic data collection

Currently the most popular and also the oldest network monitoring tools might be Nagios (network and server monitoring) and MRTG (mainly network traffic). Another one, cfengine, is getting popular for its powerful rule-based system for executing management scripts. Rule languages are not new. They have been widely used in business rules engines like ILOG and Drools. They also shine when used for system administration. I will give it a try if I have a chance.

There are some new open source tools worth a look:
1. OpenNMS: Java
2. Hyperic: Java
3. Zenoss: Python

And it seems the old boys are losing favor.

operation dimension of system architecture

In software architecture, there are usually various stakeholders involved in a specific system, and each of them may have different architectural requirements. The product department often submits functional requirements. The operations department often submits system management or monitoring requirements. The accounting department may submit billing requirements. And in some cases the system has its own inherent non-functional requirements such as performance, availability and other SLA guarantees. In a word, a system architecture always involves quite a lot of dimensions, and we have to think about all of them to get a full picture of the system. However, developers are usually myopic and rarely think about the other dimensions. Yet when the system rolls out, developers have to work closely with operations people to get feedback about the production system. If developers are not well prepared, they may end up getting nothing. Even worse, they will get entangled in operations. Here are some points developers can consider and prepare for in advance.

The first question is: how do we get the status of the production system?

The common approach is to log extensively in the system itself and send a notification email when things get abnormal. Simple! But it doesn't work when the system is down. Another disadvantage is that application-level logging only cares about the system itself. What about a machine powering off, a disk failure or a network outage?

So we should have an independent and fully functional health management system. Usually this system is maintained by the operations department. Then there is a gap, both social and technical. The social one is that the two departments have to cooperate to make the system work. The technical one is how to make the existing health management system aware of the new system. It depends on both sides. The health management system should be extensible so that it can adapt to any kind of new system. Luckily some fully functional monitoring systems qualify. And the new system itself should provide a health checking interface that the health management system can call. So far so good. When the system goes wrong, the operations people get the notification first. If they can deal with it, developers can sleep well. Otherwise, developers will get busy.
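
As a rough illustration of the health checking interface mentioned above, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the check names, the "UP"/"DOWN" vocabulary and the report layout are hypothetical, and how the report is exposed to the monitoring system (HTTP, JMX, a file, and so on) is left out.

```python
from typing import Callable, Dict


# Hypothetical checks: a real system would probe its database, queues, disks, etc.
def check_database() -> bool:
    return True


def check_disk_space() -> bool:
    return True


def health_report(checks: Dict[str, Callable[[], bool]]) -> dict:
    """Run every check; a check that raises an exception counts as a failure."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    status = "UP" if all(results.values()) else "DOWN"
    return {"status": status, "checks": results}


report = health_report({"database": check_database, "disk": check_disk_space})
print(report)
```

The point of the design is that the monitoring side only needs to understand one small report format, while each new system plugs in its own checks.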

Another important point is that trust should be built between the operations department and the development department. Developers should add to their toolbox a lens for viewing the operations dimension. Likewise, system administrators should add to their toolbox a lens for viewing the development dimension, because a full understanding of the new system helps them monitor it more thoroughly.

The reason why I am aware of the operations aspect of system architecture is that it is getting more and more important today. Service has been a buzzword for years: SOA, SaaS, PaaS, Web Services and so on. So how can we measure the quality of a service? Yes, the SLA (Service Level Agreement): four nines of availability and a 1s response time, for example. That's it. But how can we reach that SLA? It is closely related to operations. So keep an eye on it.
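
To make "nines" concrete, the small sketch below converts an availability target into a yearly downtime budget. It is plain arithmetic; the function name and the 365-day year are my own assumptions.

```python
def allowed_downtime_minutes(nines: int, days: float = 365.0) -> float:
    """Downtime budget, in minutes, for an availability of N nines over the period."""
    unavailability = 10 ** (-nines)  # e.g. 4 nines -> 0.0001
    return days * 24 * 60 * unavailability


for n in (2, 3, 4, 5):
    print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes per year")
```

Four nines leaves less than an hour of downtime per year, which is why it cannot be reached without operations being part of the design.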

UPDATE: Here is a good post on the same topic: monitoring java system, but more specifically.

Monday, March 30, 2009

the lessons of scaling

In the previous post, I made a simple classification of current website architectures according to their technology features. So what can we learn from these architectures? Before any elaboration, I will point out that most of the principles are proposed for scaling out rather than scaling up, because scaling up just doesn't work well. In contrast, the commodity PC cluster is entering the mainstream. Well, here are my notes.

1. partition

It seems that partitioning is a concept more familiar to DBAs than to developers. I mean, almost any serious database supports table partitioning, fully or partially. That means a big table can be sliced into multiple small and manageable tables. Going one step further is sharding, which we can also call a cross-database partitioned table or horizontal partitioning. Sharding is almost the standard approach adopted by most of today's web 2.0 companies. But note that sharding is not an out-of-the-box component provided by database vendors. Instead, anyone who wants to use it may end up building a sharding solution specific to the use case. Although there are some partial solutions like Hibernate Shards and MySQL Proxy out there, in most cases you have to customize or tweak them for your needs.
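
The core of a homegrown sharding solution is a routing function that decides which database holds a given row. Below is a minimal sketch of hash-based routing; the shard names and the choice of the user id as the partition key are hypothetical, and this is not how Hibernate Shards or MySQL Proxy work internally.

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical database names


def shard_for(user_id: str, shards=SHARDS) -> str:
    """Map a key to a shard using a stable hash, so every server agrees on the answer."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


for uid in ("alice", "bob", "carol"):
    print(uid, "->", shard_for(uid))
```

Simple modulo routing like this reshuffles most keys when a shard is added, which is one reason real deployments often use a lookup table or consistent hashing instead.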

The essence of partitioning is divide and conquer, and this principle applies very well across the whole software stack. In the application layer, we can partition a monolithic system by function into independent units, or design the architecture that way from the start. In SOA terms: well-defined, self-contained services. Because each unit has its own constraints and characteristics, each can be independently developed and optimized for performance and scalability. For example, in a classic online shop, there may be a sign-in service, an item view service, an item search service, an ordering service, and so on. Each of them has different user experience tolerances and IO characteristics. We can use different strategies to implement these services to meet their different functional and non-functional needs. But one important assumption of functional partitioning is that the system is stateless. In other words, each service is self-contained. So we keep server-side HTTP sessions and stateful session beans to a minimum.

2. caching

Caching is a well-known hammer for cracking performance problems, and it can be applied in many use cases. Databases have their own query result caches. If your server has abundant memory, allocating more of it to the database query result cache can make a significant impact on response time. Aside from the database, developers are more familiar with caching in the application layer. There are many caching solutions for popular web programming languages. The most notable one is memcached, which is almost standard equipment at most web 2.0 companies.
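
Application-layer caching usually follows the cache-aside pattern: look in the cache first, and only go to the database on a miss. The sketch below uses an in-process dictionary with a time-to-live as a stand-in for memcached; the class, the loader function and the 60 second TTL are all assumptions for illustration.

```python
import time


class TTLCache:
    """A tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)


db_calls = {"count": 0}  # counts how often the (pretend) database is hit


def load_item_from_db(item_id):
    """Hypothetical slow database lookup."""
    db_calls["count"] += 1
    return {"id": item_id, "name": f"item-{item_id}"}


cache = TTLCache(ttl_seconds=60)


def get_item(item_id):
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    item = cache.get(item_id)
    if item is None:
        item = load_item_from_db(item_id)
        cache.set(item_id, item)
    return item


get_item(42)
get_item(42)  # served from the cache, no second database call
print(db_calls["count"])
```

With a shared cache like memcached the structure stays the same; only the `get` and `set` calls go over the network instead of into a local dictionary.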

3. avoid distributed transaction

This one is also the most provocative and controversial. However, it has been regarded as an important principle in eBay's scalable architecture. And it is not a new idea at all. The basic logic behind this principle was proposed by Eric Brewer around 2000 as the CAP theorem, which is also called Brewer's conjecture. The conjecture was formally proved afterwards, so we can make architectural decisions based on it. For most of today's web services, availability and partition tolerance are fixed requirements, so we have to sacrifice consistency for availability. That usually means the ACID properties provided by relational databases will not be available anymore. Instead, we end up with a different architecture: BASE. However, we do need eventual consistency in some cases, so we have to introduce other mechanisms for compensation and correction to reach it. A concrete implementation strategy is bound to be tricky.

4. asynchronous processing

Another proven approach to scaling is to identify time-insensitive processes and run them asynchronously. The point is to decouple one process from the others, so that each process can be simple and easy to scale and, most importantly, does not block other processes. Sometimes the asynchronous principle is called the event-driven model. It is true that asynchronous processing always involves some kind of notification about the result of processing. Messaging middleware has been widely used for this purpose. Order processing, billing and BI all fall in this spectrum. The queue mechanism has been touted as the weapon for Extreme Transaction Processing (XTP). When a high volume of load is queued for later processing, the system scales. But beware: the pressure is actually transferred from the application to the messaging middleware. So if the messaging middleware itself can't scale well, it can turn into a nightmare.
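
The queueing idea can be sketched in a few lines. Here an in-process queue and a single worker thread stand in for the messaging middleware and the billing system; the names and the `None` shutdown marker are choices made for this example only.

```python
import queue
import threading

orders = queue.Queue()  # stands in for the messaging middleware
processed = []


def billing_worker():
    """Consume orders one by one; None is used here as a shutdown marker."""
    while True:
        order = orders.get()
        if order is None:
            orders.task_done()
            break
        processed.append(f"billed order {order}")
        orders.task_done()


worker = threading.Thread(target=billing_worker)
worker.start()

# The front end only enqueues and returns immediately; it never waits for billing.
for order_id in range(3):
    orders.put(order_id)

orders.put(None)  # ask the worker to stop once the backlog is drained
orders.join()
worker.join()
print(processed)
```

Note where the load went: the producer is fast because the queue absorbs the burst, so the queue (or the real middleware behind it) is now the component that has to scale.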

In addition, some operating systems and programming languages provide asynchronous abstractions. At the OS level, asynchronous IO has been developed in Windows (IOCP) and Linux (AIO) for better scalability. Some language libraries also provide such abstractions. Java has a concurrency package that provides the Future concept. The Boost library has the Asio module for network programming, which implements the Proactor pattern. And event-based IO in Linux, like epoll, has prevailed with the wide adoption of web servers like lighttpd. However, in the context of network programming, threads or events is still an open question. There are some provocative discussions worth a read: "Why events are a bad idea" and "Why threads are a bad idea". Don't be confused by the titles. Read them in context.
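
The future concept is easy to show in Python, whose standard `concurrent.futures` module offers a similar abstraction to the Java one mentioned above. This is only a sketch of the idea; `slow_square` is a made-up stand-in for any slow operation.

```python
from concurrent.futures import ThreadPoolExecutor


def slow_square(x: int) -> int:
    """Stand-in for a slow operation such as a remote call."""
    return x * x


with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(slow_square, 7)  # returns at once with a Future
    # ... the caller is free to do other work here ...
    value = future.result()  # blocks only when the result is actually needed

print(value)
```

The caller holds a placeholder for the result and decides when to wait for it, which is the decoupling that asynchronous processing is after.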

5. failure oriented

The 8 fallacies of distributed computing have been well known in the distributed systems field for years. All of them are false assumptions we are likely to make when designing a distributed system. Although some of them seem naive to today's architects, most of them still apply. In other words, when a system gets distributed, things get ugly, and we have to be prepared to deal with that. A good start is to list all the components of the system, assume each of them will fail at some point, and figure out how to deal with each failure. The worse the cases you think of and prepare for, the more stable the system will be. There is a lot of research and practice on the issue: hardware redundancy, software instance replication for availability, automatic failover, backup and recovery, data replication for fault tolerance and so on. Among programming languages, Erlang really did a good job in this area. Erlang was designed with "failure is everywhere" in mind. Each process can have a monitor process that watches whether it is healthy and brings it back up in case of failure. However, there is a "who monitors the monitor" problem. That's why we can only build systems with a certain number of 9s.
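
The monitor idea can be imitated in a few lines: run a task, and restart it when it crashes, up to a limit. This is a loose sketch inspired by the description above, not Erlang's actual supervisor API; the function names and the restart limit are assumptions.

```python
def supervise(task, max_restarts: int = 3):
    """Run task; if it raises, restart it, giving up after max_restarts restarts."""
    restarts = 0
    while True:
        try:
            return task()
        except Exception as exc:
            restarts += 1
            if restarts > max_restarts:
                raise  # the monitor itself gives up: someone else must handle it
            print(f"task failed ({exc}); restart {restarts}/{max_restarts}")


state = {"calls": 0}


def flaky_task():
    """Hypothetical task that crashes twice before it succeeds."""
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("simulated crash")
    return "ok"


result = supervise(flaky_task)
print(result, "after", state["calls"], "calls")
```

The final `raise` is the "who monitors the monitor" problem in miniature: at some level the failure has to be handed to something outside the monitor.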

6. virtualization

Virtual machines are well understood for testing purposes. It is very easy to install several virtual machine images on one physical machine and test programs in different OS environments. But today virtualization is being leveraged to build large scale cloud computing infrastructure, because it provides a better abstraction of computing power. Users can use it instantly without worrying about networking, power or disk failures. That's the essence of utility computing too. Enterprises can get a better utilization rate out of their computing resources. For now cloud computing is provided by IT giants like Amazon. But it will be interesting to see how cloud infrastructure fits into the enterprise scope.

Note: I will update this post when new ideas come up.

reference:

1. http://highscalability.com
2. http://queue.acm.org/detail.cfm?id=1394128
3. http://queue.acm.org/detail.cfm?id=1466448
4. http://www.infoq.com/articles/ebay-scalability-best-practices
5. http://www.manageability.org/blog/stuff/about-ebays-architecture
6. http://www.manageability.org/blog/stuff/cache-tier-architecture
7. http://www.ccs.neu.edu/groups/IEEE/ind-acad/brewer/sld009.htm
8. www.atomikos.com/downloads/articles/TransactionsForXTP-WhitePaper.pdf

Sunday, March 29, 2009

the challenge of the scale

After the dot-com bubble burst, we gradually got a new one: web 2.0. However, this time it is more fun. From a brief history of web 2.0, we can see that the birth of Google marked the infancy of this new age of the Internet. The most notable feature of this age is collective wisdom. Well, you may say the long tail, large scale collaboration, or whatever. The point is that users play the leading role on the stage. So what is the implication of this trend for technology? Users mean page views and site traffic. And beyond that, it is the scale of the traffic and data. How can we deal with it? The question is the same, but the answer varies from one company to another. highscalability.com has made a great contribution by helping the community learn from each other.

Since Google published some papers on its secret weapons, many companies have disclosed their technology architectures and shared their experience in a variety of talks. Here is a simple classification of these architectures:

1. cloud computing
features: homegrown solutions built from scratch for large scale data processing; distributed, fault-tolerant and highly available file system; distributed schemaless database/document store; computing grid/distributed job scheduler
example: Google, Amazon
technology: GFS, Bigtable, MapReduce, Chubby, Dynamo, EC2, S3, SimpleDB

2. LAMP
features: customized LAMP, some homegrown solutions, some clones of class 1
example: Yahoo, LiveJournal, YouTube, Flickr, Facebook
technology: Linux, LVS, Apache, MySQL, PHP, Squid, memcached, MogileFS, Perlbal, DJabberd, TheSchwartz, Spread, Hadoop, HBase, ZooKeeper, Hypertable

3. JAVA EE
features: classic N-tier architecture, 2PC transactions, application server clustering, db replication, caching/in-memory data grid
example: eBay (note: eBay may not be a good example of this class because it doesn't use 2PC transactions), many banks and securities companies
technology: JSP, web frameworks, Java EE application server, messaging middleware, commercial relational db

4. MS suite
features: N-tier architecture, partition, caching
example: MySpace
technology: ASP.NET, SQL Server, Windows Server

It is clear that the first two classes of architecture draw much attention these days, partly because open source software is seeing accelerating adoption. In contrast, commercial solutions are more likely to be adopted by tycoons who can throw money at everything. Each class of architecture may solve the scaling problem in one way or another, but it is hard to estimate how cost-effective each one is. On the one hand, homegrown solutions may solve the problem more effectively and provide more flexibility, but may need more effort to build. On the other hand, commercial solutions may solve the problem just as efficiently, but will certainly cost more money. The key is that the architecture must be extensible for new functional requirements and scalable for increasing user traffic.

Tuesday, February 24, 2009

really funny google hacks

Well, about an hour ago I was chatting with one of my friends. We were talking about how to acquire page view statistics for a website that is controlled by a third party. Suddenly it occurred to me that a Google hack might do the trick. After a bit of googling for "google hack", many funny things popped up. One of them is how to view or control a webcam through Google. I also found some guidelines (Taking advantage of technology, and how to search and view free live webcam by google) about this trick. And this one even lists some live webcams available at present. I have pasted some of the pictures for the curious. You can try some of them yourself. Have fun!

Saturday, February 14, 2009

personality guess from blog

There is an old saying in Chinese (文如其人): the style is the man, or the writing mirrors the writer. It means we can deduce someone's personality from their writing. Now technology can test this saying. A website, Typealyzer.com, does exactly that. Before you browse someone's blog, you can make a guess about the author's personality, and you can find friends who are similar to you. Have fun!

Monday, February 9, 2009

a hot debate: where should China's foreign reserve go

Editorial: It is well known that China has accumulated the biggest foreign reserve in the world over the past couple of years. As academics have noted, such a big foreign reserve is a double-edged sword. On the one hand, it represents the financial power of the state. The government can resist malicious speculation on the RMB by sharks among foreign investors. Domestic companies can also import advanced equipment and technology patents by buying foreign exchange from SAFE (State Administration of Foreign Exchange) within specific limits. On the other hand, because the foreign reserve exists in the form of foreign government bonds and securities, it carries an opportunity cost and may increase inflationary pressure.

When we cannot rely on imports by the US and EU countries, what can we do with our foreign reserve to revive our economy? A heated debate is going on among some Chinese economists. I'd like to summarize this debate (一场关于外汇储备用法的激辩, "A heated debate over how to use the foreign exchange reserve").

On the night of Feb. 8, 2009, a heated debate took place in the northern city of Harbin. On one side was Zhang Weiying, an economist and professor at Peking University; on the other side was Gao Xiqing, the CEO of China Investment Corporation. The focus was whether the foreign reserve should be redistributed to Chinese citizens to expand domestic demand.

Mr. Zhang firmly proposed that part of the big, idle foreign reserve be redistributed to Chinese citizens to increase their purchasing power. He also thought it would be helpful for every Chinese citizen to become a holder of US bonds. Zhang's proposal was strongly rebutted by Mr. Gao, who thought that once the accumulated reserve is redistributed to Chinese citizens, the government can no longer manage foreign exchange centrally, which is considered necessary to resist currency speculation. Mr. Gao said, "The Americans are bound to oppose this proposal, because holding US government bonds means supporting their credit. If we sell these bonds, their credit will drop dramatically, and so will their expected return. In the end all of the foreign assets we are holding will become waste paper." Another reason he opposed the proposal is that he thought citizens would rather deposit the money they receive than spend it, let alone stimulate the economy.

Mr. Zhang also defended his proposal. He said only a part of the foreign reserve would be used for redistribution, so national financial security would not be threatened.

Although Mr. Zhang's defense seems unassailable, the proposal is a bit radical. Mr. Gao's doubt about what citizens would do with the money is reasonable. At present, most people are not protected by social security. Children's education, illness and taking care of elderly parents are all big concerns for them. If these top concerns are not relieved, consumption will not come as expected. So my opinion is that part of the reserve can be used to extend the social safety net to the whole country so that people can consume without worry.

Sunday, February 8, 2009

the turning point of China

Since the downturn of the US economy, there have been many discussions about whether China will be affected by the credit crunch that originated in the US. Now the question is not whether, but how. In the 4th quarter of last year, many factories and businesses focused on foreign trade were closed and tens of thousands of people were laid off. Annual bonuses at most companies were cut last year. Even for those trying to keep growing, business opportunities have become rare. It is evident that the Chinese economy is experiencing the shock of the financial tsunami. So what can we do to recover?

Actually both foreign and domestic economists and experts noted early on that China was developing a very unbalanced economy, as we can see from the big Sino-US trade surplus. But no powerful measures were taken to improve the situation. As long as the train of the economy was running, not much attention was paid. Now the train is slowing down and is likely to stop somewhere if nothing is done. It is a turning point for us to make a transition in development policy. In the past almost all of China's economic accomplishments relied on large quantities of exports. And a big foreign reserve has gradually accumulated, to the extent that China and the US are in the same boat. In the meantime, the share of consumption in GDP has remained low for years. Now we can't depend only on exports anymore. We should shift our focus to domestic demand.

The government has passed a stimulus package, and some of this money has been invested in infrastructure projects. The countryside is considered the most important area for financial support. Most importantly, an overall social security plan is being carried out to cover all of the people. As long as these measures, long term or short term, are carried out fully, the potential of domestic demand can be unleashed and our economy can become a healthier one.