This is the server room of SSDC Singapore, one of my clients. I have worked with those machines for the past two years. I love the noise and the cold of the server room. I also love the people there. They are very happy and friendly!


The lyrics are great and very funny! :D


mysqldump can retrieve and dump table contents row by row, or it can retrieve the entire content from a table and buffer it in memory before dumping it. Buffering in memory can be a problem if you are dumping large tables. To dump tables row by row, use the --quick option (or --opt, which enables --quick). The --opt option (and hence --quick) is enabled by default, so to enable memory buffering, use --skip-quick.

Backing up a single table from a database

mysqldump -u your_username -p database_one table_name > /var/www/backups/table_name.sql

Restoring the table into another database

mysql -u your_username -p database_two < /var/www/backups/table_name.sql
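If you script these backups from Ruby, a minimal sketch like the one below builds the same two commands as argument arrays, so no shell quoting is involved, and adds --skip-quick only when you explicitly want memory buffering. The method names and the backup_user account are hypothetical, and it assumes the password is read from a MySQL option file such as ~/.my.cnf rather than typed at a -p prompt. It also assumes Ruby 2.6 or later for the exception: option of system.

```ruby
require "shellwords"

# Argument vector for dumping one table. --quick (row-by-row dumping) is
# already on through the default --opt, so a flag is only added when
# memory buffering is explicitly requested.
def dump_argv(user, database, table, buffer_in_memory: false)
  argv = ["mysqldump", "-u", user]
  argv << "--skip-quick" if buffer_in_memory
  argv + [database, table]
end

# Argument vector for loading a dump into another database.
def restore_argv(user, database)
  ["mysql", "-u", user, database]
end

# Dump `table` from from_db into `path`, then load it into to_db.
# ASSUMPTION: credentials come from an option file such as ~/.my.cnf,
# so neither command stops to prompt for a password.
def copy_table(user, from_db, to_db, table, path)
  system(*dump_argv(user, from_db, table), out: path, exception: true)
  system(*restore_argv(user, to_db), in: path, exception: true)
end

# Print the equivalent shell command for inspection (hypothetical user).
puts dump_argv("backup_user", "database_one", "table_name").shelljoin
```

Calling copy_table("backup_user", "database_one", "database_two", "table_name", "/var/www/backups/table_name.sql") would then perform both steps above in sequence, raising an error if either command exits with a non-zero status.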


There is always a need to build e-commerce sites that handle credit card transactions securely and seamlessly. In the Singapore market, eNETS is the best-known company providing payment gateway services. At the time of writing, eNETS only provides APIs for the .NET and Java platforms. In this post, I will show you my solution for integrating eNETS with Ruby on Rails via Java.

Steps:

1. Build a JAR file that submits payment info to eNETS; I named it enets.jar. This JAR file returns the output from eNETS to the console in text format. I have attached a sample program built with NetBeans, which you can download here: eNETS. After downloading, copy that folder into your NetBeans projects folder, as shown in the image below.

[Image: NetBeans projects folder]

2. Following the eNETS guidelines, change the Java security settings and generate merchant.priv.pgp.asc and merchant.pub.pgp.asc.

3. Update the configuration files log4j.properties and NETSConfig.xml.

4. Build enets.jar from the source files in NetBeans: right-click the project root and choose Clean and Build.

5. Generate the command to execute enets.jar, something like this:

"java -jar #{RAILS_ROOT}/vendor/extensions/payment_gateway/lib/enets/eNETS.jar #{mid} #{tid} #{paymentMode} #{amt} #{currency} #{merRef} #{submitMode} #{merCertId} #{pan} #{expiry} #{stan} #{paymentType} #{successURL} #{successURLParams} #{failureURL} #{failureURLParams} #{notify_url} #{notify_url_params} #{name} #{cvv} #{post_url} #{post_url_params} #{cancel_url} #{cancel_url_params} #{bill_first_name} #{bill_last_name} #{bill_initial} #{bill_addr1} #{bill_addr2} #{bill_coy_name} #{bill_city} #{bill_state} #{bill_zip_code} #{bill_country} #{bill_mobile_num} #{bill_phone_num} #{bill_fax_num} #{bill_email} #{ship_first_name} #{ship_last_name} #{ship_initial} #{ship_addr1} #{ship_addr2} #{ship_coy_name} #{ship_city} #{ship_state} #{ship_zip_code} #{ship_country} #{ship_mobile_num} #{ship_phone_num} #{ship_fax_num} #{ship_email} #{shopper_ip_addr} #{product_format} #{product_details} #{gw_url}"

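6. Run enets.jar from Ruby with output = %x[#{command}]. %x[] captures the command's standard output and returns it as a string, which is stored in the output variable for later processing. This is not the same as Ruby's system(), which lets the output go straight to the console and only returns true, false or nil, or exec(), which replaces the current process with the command. Read more on Jay Fields’ blog: Ruby Kernel system, exec and %x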

7. Parse the results returned from enets.jar and continue with the business logic in your Ruby/Rails application. For Rails projects, I recommend using ActiveMerchant and adapting its Bogus payment gateway, so that your integration follows the conventions of the ActiveMerchant framework. You will be surprised at how little effort this requires.
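Steps 5 to 7 can be sketched in Ruby as follows. This is not the code from the sample project: the method names are hypothetical, and the parser assumes enets.jar prints one key=value pair per line, which you should adjust to whatever your jar actually writes to the console. Escaping each argument with Shellwords matters here because billing names and addresses can contain spaces, and an empty optional field must still occupy its slot in the positional argument list.

```ruby
require "shellwords"

# Step 5: build the command line. Each argument is shell-escaped, so a
# value with spaces stays one argument and an empty value becomes ''
# instead of silently shifting every following argument.
def enets_command(jar_path, args)
  ["java", "-jar", jar_path, *args.map(&:to_s)].shelljoin
end

# Step 6: run the command and capture its standard output.
# %x[] returns stdout as a String and sets $? to the process status.
def run_enets(command)
  output = %x[#{command}]
  raise "enets.jar exited with status #{$?.exitstatus}" unless $?.success?
  output
end

# Step 7: parse the output.
# ASSUMPTION: one "key=value" pair per line; blank lines are skipped and
# only the first "=" splits key from value.
def parse_enets_output(output)
  output.each_line.with_object({}) do |line, result|
    key, value = line.chomp.split("=", 2)
    result[key] = value if key && value
  end
end
```

In a Rails app you would then call something like parse_enets_output(run_enets(enets_command(jar, args))), where jar is the eNETS.jar path from step 5 and args lists mid, tid, paymentMode and the remaining values in the same order as the command above.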

How do you integrate eNETS with your Rails projects? I would like to know if you have better solutions. Thank you.

Here are some benefits compared to RDBMSs (MySQL, Postgres, Oracle …) that I came up with after a few months of using MongoDB and Neo4j:

* No model caching needed, thanks to high performance
* No joins, thanks to embedded documents or the graph model
* Mapping data to objects is dead simple, since there are no joins and the data is document-oriented
* No SQL injection attacks
* No DB migrations, thanks to schemaless storage
* Horizontal scaling
* Social computation made easy with a graph DB
* No DB roll-up needed, thanks to high volume and capped collections

Q: How do I query and tabulate data without SQL?
A: I query data using MongoDB indexes and its JSON query syntax, and I tabulate data with Map-Reduce.
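To illustrate the idea without a database, here is a minimal in-memory Ruby sketch: a query-by-example filter similar in spirit to MongoDB's JSON query syntax, followed by a map-reduce tabulation. It does not use the MongoDB driver or its mapReduce command; the ORDERS documents and the method names are made up for illustration.

```ruby
# Hypothetical documents standing in for a MongoDB collection.
ORDERS = [
  { "status" => "paid",    "country" => "SG", "amount" => 120 },
  { "status" => "paid",    "country" => "MY", "amount" => 80 },
  { "status" => "pending", "country" => "SG", "amount" => 50 },
  { "status" => "paid",    "country" => "SG", "amount" => 30 },
]

# Query by example: a document matches when every field in the query has
# an equal value (equality only; MongoDB also has operators such as $gt).
def find_docs(docs, query)
  docs.select { |doc| query.all? { |field, value| doc[field] == value } }
end

# Map-Reduce tabulation: map_fn emits [key, value] pairs per document,
# pairs are grouped by key, and reduce_fn folds each group's values.
def map_reduce(docs, map_fn, reduce_fn)
  docs.flat_map { |doc| map_fn.call(doc) }
      .group_by(&:first)
      .transform_values { |pairs| pairs.map(&:last).reduce { |a, b| reduce_fn.call(a, b) } }
end

paid = find_docs(ORDERS, { "status" => "paid" })
totals = map_reduce(paid,
                    ->(doc) { [[doc["country"], doc["amount"]]] },
                    ->(a, b) { a + b })
p totals # paid totals per country: SG 150, MY 80
```

As with MongoDB's map-reduce, the reduce function should be associative, because the values for one key may be combined in any grouping.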

You also program in Ruby, right? Three languages! Can you contrast the 3 languages Scala, Clojure and Ruby?
That’s a very interesting thing to think about, because Clojure and Ruby have sort of a similar feel in the sense that they are both more dynamically typed than statically typed. Clojure has an interesting relationship with Java, obviously, because it lets you use Java objects, but it doesn’t have type annotations all over the place, like you would have in Java or Scala. I think this general debate of static versus dynamic typing is kind of pointless in some sense, meaning that a lot of times it’s the application that really should dictate what’s best.

If you are building something like a typical website that may need a lot of iterations very quickly and there is an informal model of the domain, then maybe it’s not so important to have the formalism of type theory. But, on the other hand, if you are building something that you want to behave in a mathematically precise way, then it’s great to have it. The type system of a statically typed language bakes in almost provably correct behavior at the fundamental building blocks. For example, if I’m building a financial application that manages money in some sense, I’d be more likely to want a statically typed language like Scala where I can very precisely specify the behavior of money.

Then I’d build my account objects and so forth on top of that, knowing that they will be robust at this very fundamental level. But, if I’m building a website, where users may be specifying withdrawals and transfers, I don’t necessarily care about that kind of type safety at that level. I would like to have the dynamism, the productivity that I get from a language like Ruby, so I’d be more likely to use Ruby on that part of the application. I could very easily see JRuby with Rails running the website and Scala or Clojure in the business-tier code that’s handling the precision of getting money transactions right.

The JVM seems to be the place where programming languages converge.

JRuby on Rails running web apps, with Scala, Clojure or Java handling money transactions and other high-reliability, high-performance work, is a very compelling combination.


About The Neo4j Graph Database

Neo4j, the world’s leading graph database, stores data in graphs rather than relational tables. This makes Neo4j especially suitable for applications that handle data with complex relationships, like social networks, life sciences, intelligence and financial applications. Neo4j offers users:

  • extremely high performance on deep traversals and mining of complex data,
  • rapid schema evolution for changing business requirements, and
  • simplified development through perfect match between domain model and database schema.

These advantages have made Neo4j the database of choice for many social networking services and other applications that manage ever more complex business data.
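Deep traversals are where a graph model shines: instead of repeated self-joins on a friendship table, the database follows stored relationships from node to node. The breadth-first search below sketches a "friends within N hops" query over an in-memory adjacency list. It illustrates the traversal itself, not Neo4j's API, and the FRIENDS graph is hypothetical.

```ruby
# Hypothetical social graph as an adjacency list (who knows whom).
FRIENDS = {
  "alice" => %w[bob carol],
  "bob"   => %w[alice dave],
  "carol" => %w[alice eve],
  "dave"  => %w[bob],
  "eve"   => %w[carol frank],
  "frank" => %w[eve],
}

# Breadth-first traversal: returns every person reachable from `start`
# within max_depth hops, mapped to their hop distance.
def within_hops(graph, start, max_depth)
  distances = { start => 0 }
  frontier = [start]
  max_depth.times do |depth|
    # Expand one level: neighbours of the current frontier not seen yet.
    frontier = frontier.flat_map { |node| graph.fetch(node, []) }
                       .reject { |n| distances.key?(n) }
                       .uniq
    frontier.each { |n| distances[n] = depth + 1 }
  end
  distances.reject { |person, _| person == start }
end

p within_hops(FRIENDS, "alice", 2)
```

Each level only touches the neighbours of the previous level, so the cost grows with the part of the graph actually visited rather than with the total number of relationships stored.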

* Reading Linked
* Exploring Neo4j
* Thinking of using Neo4j in one of my current projects


Scott: As you mentioned, there is an Apache version of Hadoop and then there’s the Cloudera version. As different companies wrap themselves around different open source projects, they’re structured in different ways. Talk a little bit about Cloudera and what you add to the public open source version of Hadoop, in terms of additional software, support, or services.

Amr: I should start by saying that Cloudera is an enterprise software company. Open source is an enabler for us, and it’s part of what we do, but our mission is about building enterprise software for large-scale data processing in internal or external clouds.


The group at Yahoo! that I came from was using Hadoop for data analytics and data warehousing. We had something like 100,000 web servers across the world, and once we collected data from across all these servers, we dumped it into Hadoop, which became the place where we stored all of the data, instead of traditional network storage.

Our reasoning for doing that was a matter of economics, given the quantity of hardware. Hadoop lets us scalably process that data, clean it up, and normalize it so we could pass it along to the systems that need it.

Hadoop is getting very wide adoption in the data warehousing and business intelligence domains. One of the biggest uses within Yahoo! right now is dealing with all of the log information from servers. Analyzing that information allows for better spam filtering, ad targeting, content targeting, A/B testing for new features, et cetera.

It’s not web-specific. For example, everybody does data warehousing, and we see very strong adoption there.

Separate from that, your example of oil companies is a very good one, as is the financial sector. Right now, we do have a couple of very large financial institutions working with us on these exact problems, taking huge amounts of data from domains like credit card processing and building predictive models for fraud that enable better decisions, for example, about whether to block or allow a given transaction.

In the stock market, Hadoop is being used to do simulations that help predict option pricing and related problems. That’s another very healthy market that we’ve seen growth in.

Yahoo! is the biggest contributor to and adopter of Hadoop, and Hadoop is being used to solve a wide range of problems, from data analytics and data warehousing (log processing, gene sequence mapping, which is basically a fuzzy string matching problem) to business intelligence domains (finance, the stock market …).

Rumor has it that a bank in Singapore invested millions of dollars to build a computing and prediction system from scratch in Haskell, a statically typed functional programming language, to guarantee scalability and performance.

I wonder why the bank did not consider a distributed file system (DFS) + MapReduce (Hadoop is an open source implementation of both), a massively scalable approach on commodity hardware that has been used successfully at the biggest IT firms in the world (Google, Yahoo, Facebook … to name just a few) … or maybe they are just re-implementing DFS + MapReduce themselves :D


First, it’s worth making the important clarifying point that Hadoop is not a database. Hadoop is a data processing system, and in fact, I would even go as far as saying Hadoop is an operating system. The core of an operating system boils down to a file system, the storage of files, and a process scheduling system that runs applications on top of these files.

There are many other components that help with devices, credentials and user access, and so on, but that is the core. Hadoop is exactly the same thing. The core of Hadoop is the Hadoop Distributed File System, which is a file system that runs across many nodes. It links together the file systems on many local nodes to make them into one big file system. Hadoop MapReduce is really the job scheduling system that takes care of scheduling jobs on top of all those nodes.

That is the key distinction between Hadoop’s approach and that of database systems. Hadoop, at its heart, does not require any structure to your data. You can just upload files from anywhere, like a web server, an RFID device, or a mobile phone, directly into Hadoop.

They could be images, videos, or just a bunch of bits. They don’t have to have a schema with column types and so on, which gives you tremendous agility and flexibility.
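To make the "one big file system" idea concrete, here is a toy Ruby model of how a distributed file system such as HDFS can store an arbitrary byte stream: the file is cut into fixed-size blocks and each block is replicated on several nodes. It is a simplified sketch, not HDFS code. It uses tiny blocks and round-robin placement, whereas real HDFS uses blocks of 64 MB or 128 MB by default (depending on the version) and a rack-aware placement policy managed by the NameNode. Block, place_blocks and read_file are hypothetical names.

```ruby
# One stored block: its position in the file, its bytes, and the nodes
# holding a replica of it.
Block = Struct.new(:index, :data, :nodes)

# Split content into fixed-size blocks and place each block's replicas on
# `replication` distinct nodes, rotating through the node list.
def place_blocks(content, block_size, nodes, replication)
  raise ArgumentError, "replication exceeds node count" if replication > nodes.size
  (0...content.bytesize).step(block_size).each_with_index.map do |offset, i|
    replicas = (0...replication).map { |r| nodes[(i + r) % nodes.size] }
    Block.new(i, content.byteslice(offset, block_size), replicas)
  end
end

# Reading the file back means taking the blocks in order (from any node
# holding a replica) and concatenating their bytes.
def read_file(blocks)
  blocks.sort_by(&:index).map(&:data).join
end

blocks = place_blocks("hello hadoop!", 5, %w[n1 n2 n3 n4], 3)
blocks.each { |b| puts "block #{b.index} #{b.data.inspect} on #{b.nodes.join(', ')}" }
```

Nothing here looks at what the bytes mean, which is the point of the passage above: images, videos and log lines are all just blocks to the storage layer.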

Hadoop has a very nice model that I sometimes refer to as schema on read. Whereas defining your schema as you’re writing the data in limits what you can put in by requiring it to be conformant to the schema that you created, Hadoop allows you to define the schema as you’re reading stuff out.

That gives you a lot of flexibility and agility, since you can add files that have dynamic parts like JSON or new standards coming up like Avro, which is a very good project coming out of the Hadoop project that’s similar to protocol buffers from Google and Thrift from Facebook. Avro makes files have a schema around them as well, but these schemas are semi-structured, rather than conforming to a strict relational model.
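Schema on read can be sketched in a few lines of Ruby: raw JSON lines are kept exactly as they arrived, and each reader brings its own schema, here a set of field converters, when it reads them. The sample records and the schema names are hypothetical. The point is that the same stored data serves two different schemas, and that missing or extra fields were never rejected at write time.

```ruby
require "json"

# Raw records stored exactly as they arrived: nothing was validated or
# typed at write time. (Hypothetical sample data.)
RAW_LINES = [
  '{"user":"alice","event":"click","ms":"120"}',
  '{"user":"bob","event":"view"}',
  '{"user":"carol","event":"click","ms":"95","device":"phone"}',
]

# A schema is just the fields a reader wants plus how to convert them.
LATENCY_SCHEMA = {
  "user" => ->(v) { v.to_s },
  "ms"   => ->(v) { v && Integer(v) },
}
DEVICE_SCHEMA = {
  "user"   => ->(v) { v.to_s },
  "device" => ->(v) { v || "unknown" },
}

# Apply the schema while reading: keep only the declared fields, convert
# them, and let each converter decide what a missing field becomes.
def read_with_schema(lines, schema)
  lines.map do |line|
    record = JSON.parse(line)
    schema.to_h { |field, convert| [field, convert.call(record[field])] }
  end
end

p read_with_schema(RAW_LINES, LATENCY_SCHEMA)
p read_with_schema(RAW_LINES, DEVICE_SCHEMA)
```

Adding a new field such as "device" later required no change to the stored data or to existing readers; only the reader that cares about it declares it. (Hash#to_h with a block needs Ruby 2.6 or later.)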

That said, it’s also important to point out that structured stuff is a subset of unstructured stuff. The fact that Hadoop at its heart is a file system doesn’t mean that it can’t do database relational stuff. It does actually, in the same way that Windows at its heart is a file system, but you can run SQL Server on top of it to get the relational services, schemas, column types, and so on.

One of the key projects on top of Hadoop is Hive, which actually came out of Facebook. Hive essentially provides a relational database on top of Hadoop that utilizes the underlying file system but has a metastore that keeps the schema of the files.

It knows that a given file is tab delimited or whatever, it knows the column type for these files, and Hive allows you to write SQL against these files. It will look up the schema and then it will write for you the MapReduce jobs so that you don’t have to go and learn MapReduce from scratch.
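The following Ruby sketch shows, in miniature, the two things described above: a metastore entry that records how a tab-delimited file should be read, and the map, shuffle and reduce steps that a query like SELECT url, COUNT(*) FROM page_views GROUP BY url conceptually becomes. It is not Hive's query planner or its metastore API; the table, columns and method names are hypothetical.

```ruby
# Hypothetical metastore entry: like Hive's metastore, it records how to
# read a file (delimiter, column names and types) separately from the data.
METASTORE = {
  "page_views" => {
    delimiter: "\t",
    columns: { "user" => :string, "url" => :string, "ms" => :int },
  },
}

CASTS = { string: ->(v) { v }, int: ->(v) { Integer(v) } }

# Turn one raw line into a typed row using the schema from the metastore.
def parse_row(table, line)
  meta = METASTORE.fetch(table)
  values = line.chomp.split(meta[:delimiter])
  meta[:columns].each_with_index.to_h do |(name, type), i|
    [name, CASTS.fetch(type).call(values[i])]
  end
end

# Roughly what "SELECT col, COUNT(*) FROM table GROUP BY col" turns into.
def count_group_by(table, column, lines)
  mapped  = lines.map { |line| [parse_row(table, line)[column], 1] } # map: emit (key, 1)
  grouped = mapped.group_by(&:first)                                 # shuffle: group by key
  grouped.transform_values { |pairs| pairs.sum(&:last) }             # reduce: sum per key
end

lines = ["alice\t/home\t120\n", "bob\t/cart\t80\n", "carol\t/home\t95\n"]
p count_group_by("page_views", "url", lines)
```

The user of the query only writes the SELECT; looking up the schema and producing the map and reduce steps is the part Hive takes off your hands.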

Now you have the flexibility of going either way. One approach is to get at the core of the MapReduce framework using Java MapReduce, which we sometimes refer to as being like assembly language for Hadoop. It gives you the most flexibility and performance, but it is fairly complex and difficult to learn.

Alternately, you can go in with a high level language like Hive. In this case, you can just use SQL, if that’s what you’re used to, to write your job. Hive itself has lots of optimizations. It understands the underlying MapReduce framework, so it can properly map your problem on top of your data.


More Information