Archive for the ‘migrated from old’ Category

Shell trick: track new tcp connections per second in Linux

Thursday, February 11th, 2010

This little snippet is for when you want to see new active connections per second, rather than the concurrently established connections most tools show you:


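# Sample netstat's TCP "active connection(s) openings" counter once a second and print the delta.
# Older net-tools print "connections openings", newer ones "connection openings"; the regex matches both.
p=$(netstat -s -t | awk '/active connections? openings/ {print $1}'); while sleep 1; do c=$(netstat -s -t | awk '/active connections? openings/ {print $1}'); echo "new connections: $((c - p))"; p=$c; done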
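If you would rather not shell out to netstat, here is a minimal Python sketch that reads the same counter directly. It assumes Linux, where netstat -s takes its TCP numbers from /proc/net/snmp. That file has a "Tcp:" header line naming the fields and a second "Tcp:" line holding the values, and "active connection openings" corresponds to the ActiveOpens field. The function names are my own.

```python
import time


def active_opens(snmp_text):
    """Return the Tcp ActiveOpens counter from /proc/net/snmp-formatted text."""
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    # The first "Tcp:" line names the fields; the second holds their values.
    header, values = tcp_lines[0], tcp_lines[1]
    return int(values[header.index("ActiveOpens")])


def read_active_opens(path="/proc/net/snmp"):
    """Read the current ActiveOpens counter from the kernel (Linux only)."""
    with open(path) as f:
        return active_opens(f.read())


def watch(interval=1.0, read=read_active_opens):
    """Print how many new active TCP openings happened in each interval, forever."""
    prev = read()
    while True:
        time.sleep(interval)
        cur = read()
        print("new connections: %d" % (cur - prev))
        prev = cur
```

Calling watch() gives the same output as the one-liner. Parsing is kept separate from reading so the field lookup can be checked against sample text.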

Enjoy!

Introducing: Business Engineering, the evolution of Business Intelligence

Thursday, February 11th, 2010

Define intelligence:

Intelligence is a characteristic of thinking, but it is also a thing to be acquired. This substance is different from information: intelligence is information that has been discovered, processed, and presented in a way that encourages its other definition, disciplined and insightful thinking.

Define engineering:

Engineering is the deliberate, analytical, scientific application of intelligence to the design or modification of a system.

Most data floated as Business Intelligence is more accurately labeled business information. It only becomes Business Intelligence, the substance, when superior tools expose actionable patterns and trends.

Business Engineering is the practice of managing decisions based on critical analysis of intelligence about internal and external factors influencing the business.

RescueTime allows businesses to tweak the previously hidden algorithms that drive workforce productivity. Data is gathered scientifically, then processed and presented innovatively in real time.

Businesses can rebalance workloads, uncover inefficiencies, and identify stalled or unusually successful projects while they are happening. Smart managers can introduce a measure of science into management itself: easily visualized historical information exposes trends from one week to the next. Try several workflow processes and prove which one works best for each team.

Ruby for “are these chars in this string?”

Thursday, February 11th, 2010

I find myself wanting to know whether all the characters in one string appear in another.
For example, if key names are restricted to a certain set of characters, you can use this to make sure a provided key is valid.


require 'set'

# True if every character of checkme also appears in inme.
def chars_subset_of?(checkme, inme)
  checkme.chars.to_set.subset?(inme.chars.to_set)
end

You could also reopen class String to add it as a method on every string:


require 'set'

class String
  # True if every character of self also appears in other.
  # other may be a String or any Enumerable of characters (e.g. an Array).
  def chars_subset_of?(other)
    allowed = other.is_a?(String) ? other.chars.to_set : other.to_set
    chars.to_set.subset?(allowed)
  end
end

Appengine: Accumulate and Rejoin Fragmented Data or Buffer Small Object Floods in Memcache

Thursday, February 11th, 2010

I put this together to solve a problem where contiguous data generated in JavaScript on the browser side needed to be broken into pieces, sent to the server, and reconstructed there. The same tool can be used to buffer rapidly incoming small objects and queue them for batch inserts, for higher performance and less contention on the Appengine datastore.


For the “large object out of many small fragments” use case:

The principle is that the sender provides an identifier that uniquely identifies the batch. I use a cookie that is generated on page load, combined with a counter in the JS that is incremented once for each batch from that client. The cookie was conveniently already there for another purpose.

On the server, the library uses this identifier as the memcache root key ("groupkey" in the code). Each fragment should include the total number of fragments expected and its own index in the array of fragments. On the server, an atomic global counter keeps track of when all pieces have arrived. The API lets you have the library preserve the array order. Alternatively, you can store order info in the value yourself and sort it out when the server gives you the completed array of accumulated fragments.

Put simply, the server uses the unique identifier both to store the fragments in memcache and to track the fragment count. When the count equals the expected total, it raises a Complete exception, and you can then fetch back an array of all the fragments. Typically this would be done inline with the request that sends the last missing fragment.
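Below is a minimal sketch of that flow. It is not the library from this post. Only the Complete exception name comes from the text above. FakeCache, add_fragment, fetch_fragments and the key layout are hypothetical.

FakeCache is a dict-backed stand-in so the sketch runs anywhere. On App Engine you would use memcache instead, where incr is atomic; check that its calls match the ones assumed here before relying on that.

Each fragment is stored before the counter is bumped. So whichever request pushes the count to the expected total knows every other fragment has already been written.

```python
class Complete(Exception):
    """Raised when the last expected fragment of a group has been stored."""


class FakeCache(object):
    """Dict-backed stand-in for memcache (hypothetical, single-threaded only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def incr(self, key, initial_value=0):
        # Real memcache does this atomically; here it is only safe in one thread.
        self._data[key] = self._data.get(key, initial_value) + 1
        return self._data[key]

    def get_multi(self, keys):
        return dict((k, self._data[k]) for k in keys if k in self._data)


def _frag_key(groupkey, index):
    return "%s:frag:%d" % (groupkey, index)


def add_fragment(cache, groupkey, index, total, value):
    """Store fragment `index` of `total`; raise Complete once all have arrived.

    Assumes each fragment is sent exactly once (a resend would be double-counted).
    """
    cache.set(_frag_key(groupkey, index), value)
    if cache.incr(groupkey + ":count") == total:
        raise Complete(groupkey)


def fetch_fragments(cache, groupkey, total):
    """Return all fragments of the group, in index order."""
    keys = [_frag_key(groupkey, i) for i in range(total)]
    found = cache.get_multi(keys)
    return [found[k] for k in keys]
```

In a request handler you would call add_fragment and catch Complete. Inside that except block, call fetch_fragments and join or process the result.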

For the case of queueing up small objects for batch insert into BigTable:

This use case is obvious, and the analysis is left to the reader as an exercise (smile). Basically, the concern about order above goes away, and no reconstruction is needed.
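For that exercise, here is one possible minimal sketch; it is not code from this post. Each incoming object claims a slot number from the group's counter and is stored under that slot. Every batch_size objects, the request that filled the batch gets the whole batch back, for example to write it in a single batch datastore put.

FakeCache and buffer_object are hypothetical, and FakeCache stands in for memcache so the sketch runs anywhere. The sketch is single-threaded. Under real concurrency a slot might be claimed but not yet written when the batch is read, and a production version would have to handle that.

```python
class FakeCache(object):
    """Dict-backed stand-in for memcache (hypothetical, single-threaded only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def incr(self, key, initial_value=0):
        self._data[key] = self._data.get(key, initial_value) + 1
        return self._data[key]

    def get_multi(self, keys):
        return dict((k, self._data[k]) for k in keys if k in self._data)


def buffer_object(cache, groupkey, obj, batch_size):
    """Queue obj under groupkey; return a full batch every batch_size objects, else None."""
    slot = cache.incr(groupkey + ":count")  # claim a slot number (atomic on real memcache)
    cache.set("%s:item:%d" % (groupkey, slot), obj)
    if slot % batch_size:
        return None
    # This request filled the batch: collect the last batch_size slots.
    keys = ["%s:item:%d" % (groupkey, i)
            for i in range(slot - batch_size + 1, slot + 1)]
    found = cache.get_multi(keys)
    # Order does not matter here; a claimed-but-unwritten slot is simply skipped.
    return [found[k] for k in keys if k in found]
```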


Python DB Logging Handler for Google Appengine

Thursday, February 11th, 2010

NOTE: this is really only for CRITICAL logging that you want to persist separately from Google’s wrapper around logging; otherwise you will burn through your quota. Main advantages: it implements a logging Handler and supports multiple logger identities. Good for audit trails.

UPDATE: It appears appengine-patch does not like the module-level variable magic. (I switched to appengine-patch from appengine helper for the convenience of hybrid auth.) I have removed that part to prevent any module-level variable weirdness; the code below is updated. Alternative: just put it inside your Django tree.
— end UPDATE

Here I provide a module that extends the Python logging framework so you can write log messages to your Google Appengine datastore. It implements a database LogRecord model, a Handler, a Formatter, and a logger manager.


Since you can’t write to files on Appengine, there is no easy way to separate your logging into different file handlers. The built-in logging is nifty during the development cycle, but a long-term store is desirable, with the ability to set different log levels for different parts of your system.

So the goals I set out with were:

  • Persist all the usual log record info, including file, line, module, stack, etc. as available
  • Allow multiple logger identities for easy separation and grouping
  • Don’t break anything about the logging module’s expected behavior
  • Provide a convenience wrapper for zero-effort use

Now, in principle I’d avoid logging to a database, since log records are inherently denormalized, appended sequentially, and typically never modified after insert. They’re the perfect candidate for a flat file, but that is not an option here. Also, I expect the Appengine table structure to let this perform more or less like a flat file. Caveat: depending on how you use this, you could hit your quotas substantially faster.

Quick note: this Handler provides its own formatter, which just shapes the data for the Appengine table. It doesn’t make sense to give it a format string, since it is not serializing the record to text. Similarly, I didn’t see any use in trimming off the lineno and funcName detail, since you can just select what you want from the table.
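The idea of a formatter that shapes records instead of serializing them can be sketched with the standard logging module alone. This is a minimal illustration, not the Log2DB code. RecordShaper and TableHandler are hypothetical names, and a plain list stands in for the datastore model.

Handler.format simply returns whatever the attached formatter's format() returns, so the formatter can hand back a dict of columns and emit() can store it. The sketch also assumes the identity is the logger's name.

```python
import logging


class RecordShaper(logging.Formatter):
    """Turn a LogRecord into a dict of table columns instead of a string."""

    def format(self, record):
        row = {
            "identity": record.name,      # assumption: identity == logger name
            "level": record.levelname,
            "message": record.getMessage(),
            "pathname": record.pathname,
            "module": record.module,
            "lineno": record.lineno,
            "funcName": record.funcName,
            "created": record.created,
        }
        if record.exc_info:
            # Keep the formatted traceback, e.g. from logger.exception().
            row["exc_text"] = self.formatException(record.exc_info)
        return row


class TableHandler(logging.Handler):
    """Append one row per record to `table` (a list standing in for the datastore)."""

    def __init__(self, table):
        logging.Handler.__init__(self)
        self.table = table
        self.setFormatter(RecordShaper())

    def emit(self, record):
        try:
            self.table.append(self.format(record))
        except Exception:
            self.handleError(record)
```

On App Engine, emit() would create and put a model entity with these fields instead of appending to a list.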

Otherwise, you use it just like any other logger from the logging module: when you actually write a message, a db entry is created for you. The default logger is created when getLogger() is called with no arguments. You can interact with the logger via the returned object to set levels etc., as in
log = Log2DB.getLogger()
log.setLevel(logging.DEBUG)

Records created by this logger have the identity value ‘Log2DB’. You can pull a single identity out of the log table by filtering on identity = whatever name you want.

For example:

import logging
import Log2DB

log = Log2DB.getLogger()
log.error('fun times')

log.logger.setLevel(logging.DEBUG)
log.debug('unfun detail')

All of the above go to the default logger identity, “Log2DB”.
To create another logger, say one just for mailer status, pass a name to the getLogger proxy function.

For example:

import Log2DB
maillogger = Log2DB.getLogger('MyMailLogger')
maillogger.error('oh noes')

class HellException(Exception):
    pass

try:
    raise HellException
except HellException as e:
    maillogger.exception('Error: %s stack to follow', e)

A record from the above lands in the same table as the default log, but has identity = ‘MyMailLogger’. I thought about dynamically changing the log record model class name so you would get separate tables. Given Google’s architecture, though, this offers limited performance gain at the cost of brittleness and complexity. I may add it anyway as an option.
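The "one table, many identities" arrangement can be sketched with the standard logging module. This is a minimal illustration under stated assumptions, not the Log2DB API.

In the post, getLogger appears to return a wrapper object (note log.logger.setLevel above). This sketch instead returns a plain logging.Logger. LOG_TABLE stands in for the shared datastore table, and IdentityHandler and records_for are hypothetical names.

The handler is attached only once per identity, because logging.getLogger returns the same logger object for the same name on every call.

```python
import logging

DEFAULT_IDENTITY = "Log2DB"
LOG_TABLE = []  # one shared "table" for every identity (stand-in for the datastore)


class IdentityHandler(logging.Handler):
    """Store each record tagged with the identity of the logger it came from."""

    def __init__(self, identity):
        logging.Handler.__init__(self)
        self.identity = identity

    def emit(self, record):
        LOG_TABLE.append({"identity": self.identity,
                          "level": record.levelname,
                          "message": record.getMessage()})


def getLogger(identity=DEFAULT_IDENTITY):
    """Return the logger for `identity`, attaching the table handler only once."""
    logger = logging.getLogger(identity)
    if not any(isinstance(h, IdentityHandler) for h in logger.handlers):
        logger.addHandler(IdentityHandler(identity))
    return logger


def records_for(identity):
    """Equivalent of filtering the log table on identity = <identity>."""
    return [row for row in LOG_TABLE if row["identity"] == identity]
```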

Later I may paste a django view for managing the resulting records.

Here is the code via pastie.org:

This replaces the old version at http://pastie.org/432156; _LogRecordHelper was moved to an inner class to fix an issue.