java array iteration

Java5 now has language support for iteration of the form:

for (Type var : someIterable) { ... }

As well, there is now an Iterable<T> interface, and arrays can be used directly in the loop, allowing you to write:

String[] thingys = {"a","b","c"};
for (String thingy : thingys) { ... }

At work, we have a collection of utility iterators, most written before these were available. As such, we have an ArrayIter utility, and a ZipIterator, inspired by Python’s itertools.izip.

I’ve been going through these classes and their usages on a lazy basis to update them to the new syntax. I finally got around to a usage of the ZipIterator, which happened to compose an ArrayIter … it zipped together an array of String names with the results of a test.

So, I changed it to:

String[] names = { "foo", "bar", "baz" };
List results;
for (Object[] pair : new ZipIterator(names, results))
{
// ...

No dice, says Java:

/home/jsled/stuff/work/[...]TestMumble.java:46: cannot find symbol
symbol  : constructor ZipIterator(java.lang.String[],java.util.List)
location: class com.spokesoftware.util.iterator.ZipIterator
for (Object pairObj : new ZipIterator(data, results))

WTF? Okay, let me help you out:

for (Object[] pair : new ZipIterator((Iterable)names, results))
// ...

FUCK YOU, says Java:

/home/jsled/stuff/work/[...]/TestMumble.java:46: inconvertible types
found   : java.lang.String[]
required: java.lang.Iterable
for (Object pairObj : new ZipIterator((Iterable)data, results))

It turns out that the string “iter” isn’t even in the text of the section about Arrays in the Java Language Spec. Array instances aren’t Iterable. They’re a special case in the handling of the for-each loop syntax.

This code ended up as:

for (Object pair : new ZipIterator(Arrays.asList(names), results))
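To make the distinction concrete, here is a minimal sketch. ZipSketch and its zip method are hypothetical stand-ins invented for this example, not the real ZipIterator; the only things taken from the post are that the for-each loop accepts an array directly, that an array is not an Iterable, and that Arrays.asList bridges the gap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the ZipIterator in the post; not the real class.
class ZipSketch {
    // Pairs up elements of two Iterables, stopping at the shorter one.
    static List<Object[]> zip(Iterable<?> left, Iterable<?> right) {
        List<Object[]> pairs = new ArrayList<Object[]>();
        Iterator<?> l = left.iterator();
        Iterator<?> r = right.iterator();
        while (l.hasNext() && r.hasNext()) {
            pairs.add(new Object[] { l.next(), r.next() });
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] names = { "foo", "bar", "baz" };
        List<Boolean> results = Arrays.asList(true, false, true);

        // The for-each loop accepts the array directly (a compiler special case)...
        for (String name : names) {
            System.out.println(name);
        }

        // ...but String[] is not an Iterable, so zip(names, results) would not compile.
        // Arrays.asList wraps the array in a fixed-size List view, which is Iterable.
        for (Object[] pair : zip(Arrays.asList(names), results)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Arrays.asList does not copy the array; it returns a List backed by it, so the wrapping is cheap.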

Java is totally shitrude.

gnucash-2.0.0

After much sporadic work, testing, updates, frustration and hacking, GnuCash 2.0.0 was released last night. Now the real fun can begin. :)

working around java class/memory leak in GATE 3.0

I’ve developed some patches to GATE to allow an application to work around memory issues resulting from using JAPE in a long-lived VM.

The problem stems from use of the GateClassLoader and the dynamic creation of classes in loading JAPE grammars. Generally, classes are loaded into the Permanent Generation section of the heap, and are only eligible for garbage collection when the classloader from which they were sourced is itself unreachable. As the GateClassLoader is never reset or reinitialized, over time the classes will accumulate and fill the heap, and cause OutOfMemoryErrors in the VM. While a first attempt to resolve this would be to cache the loaded JAPE objects, there are still classes created on each use of that grammar, which will slowly cause the same problem.

The caller needs a facility to reset the class loader, as provided by the patches to src/gate/Gate.java (and the compiler classes). It is expected that the caller invokes Gate.resetClassLoader() periodically, either on a fixed frequency or after a set number of uses of the library; our application resets every 15 minutes.

One complication is the difference in time between the generation/loading of the classes for a particular JAPE grammar and the instantiation of objects of those classes. Obviously, the objects need to be instantiated from a valid class, and in particular from the classloader from which they were created; if the classloader is reset in the middle, there will be instantiation exceptions thrown when using a different classloader. A change was thus made to src/gate/jape/RightHandSide.java to retain a reference to the relevant GateClassLoader. Note that there’s still a race condition present, but the window of time is significantly smaller than before.

Our (application-level) reset operation both resets the Gate classloader and invalidates our JAPE cache: current threads will finish using their JAPE objects, references to the classloader are eventually lost (from gate.Gate via reset and the jape RightHandSide objects through reference), and the garbage collector is allowed to collect the whole shebang.
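As a rough illustration of the pattern (not GATE's code, and not the patch itself), here is a sketch of a resettable loader holder plus an object that pins the loader it was created under. Every name here is hypothetical: LoaderResetSketch, current(), reset() and Action are invented for the example, and an empty URLClassLoader stands in for the real GateClassLoader.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; these are not GATE classes.
class LoaderResetSketch {
    // The "current" loader. Replacing it drops this holder's reference to the old one.
    private static final AtomicReference<ClassLoader> CURRENT =
        new AtomicReference<ClassLoader>(newLoader());

    private static ClassLoader newLoader() {
        // An empty URLClassLoader stands in for a loader that defines generated classes.
        return new URLClassLoader(new URL[0], LoaderResetSketch.class.getClassLoader());
    }

    static ClassLoader current() {
        return CURRENT.get();
    }

    // Analogous in spirit to the reset described above: new work gets a fresh loader.
    static void reset() {
        CURRENT.set(newLoader());
    }

    // Stand-in for an object that must keep using the loader its classes came from.
    static final class Action {
        private final ClassLoader loader;

        Action(ClassLoader loader) {
            this.loader = loader;
        }

        ClassLoader loader() {
            return loader;
        }
    }

    public static void main(String[] args) {
        Action inFlight = new Action(current());
        reset();
        // The in-flight object still sees its original loader; once the object is
        // dropped, nothing references that loader and it becomes collectable.
        System.out.println("still pinned to old loader: " + (inFlight.loader() != current()));
    }
}
```

The point of holding the loader inside the object is that in-flight work keeps functioning across a reset, while the old loader and its classes become garbage as soon as the last such object is released.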

This patch is much more of a workaround than a fix for the underlying problems. I don’t care if these patches are applied, so much as they’re useful in demonstrating the problem so it can be more appropriately fixed. Really, though, JAPE shouldn’t need to create classes per use.

Also, while I’m at it, here are patches to:

Update, 2006-07-10: I was pressed for detail about my claim that JAPE creates classes on use, and found that I couldn’t back it up. Here’s my reply to gate-users:

I went to reproduce this, and found I can’t. Now I believe I mis-spoke.

The classes I remember being created were…

japeactionclasses.postprocessNewLineActionClass[#] japeactionclasses.postprocessVBnegActionClass[#] japeactionclasses.postprocesssimpleJoinActionClass[#]

…which are from the tokenizer’s grammar. Thinking back to the period I was debugging the issue, I don’t believe I was caching the tokenizers, but I had instituted the other JAPE caching. As well, I didn’t remember that the tokenizer was implemented in terms of JAPE. As such, I tricked myself into believing that the simple use of JAPE grammars from a different processing step was causing the loading of the aforementioned classes, when really it was just not caching the tokenizers.

Logging in JSON

I primarily do backend processing at work, often dealing with extended processing that doesn’t naturally have a user-interface. As such, I’ve been finding ways to provide quick and convenient interaction with the processes.

One simple obvious way is logging, but I found that text-based statement logging doesn’t get me quite the level of detail and structure I want and ultimately need to be productive. When I was playing with RDF more heavily earlier last year, I started to format my log messages as N3 pseudo-statements:

** 2006-01-25 12:34:56 INFO Thread-0 Processor: [ object "foo"; count 12; disposition "thumbs-up" ]
** 2006-01-25 12:34:56 INFO Thread-0 Processor: [ object "bar"; count 7; disposition "thumbs-down" ]

Of course, the “real” form of this would be to have the entire logging pipeline emit actual N3 statements, including the logging-provided (meta-)data:

[ # a log:Entry
  log:ts "2006-01-25 12:34:56" ;
  log:lvl log:INFO ;
  log:thread "Thread-0" ;
  log:class "Processor" ;
  log:msg [ :object "foo"; :count 12; :disposition "thumbs-up" ]
].

N3’s great because of its simple syntax for representing structured data, but I’ve recently started using JSON instead, which has a more straightforward syntax and semantics for loose, non-RDF data representation. If you’re unfamiliar with JSON, it is a subset of javascript representing dictionaries, lists, and the basic data types (integers, floats, booleans and strings). An example would be:

{"favorite int": 42, "floating value": 42.01, "this is an example": true, "a list of things": [42, 42.01, true, {"foo": "bar", "baz": "quuz"}]}

For example, I’ve been developing a content parser, and running it against a fixed set of test cases to assess the impact of changes I make. Each test result consists of 4 parts:

  1. the test-case label (the input filename)
  2. the expected-and-produced pairs
  3. the expected-but-missed cases
  4. the unexpected-but-produced cases

Each test-case and result is on the order of 10 “things”, so they’re well suited to manual review. The test run outputs a set of log statements like:

** 2006-01-25 12:34:56 INFO Thread-0 Testing: {"case":"/path/to/test/case/file.name","expectedProduced":[[«expected object», «produced object»], «…»], "expectedMissed":[ «…» ], "unexpectedProduced":[ «…» ]}
** 2006-01-25 12:34:57 INFO Thread-0 Testing: {"case":"/path/to/test/case/file.name","expectedProduced":[[{"foo":"bar"}, {"foo":"bar"}], «…»], "expectedMissed":[{"foo":"baz"}], "unexpectedProduced":[{"foo":"quux"}]}

The process of logging general objects is made much easier by using a reflection-based utility class I wrote that will json-stringify random objects thrown at it, but that’s for another post…
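In the meantime, here is a minimal sketch of what such a reflection-based stringifier might look like. It is not the utility class mentioned above; JsonStringifySketch, toJson, quote and the Result class are all names invented for this example. It assumes plain data objects from your own code: there is no cycle detection, non-finite numbers are not handled, and reflecting into JDK-internal classes may be refused on newer JVMs.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Collection;
import java.util.Map;

// Hypothetical sketch; not the utility class mentioned in the post.
class JsonStringifySketch {

    // A plain data object, invented purely for illustration.
    static final class Result {
        String disposition = "thumbs-up";
    }

    static String toJson(Object o) {
        if (o == null) return "null";
        if (o instanceof Number || o instanceof Boolean) return o.toString();
        if (o instanceof CharSequence || o instanceof Character) return quote(o.toString());
        StringBuilder sb = new StringBuilder();
        if (o instanceof Map) {
            sb.append('{');
            for (Map.Entry<?, ?> e : ((Map<?, ?>) o).entrySet()) {
                if (sb.length() > 1) sb.append(',');
                sb.append(quote(String.valueOf(e.getKey()))).append(':').append(toJson(e.getValue()));
            }
            return sb.append('}').toString();
        }
        if (o instanceof Collection) {
            sb.append('[');
            for (Object item : (Collection<?>) o) {
                if (sb.length() > 1) sb.append(',');
                sb.append(toJson(item));
            }
            return sb.append(']').toString();
        }
        if (o.getClass().isArray()) {
            sb.append('[');
            for (int i = 0; i < Array.getLength(o); i++) {
                if (i > 0) sb.append(',');
                sb.append(toJson(Array.get(o, i)));
            }
            return sb.append(']').toString();
        }
        // Fallback: treat the object as a bag of its declared instance fields.
        sb.append('{');
        for (Field f : o.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            f.setAccessible(true);
            try {
                Object value = f.get(o);
                if (sb.length() > 1) sb.append(',');
                sb.append(quote(f.getName())).append(':').append(toJson(value));
            } catch (IllegalAccessException e) {
                // Skip fields that cannot be read.
            }
        }
        return sb.append('}').toString();
    }

    // Wraps a string in double quotes, escaping what JSON requires.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(toJson(new Result()));
    }
}
```

The order of reflected fields is whatever getDeclaredFields returns, which the JDK does not guarantee, so a log consumer should not depend on key order.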

A nice feature of JSON – being a subset of javascript – is that it is trivially parsed in javascript with an eval statement. As such, it’s simple to write an HTML page that accepts pasted JSON and renders it as nested tables. As this accepts and evaluates arbitrary javascript, I won’t link to it, but instead present the (mostly) short page here:

json:


decoded:

This can be copied and pasted into a local HTML file; it runs entirely locally. Alternatively, uncompress this compressed version.

moving GnuCash from CVS to SVN using cvs2svn.py

I was responsible for moving the GnuCash CVS repository over to Subversion, which was an interesting experience.

The basics are covered in all the applicable documentation, though there are some things I distilled out, primarily around cvs2svn.

cvs2svn is separately available from http://cvs2svn.tigris.org/ but has not-quite-direct docs about how to do what I’d imagine is a pretty straightforward thing: dump multiple CVS modules into separate top-level SVN directories following the recommended “module/{trunk,branches,tags}/” convention. It turns out, from the cvs2svn FAQ, the incantations are:

  • ~/cvs2svn-1.3.0/cvs2svn [options below] --dump-only --dumpfile=moduleA.dump /path/to/cvs/repo/moduleA
  • svn mkdir file:///path/to/svn/repo/moduleA
  • svnadmin --parent-dir moduleA load /path/to/svn/repo < moduleA.dump

There were two sets of options that we needed to add: the first around symbol renaming, and the second dealing with binary files.

The list of symbol transform arguments to cvs2svn to do branch- and tag-renaming was:

--symbol-transform='gnucash-([0-9]+)-([0-9]+)-branch:\1.\2'
--symbol-transform='gnucash-docs-([0-9]+)-([0-9]+)-branch:\1.\2'
--symbol-transform='gnucash-([0-9]+)-([0-9]+)-([0-9]+)-rc:\1.\2.\3-rc'
--symbol-transform='gnucash-([0-9]+)-([0-9]+)-([0-9]+[a-z]+?):\1.\2.\3'
--symbol-transform='gnucash-([0-9]+)-([0-9]+)-(.*):\1.\2.\3'
--symbol-transform='gnucash-docs-([0-9]+)-([0-9]+)-([0-9]+):\1.\2.\3'
--symbol-transform='root-of-gnucash-([0-9]+)-([0-9]+):\1.\2-root'
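To see what these rules do to a name, here is a small sketch that replays them in Java. It is an approximation, not cvs2svn's implementation: I'm assuming each pattern has to match the whole symbol name and that the rules are tried in the order given, and the \N backreferences are rewritten as $N because that is what Java's regex replacement syntax wants. SymbolTransformSketch and the sample tag names in main are made up for the example.

```java
import java.util.regex.Pattern;

// Hypothetical replay of the rename rules; not cvs2svn's own code.
class SymbolTransformSketch {
    // {pattern, replacement} pairs from the options above, with \N written as $N for Java.
    static final String[][] RULES = {
        { "gnucash-([0-9]+)-([0-9]+)-branch", "$1.$2" },
        { "gnucash-docs-([0-9]+)-([0-9]+)-branch", "$1.$2" },
        { "gnucash-([0-9]+)-([0-9]+)-([0-9]+)-rc", "$1.$2.$3-rc" },
        { "gnucash-([0-9]+)-([0-9]+)-([0-9]+[a-z]+?)", "$1.$2.$3" },
        { "gnucash-([0-9]+)-([0-9]+)-(.*)", "$1.$2.$3" },
        { "gnucash-docs-([0-9]+)-([0-9]+)-([0-9]+)", "$1.$2.$3" },
        { "root-of-gnucash-([0-9]+)-([0-9]+)", "$1.$2-root" },
    };

    // Applies each rule in order; a rule only fires if it matches the entire name (assumption).
    static String transform(String symbol) {
        for (String[] rule : RULES) {
            Pattern anchored = Pattern.compile("^(?:" + rule[0] + ")$");
            symbol = anchored.matcher(symbol).replaceFirst(rule[1]);
        }
        return symbol;
    }

    public static void main(String[] args) {
        // Sample symbol names are illustrative, not taken from the real repository.
        String[] samples = {
            "gnucash-1-8-branch", "gnucash-docs-1-8-branch",
            "gnucash-1-9-0-rc", "gnucash-1-8-12", "root-of-gnucash-1-8",
        };
        for (String s : samples) {
            System.out.println(s + " => " + transform(s));
        }
    }
}
```

Order matters here: the catch-all `(.*)` rule would also swallow `gnucash-1-8-branch` if the more specific branch rule did not come first.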

Turns out some of the historical CVS gnucash files aren’t appropriately tagged binary, and for some reason cvs2svn or svn itself wasn’t really doing the right thing. As such, I needed to override the mime-types and eol-handling with the options:

--no-default-eol --eol-from-mime-type --mime-types=/root/test.mime.types

It turns out that you just have to use all these options together, basically. /root/test.mime.types was cobbled together by grepping the image/* types out from the normal mime.types, then extending the application/octet-stream entry for some gnucash-specific extensions. Specifically:

application/octet-stream bin dms lha lzh exe class so dll xac gmo

Apart from that, cvs2svn does all the work here. After running the svnadmin load, the dumpfile for moduleA will be loaded into the directory ‘moduleA’ in the SVN repo, and Bob’s your uncle.

jython ui shell, async map

From time to time at work I use a little swing Jython shell to do various and sundry interactive tasks against our system … either poking at vm-local code in development or against one of the runtime development or staging instances. It’s quite useful when it is.

One thing I often find myself doing is processing some chunk of data, which sometimes can take either a good chunk of — or unknown amount of — time. The simple implementation of my little jython shell blocks the UI while the python is processing, which is less than ideal for long-running jobs. I’ve done various ad-hoc threaded solutions for this before, but finally generalized it today.

AsyncMap functions like map, but asynchronously. As it applies the given function to the elements of the given list, it will update a JLabel with its position in the list, as well as the last- and average-times taken processing the elements.

The one potential dependence on my scenario it has is that the global win is expected to contain the JFrame of the UI; but if you supply it another JLabel already placed in the UI appropriately, &c., it’ll happily use that.

from java.lang import Runnable

class AsyncMap (Runnable):
    '''asynchronous map; reporting position and time-stats to given ui label.'''

    def __init__(self, fn, inList, label = None):
        from javax.swing import JLabel
        from java.lang import Thread
        self.fn = fn
        self.inList = inList
        self.outList = []
        self.errs = []
        if not label:
            # no label supplied: add one to the global `win` JFrame
            label = JLabel('--- uninit ---')
            win.contentPane.add(label)
            win.pack()
        self.label = label
        self.thread = Thread(self)

    def start(self):
        self.thread.start()

    def run(self):
        from java.lang import System
        totalTime = 0
        self.label.text = '0/%d' % (len(self.inList))
        for idx,item in [(i,self.inList[i]) for i in range(len(self.inList))]:
            start = System.currentTimeMillis()
            res = None
            try:
                res = self.fn(item)
            except:
                # record the failure; None is appended in place of a result
                self.errs.append("error at %d" % (idx))
            end = System.currentTimeMillis()
            self.outList.append(res)
            time = end-start
            totalTime = totalTime + time
            avgTime = totalTime / (idx+1)
            self.label.text = '%d / %d - last: %f, avg: %f' % (idx+1, len(self.inList), time, avgTime)
        self.label.text = 'DONE: %s' % (self.label.text)

hotswap changed java code into a JVM with hotswap.jar

I’ve found an ant task and java utility code for hotswapping code into a JVM using the JPDA debugging interface.

Though it requires a pretty recent version of ant, it does work really well … within some constraints of the JVM/JPDA interface itself. Specifically, you apparently cannot add, remove or change the signature of methods via said interface … so whole-sale changes to the program code aren’t going to work. But, for a wide variety of minor changes and tweaks, hotswap.jar is incredibly useful.

Never mind the bits about constructing timestamps in his instructions … touch a sentinel file to keep track of load-times, and use ant’s built-in depend mechanism. For example, with a server in server/ depending on shared utility code in common/ and per-module build-output going into server/build/java, common/build/java, &c…

software development in 2005

It’s really frustrating to write software, sometimes. Computers are exceedingly literal, and true abstraction is quite hard to come by. Adding documentation, written by wonderfully-silly humans, doesn’t always help, either.

For instance, Struts has a nifty feature for “map-based form properties”. Basically, if you write in your page:

<input type="text" name="nameMap(foo)" value="bar" />

…then, when the form is submitted, form.setNameMap("foo", "bar") will be called. Quite nice when you have a lot of dynamic content in the page and you’re trying to figure out how it needs to be related at form-submit time…

In fact, most of Struts’ handling of forms is simple bean-addressing strings… you can create “nested” paths with ‘.’ (e.g., “person.name.first”), and do indexed properties with ‘[]’ (e.g., “names[42]”).

So, what happens when you want to use

<input type="text" name='nameMap( [ :name "joe bob"; :email "root@127.0.0.1" ] )' />

…? That’s an expression representing a single mapped-property call, with a string argument, right?

Ha Ha, no.

Turns out that the simplest thing that could possibly work here is to do String.indexOf() calls with the various delimiters. So the first thing that hits is the ‘.’ in the email address. Should that not have been there, the ‘[’ would be next up. Of course, neither is at all right. And, of course, in traditional programmer style, the documentation is perfectly willing to make believe that the expression syntax is robust and well-behaved.
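To show the failure mode, here is a sketch of a delimiter-probing "parser" of the kind described. It is not the actual Struts or BeanUtils code; NaivePropertyParserSketch and its methods are invented for illustration, and the probing order ('.' first, then '[', then '(') is my reading of the behaviour above.

```java
// Hypothetical sketch of an indexOf-driven property "parser"; not the real BeanUtils code.
class NaivePropertyParserSketch {
    // Classifies an expression by probing for delimiters in a fixed order,
    // without checking whether a delimiter sits inside a mapped key.
    static String classify(String expr) {
        if (expr.indexOf('.') >= 0) return "nested";
        if (expr.indexOf('[') >= 0) return "indexed";
        if (expr.indexOf('(') >= 0) return "mapped";
        return "simple";
    }

    // What a nested-property split would take as the first path segment.
    static String firstSegment(String expr) {
        int dot = expr.indexOf('.');
        return dot < 0 ? expr : expr.substring(0, dot);
    }

    public static void main(String[] args) {
        String expr = "nameMap( [ :name \"joe bob\"; :email \"root@127.0.0.1\" ] )";
        // The dot inside the email address wins, so the key is never seen as a key.
        System.out.println(classify(expr));
        System.out.println(firstSegment(expr));
    }
}
```

A parser that tracked parenthesis depth before looking for '.' or '[' would treat everything between the outer parentheses as one opaque key, which is what the expression was meant to say.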

A request to other programmers: any time your “parser” is based around calls to indexOf, you need a big “THIS IS A HACK PARSER” disclaimer in your documentation, so people will give it the distance it deserves.

Update, a few minutes later:

It also does not help that the entire error you get when you make this mistake is…

HTTP ERROR: 500 BeanUtils%2Epopulate

RequestURI=/secure/dedup/un-c11n.spoke

…in the browser, and the misleading…

Caused by: java.lang.NoSuchMethodException: Property ‘namePerson’ has no setter method

…in the exception. Thanks, BeanUtils, for perhaps having the worst error messages evar.

jde-usages

I wrote a while ago about my stint of using IntelliJ and returning to emacs. More recently, I got an email from Suraj Acharya, saying that I should try jde-usages, which provides a set of extended functionality around method-usages and -overriding, and detecting the callers of a particular method. It also provides a jde-open-class-source-with-completion function, which is awesome — very close to IntelliJ’s behavior. It’s my new binding for C-c n.

jyWebShell

After my efforts Friday to get jython working in swing, I set upon my real goal: HTTP/HTML access to a server-side jython session. The idea is that the jython session runs in the app-server container, and thus has access to the full runtime configuration and capability of the system. It is exposed to the user[s] in a durable yet dynamic manner.

I’ve packaged up the results as jyWebShell 1.0. At 5MB, it’s a bit larger than a traditional .war file since it contains the full Jetty distribution, as well as Struts and jython.

I use the XMLHttpRequest object in the UI, which submits just the current python command, and updates the page in-place with the result of that evaluation; however, if the XMLHttpRequest doesn’t exist, a normal POST is done, and will work just fine.

I’ve also got a short Flash demo [read: “screencast”] of it in action; no audio, unfortunately … a project for another weekend.