SQLAlchemy Adds Objects to Collections Automatically

So I’ve been using SQLAlchemy for a while for Chukchi. Here’s a WTF I’ve got recently.

Imagine you have a basic foreign key relationship in your SQLAlchemy declarative models. For example, let’s take one from Chukchi’s models.py lines 83-121:

class Entry(Base):
    id = Column(Integer, primary_key=True)
# <...>

class Content(Base):
    id = Column(Integer, primary_key=True)
    entry_id = Column(Integer, ForeignKey(Entry.id), nullable=False)
# <...>
    entry = relationship(Entry, backref='content')

(note that I’ve skipped irrelevant parts)

Now let’s import and play with it for a bit:

>>> from chukchi.db.models import *
>>> from chukchi.db import Session
>>> db = Session()
>>> entry = db.query(Entry).order_by(Entry.id.desc()).first()
INFO sqlalchemy.engine.base.Engine SELECT <...>
FROM entry ORDER BY entry.id DESC 
 LIMIT %(param_1)s
INFO sqlalchemy.engine.base.Engine {'param_1': 1}

(note that I’ve got echo=True enabled so that we see all the SQL)

Now let’s see what child objects are there in the database:

>>> print ", ".join(str(c.id) for c in entry.content)
INFO sqlalchemy.engine.base.Engine SELECT <...>
FROM content 
WHERE %(param_1)s = content.entry_id
INFO sqlalchemy.engine.base.Engine {'param_1': 2662}
25223, 25224

Okay, so we have an Entry object and two Content objects that refer to it.

Now let’s create a new Content object:

>>> c = Content()
>>>

As you probably know, SQLAlchemy executes write queries whenever the session is flushed. But if we flush the session now, nothing happens:

>>> db.flush()
>>>

That’s because the state of our new object is “transient”. As long as we don’t add it explicitly to the Session, it won’t be inserted into the database.

Now let’s create a Content object with some data filled:

>>> c = Content(entry=entry,
...             type='text/plain',
...             hash="doesn't matter",
...             data='')

Now what would happen if we flushed the session now? Nothing, right? Wrong!

>>> db.flush()
INFO sqlalchemy.engine.base.Engine INSERT INTO content (<...>) VALUES (<...>) RETURNING content.id
INFO sqlalchemy.engine.base.Engine {'hash': "doesn't matter", 'entry_id': 2961, 'type': 'text/plain', 'data': '', <...>}

This happens because when you initialize a relationship with a persistent object (that is, an object within a Session), the new object gets added to its parent object’s backref collection, and the default save-update cascade then pulls it into the parent’s Session:

>>> print ", ".join(str(c.id) for c in entry.content)
25223, 25224, 25225
>>> c = Content(entry=entry)
>>> print ", ".join(str(c.id) for c in entry.content)
25223, 25224, 25225, None

(the new Content object has no ID yet)

Interestingly enough, this doesn’t happen if you initialize the foreign key column instead of the relationship (the collection below still holds only the one pending object from before):

>>> c = Content(entry_id=2961)
>>> print ", ".join(str(c.id) for c in entry.content)
25223, 25224, 25225, None

So watch out if you initialize a relationship attribute: you might get some new unexpected objects in your database.


Free PDF Readers

You may want this link next time you look for a PDF reader on a system where there is none: pdfreaders.org


This client is too old to work with working copy

Did you know that in Subversion you get this error message if you create a working copy with a new client and then try to do something with it using an old client? Seems screwed up but pretty reasonable, doesn’t it? But get this: Subversion will upgrade your working copy even if you just run something like “svn diff” or “svn ls” in it with a newer client. And then your older client will bail out.

That’s pretty damn fucked up. That’s one of the design decisions you should not be allowed to make. I mean, Subversion is actually pretty dumb by itself. So what was so special that it had to be changed in those crappy .svn directories, so that old clients can no longer understand them?

By the way, if you accidentally came here googling the error message: sorry, you’ll have to deal with this shit until you move to a less crappy version control software.


Aeacus: Reloaded

I’ve cleaned up my Gentoo repository “Aeacus” and published it again. Feel free to use it, comment upon it, edit it, or whatever. Just add this URL to your /etc/layman/layman.cfg under “overlays”:

http://dev.hades.name/file/Aeacus/layman-list.xml

Gentoo Install Checklist

If you want to install Gentoo you should read the Gentoo Handbook. However, if you (like me) have already installed Gentoo a couple (thousand) times, you know more or less everything that has to be done, and just need a list of things not to forget. So here it is:

  • fdisk # create the partitions you want
  • mkfs # create the root filesystem
  • mount root /mnt/gentoo
  • cd /mnt/gentoo
  • wget -O- ftp://mirror.switch.ch/mirror/gentoo/releases/x86/current-stage3/stage3-i686-*.tar.bz2 | tar xj
  • cd usr
  • wget -O- ftp://mirror.switch.ch/mirror/gentoo/snapshots/portage-latest.tar.bz2 | tar xj
  • cd /mnt/gentoo
  • mount -t proc proc proc
  • mount --bind /dev dev
  • cp -L /etc/resolv.conf etc/
  • chroot . /bin/bash
  • env-update; source /etc/profile
  • emerge --sync
  • eselect profile set default/linux/x86
  • vim /etc/make.conf # setup USE, CFLAGS, CXXFLAGS, MAKEOPTS
  • vim /etc/locale.gen # setup locales
  • locale-gen
  • emerge gentoo-sources # configure, build and install the kernel
  • vim /etc/fstab # setup the partitions, don’t forget root, swap
  • vim /etc/conf.d/hostname # setup hostname
  • vim /etc/hosts # add the hostname to localhost line
  • passwd
  • vim /etc/conf.d/clock; emerge --config timezone-data # setup timezone
  • emerge syslog-ng vixie-cron mlocate
  • rc-update add syslog-ng default
  • rc-update add vixie-cron default
  • emerge reiserfsprogs xfsprogs jfsutils e2fsprogs # or whatever set of FS you have
  • emerge dhcpcd
  • emerge grub
  • vim /boot/grub/grub.conf # configure Grub
  • grep -v rootfs /proc/mounts > /etc/mtab
  • grub

Now reboot and smell the ashes!


Your Own Git Hosting: Gitolite

So, let’s start with setting up gitolite to host your repositories and provide SSH access to them. There is also a tool called gitosis, but it is not as well supported or as featureful as gitolite.

So what does it actually do? In essence, all the developers who access your repositories will log in to your server as user “git”, run the server-side Git commands and sync data with them. Gitolite determines whether a user is allowed to perform the operation they are attempting. OK, but why bother with all this crap, and not just create a bunch of users on the server and tell them where the repository is? There are a couple of reasons.

Firstly, you may not want to give your developers shell access to your server. Why wouldn’t you? Who knows, probably because they are stupid or malicious, or whatever. With gitolite they won’t get shell access, because they will only be able to log in with their SSH keys, and those SSH keys will be restricted to invoking gitolite only. At this point you may want to read up on keys and command restrictions in the SSH manual, if you do not know about them yet.

Secondly, you may want to impose certain access restrictions on them. Gitolite lets you restrict read, write and rewrite access down to the per-branch level.

Thirdly, with local multiuser git access you have to fiddle a little with umasks or write a hook that fixes the permissions, but that’s unimportant and uninteresting.

Fourthly, you may not even have root access. That’s right, gitolite allows you to set up multi-user repositories on systems where you have only one non-root account.
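To make the SSH key restriction from the first point a bit more concrete, here is a rough sketch of what a restricted authorized_keys entry looks like. The wrapper path, the user name alice and the key material are placeholders I made up for illustration; only command= and the no-* options are standard sshd options. The sketch writes to a temporary file, not to your real ~/.ssh/authorized_keys.

```shell
# Sketch only: wrapper path, user name and key are hypothetical placeholders.
keyfile=$(mktemp)

cat > "$keyfile" <<'EOF'
command="/opt/gitolite/bin/hypothetical-wrapper alice",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAA...placeholder... alice@example.com
EOF

# Show the entry we just wrote.
cat "$keyfile"
```

With an entry like this, sshd runs the forced command no matter what the client asked for, and hands the original request over in the SSH_ORIGINAL_COMMAND environment variable. That is what lets a single “git” account tell its users apart.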

So let’s go ahead with the installation. The INSTALL doc is rather straightforward and you should in fact read it instead of this blog post. I am not offering the text below as setup instructions, but merely as a log of how I installed it on my server.

First off, we require some git on our server. Please be careful to use as fresh a git as possible, because it generally tends to get even more super awesome with time. Also, gitolite requires at least git 1.6.2 at the time of writing. Unless you use an especially sucky distribution (ahem debian ahem), you should be fine with git from your repository. I just told Portage to install me one:

    emerge -av git

Although gitolite has an ebuild in Gentoo, I set it up manually and will walk you through the procedure. First, let’s obtain the sources:

    git clone git://github.com/sitaramc/gitolite.git

This will create a “gitolite” directory with gitolite sources. Let’s create a system directory for gitolite. I use /opt for all non-Portage packages, so let’s go ahead with:

    GITOLITE=/opt/gitolite
    sudo mkdir -p ${GITOLITE}/{bin,conf,hooks}
    sudo chown -R `whoami` ${GITOLITE}/{bin,conf,hooks}

You may of course use another set of directories if you want. Next, we let gitolite install itself wherever we want it:

    ./src/gl-system-install ${GITOLITE}/{bin,conf,hooks}

This takes three arguments: directory for binaries, directory for configs and directory for hooks. Also this is the same command you would use to update your gitolite installation after you git pull the new sources.

We now need a user account and a group for git. Let’s create them!

    sudo groupadd -r git
    sudo useradd -d /srv/git -g git -m -r -s `which bash` git

This creates a system (-r) group git and a system (-r) user git with home directory (-d) /srv/git and shell (-s) bash, in group (-g) git, and creates its home directory (-m). Now log in to your new git account. Don’t forget to add your ${GITOLITE}/bin directory to the PATH in .bashrc if it is not there already:

    sudo su - git
    echo PATH='${PATH}':/opt/gitolite/bin/ >> .bashrc
    source .bashrc
    which gl-setup

Now copy over your public key to a file called username.pub and finalize gitolite setup by running:

    gl-setup username.pub

After that, follow the instructions; they should be fairly straightforward. Congratulations, you’re done! Gitolite’s configuration is itself stored in a Git repository that gitolite manages, so you’ll want to clone it from your computer (i.e. wherever you have the private key for username.pub):

    git clone git@yourservername:gitolite-admin
    cd gitolite-admin
    vim conf/gitolite.conf

The file gitolite.conf is in fact your gitolite config file. After you edit it, save it, commit it and push to the gitolite-admin repository on your server, gitolite checks the config and makes the appropriate changes to the repositories. To add users, just place their keys into the keydir directory. It’s that simple! You will probably want to read the official admin doc just about now.
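For a taste of what goes into that file, here is a small sketch of per-branch rules, written from my reading of the gitolite docs, so do check it against the admin doc before relying on it. The repository name myproject and the users alice and bob are hypothetical. The sketch writes to a temporary file rather than to your real conf/gitolite.conf.

```shell
# Sketch only: repo and user names are hypothetical, and the file is a
# throwaway copy, not the real conf/gitolite.conf.
conf=$(mktemp)

cat > "$conf" <<'EOF'
repo    myproject
        RW+             =   username
        RW  master$     =   alice
        R               =   bob
EOF

cat "$conf"
```

As I understand the rules: R is read-only, RW allows pushing, and RW+ additionally allows rewinding (the “rewrite” permission). A name between the permission and the equals sign is a regex matched against the branch being pushed, so alice may push to master only, while username may do anything to any branch.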


Your Own Git Hosting: Prologue

As you might know, Git itself does not deal with collaboration, letting you choose any adequate model. So how does one go about publishing a repository? Or letting someone push into it? The answers are multiple and confusing. The official Git manual gives a dry summary of the basic options, which you may read now, or at your leisure. I’ll just note a few key points.

Limiting the discussion to network exchange, Git supports three basic protocols: git protocol, HTTP and SSH. All of them support pulling and pushing; however, each of them has drawbacks. Git protocol does not support any kind of authentication, so it is mostly used for read-only public access. HTTP is slow and stupid (which means that it allows only file transfer, so you can’t use Git hooks). SSH is great in all respects, but it implies that every person who has access to your repository has an account on the server where it is hosted.

In the majority of cases, two protocols are set up for a Git repository: public read-only git protocol access and private read/write SSH access. However, I can name, upon request, at least one real-world case where git protocol was successfully used for unauthenticated read/write access :). HTTP is usually used in those morbid cases where you can in no conceivable way get rid of it (imagine a firewall that blocks everything except ports 80 and 443).

So how do you set up Git hosting? There are a lot of web Git hosting services (gitorious.org, github.com) and project hosting services with Git support (sf.net) out there. Mostly they provide free hosting for open-source projects and charge a fee for private repositories.

However, public Git hosting cannot satisfy everyone. There are closed projects that cannot be entrusted to a third party. There are closed projects that do not have spare money for Git hosting. Also, it is hard to integrate a hosted Git repository with other development tools (buildbot, bugzilla, etc.), mainly because custom hooks are not supported. So that brings us to the following problem.

Given:

  • a server.

Required:

  • hosting of any number of Git repositories;
  • public read-only access to a subset of them via git protocol;
  • read/write access for authorized users via SSH;
  • web interface to the repositories.

In the following posts, I will tell you how I solved the problem for myself using the following tools:

  • gitolite for SSH access management;
  • git-daemon for anonymous git protocol access;
  • cgit for web interface.

Please stay tuned!

P.S.: I’ve heard some concerns about cgit not working properly in some cases. I can neither confirm nor disprove this (since I rarely use web frontends for code browsing), but I’ll look into it.


Euterpe

Some time ago I decided I needed a tool that would synchronize my music with my player. I required several very simple things from it:

  • encode music automatically into a supported format,
  • save encoded files somewhere, in case they ever need to be transferred a second time,
  • select the tracks to be transferred in a non-brain-exploding manner.

I failed to find a tool that does exactly this. So I simply had to write my own! This ultimately led to a simple music-synchronization framework. Unfortunately I didn’t have the strength to make a decent product out of it, so I’ll simply show you what I’ve done.

If you want to use it and/or improve it, I’ll be very glad.

http://dev.hades.name/Euterpe


New Server

I’ve just moved this blag to a new server. Please excuse the smelling paint and unfinished staircases :)


Git Is Your Friend not a Foe Vol. 4: Rebasing

This time I’ll talk about more complex things, which give developers more power over their lives. Actually, they just look complex. In fact these are quite natural operations on Git’s commit history structure, which I have already described in my previous posts, as have gazillions of posts by other people.

So, let’s start with the simplest. Mr. Cleverhead sometimes fails to remember which branch he is currently on. He commits to branch master, while in fact he should have committed to staging. What can he do to fix it? He decides to run git show, save the commit as a patch, then check out branch staging, apply the patch there and commit with the same commit message. Well, it so happens that Git already has a command that does exactly this automatically! It is called git cherry-pick. Besides fixing Mr. Cleverhead’s reputation, it is also quite often used, for example, to backport commits to release branches.

[Figure: Git cherry-pick]

Note, however, that this still requires Mr. Cleverhead to remove his commit from master. We believe in him.
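Here is Mr. Cleverhead’s whole mishap and its repair as a throwaway sketch. It runs in a scratch repository under a temporary directory; the file names and commit messages are made up. The reset at the end is the “remove his commit from master” part, and it is only safe while master has not been published.

```shell
set -e
demo=$(mktemp -d) && cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # same branch name on any Git version
git config user.name "Mr. Cleverhead"
git config user.email cleverhead@example.com

echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"
git branch staging

# The mistake: this commit was meant for staging, but lands on master.
echo fix > fix.txt
git add fix.txt
git commit -q -m "Fix meant for staging"
wrong=$(git rev-parse HEAD)

# Copy the commit onto staging...
git checkout -q staging
git cherry-pick "$wrong" >/dev/null

# ...and drop it from master (only safe while master is unpublished).
git checkout -q master
git reset -q --hard HEAD~1

git log --oneline staging
```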

The next big complex thing is a branch-wise cherry-pick. This is called a rebase and often casts a great deal of confusion upon novice Git users. But actually rebase is to cherry-pick as multiplication is to addition: by doing a rebase you just cherry-pick a series of commits onto another branch. Although the git-rebase manual page is ugly and unfriendly, it tells the same basic thing: rebase is a series of cherry-picks, followed by a branch reset.

[Figure: Git rebase]

Note the desaturated old commits. Although Git changed the branch head, it didn’t remove these old commits. They are still accessible through the reflog, or by their SHA1 ids, in case you realise you’ve made a mistake.
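If you want to see those “desaturated” commits for yourself, here is a small sketch in a scratch repository. Branch and file names are made up. It rebases a two-commit topic branch onto master and then checks that the old branch tip is still around.

```shell
set -e
demo=$(mktemp -d) && cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Mr. Cleverhead"
git config user.email cleverhead@example.com

echo base > base.txt && git add base.txt && git commit -q -m "Initial commit"

# Two commits on a topic branch...
git checkout -q -b topic
echo one > one.txt && git add one.txt && git commit -q -m "Topic: one"
echo two > two.txt && git add two.txt && git commit -q -m "Topic: two"
old_tip=$(git rev-parse topic)

# ...while master moves on.
git checkout -q master
echo news > news.txt && git add news.txt && git commit -q -m "Master: news"

# Replay the topic commits on top of master and move the branch head.
git checkout -q topic
git rebase -q master

# The pre-rebase tip is no longer on the branch, but it still exists.
git cat-file -t "$old_tip"      # prints "commit"
git reflog show topic           # the old tip is listed here
```

Should the rebase turn out to be a mistake, git reset --hard with the old id puts the branch back where it was.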

There are many use-cases for git rebase, from Mr. Cleverhead committing several commits in a row to a wrong branch, to complex integration and release cycle management workflows. See, for example, http://nvie.com/git-model.

There is also an interactive mode for git rebase. And it is truly awesome! It allows you to edit your branch in any way you want: remove commits, edit commits, squash commits, even change commit order. So try it out. Now.
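Interactive mode normally opens the list of commits in your editor, which is hard to show in a blog post. So here is a sketch that drives it without an editor: the correction is committed with --fixup, --autosquash arranges the list so the fixup lands right after its target, and GIT_SEQUENCE_EDITOR=true accepts the list unchanged. File names and messages are made up, and this is just one way to script it, not the only one.

```shell
set -e
demo=$(mktemp -d) && cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Mr. Cleverhead"
git config user.email cleverhead@example.com

echo base > base.txt && git add base.txt && git commit -q -m "Initial commit"
echo feature > feature.txt && git add feature.txt && git commit -q -m "Add feature"
echo docs > docs.txt && git add docs.txt && git commit -q -m "Add docs"

# A late correction to "Add feature", recorded as a fixup of that commit.
feature=$(git rev-parse HEAD~1)
echo "feature, fixed" > feature.txt
git commit -q -a --fixup="$feature"

# Rewrite the last three commits; the fixup is folded into its target.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

git log --oneline
```

When you run git rebase -i by hand, you get the same list in your editor and can reorder lines, delete them, or change pick to squash, edit and friends.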

The post would be incomplete if I didn’t mention git filter-branch. It is basically the same as rebase, but it usually affects the whole branch back to the initial commit, and Git performs a certain action for every commit. It is a very complex and powerful tool that is quite rarely needed and even more rarely used. Its uses include: removing a file from the whole project history (for example because of license issues, or because Mr. Cleverhead accidentally added his grandmother’s recipe book five years ago); fixing an author’s name in commits; creating a separate Git repository from a subdirectory. All this comes free with Git and doesn’t require you to spend weeks doing it in some other clumsy way.
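The recipe book case could look roughly like this. It is a sketch in a scratch repository with made-up file names. The FILTER_BRANCH_SQUELCH_WARNING variable is only there because newer Git versions print a long warning and pause before filter-branch runs; older versions simply ignore it.

```shell
set -e
demo=$(mktemp -d) && cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Mr. Cleverhead"
git config user.email cleverhead@example.com

echo code > main.c && git add main.c && git commit -q -m "Initial commit"
echo "grandma's pie" > recipes.txt && git add recipes.txt \
    && git commit -q -m "Add recipes by accident"
echo more > more.c && git add more.c && git commit -q -m "More code"

# Silence the warning that newer Git versions print for filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Rewrite every commit reachable from HEAD, dropping recipes.txt from each.
git filter-branch --index-filter \
    'git rm -q --cached --ignore-unmatch recipes.txt' HEAD >/dev/null

git log --oneline -- recipes.txt   # prints nothing: no commit touches the file now
```

Git keeps the pre-rewrite branch under refs/original/ as a backup, so the old commits (and the recipes) stay in the repository until you delete that ref and let garbage collection do its job.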

The last thing I would like to mention today is pull-rebasing. This refers to the following problem: by default git pull means git fetch followed by git merge, which is perfectly fine if you haven’t committed anything locally. But if you have, this creates a merge commit. This is perfectly fine too. But if you make small commits often, or work on a big feature in your branch and merge master often to stay in sync with updates, this will create a so-called “loopy history”, which usually pisses people off. Especially Linus. So if you have committed a small fix that you can’t push because Mrs. Slowrunner managed to push a commit before you, use git pull --rebase. This will rebase your work upon Mrs. Slowrunner’s work and no merge commit will be created. If you work on a topic branch and would like to sync it with master, simply run git rebase master. This will reduce the pissed-off people count too.

[Figure: Git loopy and normal history]
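The race with Mrs. Slowrunner can be replayed locally. In this sketch a bare repository in a temporary directory stands in for the server, and the directory names, file names and identity are all made up.

```shell
set -e
demo=$(mktemp -d) && cd "$demo"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL=demo@example.com

# A bare repository stands in for the shared server.
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master

# Mrs. Slowrunner starts the project and publishes it.
git init -q slowrunner && cd slowrunner
git symbolic-ref HEAD refs/heads/master
git remote add origin "$demo/server.git"
echo base > base.txt && git add base.txt && git commit -q -m "Initial commit"
git push -q origin master
cd "$demo"

# You clone it and commit a small fix locally...
git clone -q "$demo/server.git" you
(cd you && echo mine > mine.txt && git add mine.txt \
    && git commit -q -m "My small fix")

# ...but Mrs. Slowrunner pushes first.
(cd slowrunner && echo hers > hers.txt && git add hers.txt \
    && git commit -q -m "Slowrunner's change" && git push -q origin master)

# A plain push would be rejected now. Rebase your fix onto her work instead.
cd you
git pull -q --rebase origin master
git push -q origin master

git log --oneline --graph
```

The resulting history is a straight line with your fix on top of hers, and no merge commit in sight.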

Note, however, that rebase rewrites history! This means that if you have published your work somewhere, you shouldn’t rebase it, unless you have warned people that they should expect history rewriting. Branches published with such a warning are usually referred to as throw-away branches (for example branch pu of git.git). If you have published your work, rewritten it and try to publish it again, git push won’t let you. If you force it to, expect angry people to come to your house with your local analogue of baseball bats.

Regarding angry people: some workflows require topic branches to be squashed into a single commit before merging to mainline. This is easy in Git: just use the --squash option of git merge, and everyone will be happy. If you want to squash only some of the commits (for example, make 5 commits out of 20), you can use the aforementioned interactive rebase.
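A quick sketch of the squash merge, again in a scratch repository with made-up names: three commits on a topic branch end up as a single ordinary commit on master.

```shell
set -e
demo=$(mktemp -d) && cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Mr. Cleverhead"
git config user.email cleverhead@example.com

echo base > base.txt && git add base.txt && git commit -q -m "Initial commit"

# Three small commits on a topic branch.
git checkout -q -b topic
for n in 1 2 3; do
    echo "step $n" > "step$n.txt"
    git add "step$n.txt"
    git commit -q -m "Topic step $n"
done

# Stage the combined result of topic on master without committing...
git checkout -q master
git merge --squash topic >/dev/null

# ...and record it as one commit with a message of your choice.
git commit -q -m "Add topic feature (squashed)"

git log --oneline master
```

Note that git merge --squash only stages the changes; the commit is a separate step, and the result has a single parent, so Git does not remember that topic was merged.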

This pretty much concludes this series of Git posts. If you feel that I have left something uncovered, please tell me! I appreciate all the comments.

Previous posts:

All posts about Git
