Django: A Simple Keyword Search

Disclaimer:

This post was actually started about 9 months ago and never finished, so I’m not sure if the code will even run on the latest version of Django. I’m actually doing Ruby on Rails work nowadays at my job, and using Django solely for my personal projects. With that said, I hope some people out there will find some of these techniques somewhat useful.
—————————————————————-

Search

One feature that is commonly used in a lot of Django applications is the ability to search through content by keywords. There are many different ways to do this, but I’d like to present a way that’s super easy to implement and uses only the Django model API.

Before I start, I’d like to first give a little disclaimer: this is absolutely NOT the most efficient way to perform a search through data, since each icontains lookup is a substring match that the database typically can’t answer from an ordinary index. If you’re looking for a more advanced solution, check out djangosearch on Google Code.

Ok, let’s say you’re making a blogging application, and you need a way to search blog entries for certain keywords. You may have a model named BlogEntry that looks like this:

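from django.db import models

class BlogEntry(models.Model):
    # Blog is assumed to be another model defined elsewhere in the app.
    # on_delete is a required argument since Django 2.0.
    blog = models.ForeignKey(Blog, on_delete=models.CASCADE)
    subject = models.CharField(max_length=100)
    body = models.TextField()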

Now let’s say you needed a way to search the ‘body’ and ‘subject’ fields of all BlogEntries for the keywords ‘Django’ and ‘Python’. Here’s a function that, given a set of BlogEntries, searches their subject and body fields for the specified keywords:

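import operator
from functools import reduce

from django.db.models import Q


def search_keywords(blogentries, keywords):
    '''
    Searches a given QuerySet and returns a QuerySet of the entries
    whose subject or body contains any word in the list of keywords
    '''
    # Accept a single keyword as a convenience
    if isinstance(keywords, str):
        keywords = [keywords]

    if not isinstance(keywords, list):
        return None

    # Nothing to search for; reduce() would fail on an empty list
    if not keywords:
        return blogentries.none()

    # One case-insensitive "contains" condition per keyword and field
    list_body_qs = [Q(body__icontains=x) for x in keywords]
    list_subj_qs = [Q(subject__icontains=x) for x in keywords]
    # OR all of the conditions together into a single Q object
    final_q = reduce(operator.or_, list_body_qs + list_subj_qs)
    r_qs = blogentries.filter(final_q)
    return r_qs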

The beginning of the function is pretty straightforward. The interesting part is the latter half of the function, which uses Django’s database API and Python’s reduce function to dynamically create a Django query with all of the keywords in it.

First, build a couple of lists of Q objects: one list per field, each holding one Q object per keyword.

...
list_body_qs = [Q(body__icontains=x) for x in keywords]
list_subj_qs = [Q(subject__icontains=x) for x in keywords]
---

Using Python’s reduce function, we glue all these Q objects together with operator.or_.

...
final_q = reduce(operator.or_, list_body_qs + list_subj_qs)
r_qs = blogentries.filter(final_q)
return r_qs
---

Essentially, you create a query similar to this:
Q(body__icontains="dog") | Q(body__icontains="cat") | Q(body__icontains="parrot") ...
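
If reduce is new to you, here’s a small sketch of what it’s doing. It runs without Django: FakeQ is a made-up stand-in (not part of Django) that just records the expression as text, so you can see how the conditions get glued together.

```python
import operator
from functools import reduce


class FakeQ:
    """Hypothetical stand-in for Django's Q: it only records the expression as text."""

    def __init__(self, text):
        self.text = text

    def __or__(self, other):
        return FakeQ("(%s | %s)" % (self.text, other.text))


keywords = ["dog", "cat", "parrot"]
conditions = [FakeQ("body__icontains=%r" % kw) for kw in keywords]

# reduce() folds the list from left to right...
folded = reduce(operator.or_, conditions)

# ...which is shorthand for this loop.
looped = conditions[0]
for condition in conditions[1:]:
    looped = looped | condition

print(folded.text)
# ((body__icontains='dog' | body__icontains='cat') | body__icontains='parrot')
print(folded.text == looped.text)
# True
```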

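If you wanted to perform an all-inclusive search (i.e. AND’ing the search terms together instead of OR’ing them), you’d use operator.and_ instead of operator.or_. Be careful, though: AND’ing the combined list as-is would require every keyword to appear in both the body and the subject. To require every keyword somewhere in an entry, OR the body and subject conditions for each keyword first, then AND those per-keyword groups together.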
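
Here’s a rough sketch of an all-keywords version that groups the conditions per keyword. The function name all_keywords_filter and the make_q parameter are my own inventions for illustration; in a real project you would pass django.db.models.Q as make_q. FakeQ is again a made-up stand-in so the example runs without Django, and the sketch assumes a non-empty keyword list.

```python
import operator
from functools import reduce


class FakeQ:
    """Hypothetical stand-in for Django's Q: it only records the expression as text."""

    def __init__(self, text=None, **lookups):
        if text is None:
            text = ", ".join("%s=%r" % pair for pair in sorted(lookups.items()))
        self.text = text

    def __or__(self, other):
        return FakeQ("(%s | %s)" % (self.text, other.text))

    def __and__(self, other):
        return FakeQ("(%s & %s)" % (self.text, other.text))


def all_keywords_filter(keywords, make_q=FakeQ):
    """Build one (body OR subject) group per keyword, then AND the groups.

    Assumes keywords is a non-empty list; pass Django's Q as make_q in real code.
    """
    per_keyword = [
        make_q(body__icontains=kw) | make_q(subject__icontains=kw)
        for kw in keywords
    ]
    return reduce(operator.and_, per_keyword)


combined = all_keywords_filter(["django", "python"])
print(combined.text)
# ((body__icontains='django' | subject__icontains='django') & (body__icontains='python' | subject__icontains='python'))
```

With the real Q class, the result could be handed to blogentries.filter(...) the same way final_q is in the function above.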

Finish

And that’s it. Just a few lines of code and you’ve got your own search function. Leave a comment with your thoughts :) (Unless you’re a spam bot)

A Simple Django Truncate Filter

The Problem:

The built-in Django filter, truncatewords, truncates a string after a certain number of words. This is great, but many times I find I have very tight space restrictions in certain areas of a page, and a string that is too long would push its way into another element and subsequently into my head in the form of a headache.

The built-in truncatewords filter is no help here: it does nothing to limit the width of a string.

For example, “One two” and “ooooooooooooooooooooooooooonnnnnnnnnnnnnnneeeeeeeeeeeeee twoooooooooooooo” are both only 2 words, yet they have extremely different widths :)
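
To put numbers on that, here’s a quick plain-Python sketch. The first_words helper is a hypothetical stand-in for word-based truncation (it is not Django’s implementation), and the long string is one I made up.

```python
short_text = "One two"
long_text = "o" * 40 + "ne tw" + "o" * 20


def first_words(text, count):
    """Keep the first `count` whitespace-separated words (word-based truncation)."""
    return " ".join(text.split()[:count])


for text in (short_text, long_text):
    kept = first_words(text, 2)
    print(len(kept.split()), "words,", len(kept), "characters")
# 2 words, 7 characters
# 2 words, 65 characters
```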

The Solution:

We need a filter that truncates not only by words, but by characters too. It’s an extremely simple filter, and oftentimes I wonder why it’s not included in Django.

Let’s start from the very beginning. Every filter must live in your app’s templatetags directory (a Python package, so it needs an __init__.py file). So create a file in that directory named something like “truncate_filters.py”. If you need any more information than that, take a look at the Django documentation on how to create a custom filter.

Here is what the filter looks like:

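from django import template

register = template.Library()


@register.filter("truncate_chars")
def truncate_chars(value, max_length):
    """Cuts value down to max_length characters, backing up to a word
    boundary where possible, and appends "..." if anything was cut."""
    if len(value) <= max_length:
        return value

    truncd_val = value[:max_length]
    # If the cut landed in the middle of a word, back up to the last space
    if value[max_length] != " ":
        rightmost_space = truncd_val.rfind(" ")
        if rightmost_space != -1:
            truncd_val = truncd_val[:rightmost_space]

    return truncd_val + "..."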


*Update*: the code was changed per Chris and Paul’s suggestions in the comments. I haven’t tested the changes but assume they work :)

Here’s how it works visually on this string: “This is a sample string”
This is what happens when the filter is supplied with the argument 20:

  1. Cut down the string to 20 chars if it is greater than 20 chars in length.  The string now becomes: “This is a sample str”
  2. Find the right-most space, indicating the start of the last word in the string, and truncate again:  The string is now: “This is a sample”
  3. Add “…” and return.  “This is a sample…”
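
If you want to check those three steps without setting up a template, here’s the same logic as a plain function. The name truncate_chars_plain is just mine for this sketch; it skips the Django registration and is otherwise meant to mirror the filter above.

```python
def truncate_chars_plain(value, max_length):
    """Same logic as the filter, minus the Django registration."""
    if len(value) <= max_length:
        return value

    # Step 1: cut down to max_length characters
    truncd_val = value[:max_length]

    # Step 2: if the cut landed mid-word, back up to the right-most space
    if value[max_length] != " ":
        rightmost_space = truncd_val.rfind(" ")
        if rightmost_space != -1:
            truncd_val = truncd_val[:rightmost_space]

    # Step 3: add the dots
    return truncd_val + "..."


print(truncate_chars_plain("This is a sample string", 20))  # This is a sample...
print(truncate_chars_plain("Short", 20))                    # Short
print(truncate_chars_plain("Supercalifragilistic", 5))      # Super...
```

Note that the returned string can be up to three characters longer than the limit because of the dots, so leave a little room when picking the number.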

You can invoke the filter from within your template like so:

{% load truncate_filters %}
<ol>
{% for some_string in a_list_of_strings %}
    <li>{{some_string|truncate_chars:50}}</li>
{% endfor %}
</ol>

Hope someone out there finds this helpful :)

The Day I Switched from Vista x64 to Ubuntu for Compatibility Reasons

Dell crap-j

I’ve always been a fan of Linux. I remember, when I was barely 13ish years old, splurging $1.50 to order a Red Hat Linux CD from linuxmall.com. With all that said, I still never thought I’d see this day: the day my recently purchased MP3 player, a Dell DJ 20GB, would not work on Windows but worked perfectly fine on Linux. Here’s the story.

I recently came across a great deal on a Dell DJ. Sure, I already own a few MP3 players, but a 20GB USB-powered external hard drive which also happens to play MP3s, all for 35 dollars, seemed like a pretty good deal. So I bought it. When it arrived, I ripped it out of its box and hooked it up to one of my desktops, which runs Vista x64. Up popped one of those infamous “Search for drivers” dialogs. I gave it a shot. It failed, of course.

But wait! All I really wanted to do was transfer music and files to it. Maybe it was designed in such a way that I could just copy MP3s over to it like an external hard drive (after all, the Dell DJ is actually made by Creative, which produces the MuVo). Well, needless to say, that didn’t pan out. I could transfer files to it, but there was no “Music” folder in sight. I saw that Dell took the same approach as Apple, with Dell DJ Explorer being the equivalent of Apple’s iTunes. Pure crap.

Popping in the included drivers disc didn’t prove to be any help either.  The drivers would not install.  I turned to the internet for help, but was immediately discouraged after finding that the last time Dell updated the drivers for this device was in 2005.  Wow.

Then the unthinkable happened. The Dell DJ entry on Wikipedia noted that there were a couple of Linux programs that could do the job: gnomad2 and neutrino. Sure enough, I pulled out my laptop, which runs Ubuntu Hardy Heron, and plugged in the Dell crap-jay. After a quick “sudo apt-get install gnomad2”, I was happily transferring my small music collection over to the device. Awesome.

I’m not sure if my situation was a special case or not. A popular device (as popular as can be in the iPod-saturated MP3 player market) compatible with Linux and not Windows? Who woulda thunk it? Not me.

The Dark Side of Code Commenting

Ever since the beginning of my journey to become a solid programmer, I was taught that commenting code was always a good thing. You can never have enough comments, they would say. The golden rule was to have at least one line of comments for every line of code.

All of this made sense to me back in college. Nothing wrong with being too descriptive, right? Wrong! Sure, comments can be a great thing. There’s nothing better than reading a comment that helps you understand some complex piece of code. However, after working in the industry for a while, I’ve seen the dark side of comments.

The other day, while working on one of many bugfixes, I came across an obfuscated piece of code. Instead of diving into it and figuring out where each function/variable/whatever came from, I trusted a couple of lines of comments above the code. Thank god for comments, right? Wrong again! Apparently, whoever wrote that code wrote it over someone else’s code, but hadn’t updated the comment because he hadn’t noticed it. So the comment was downright false. A lie. Imagine dealing with that all day. Ever try to understand thousands of lines of code where some comments are leading you down the wrong path, like Satan?

My thinking is, keep the comments to a minimum.  Instead of commenting excessively, try to write your code in such a way that it doesn’t *need* too many comments to understand.  Good code speaks for itself :)
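
Here’s a tiny made-up example of what I mean (both functions are hypothetical, not from any real codebase). The first relies on a comment that has gone stale; the second needs no comment at all because the names say what it does.

```python
# Before: terse names plus a comment that has drifted out of date.
def calc(p, r):
    # add 10% sales tax   <-- stale: the code below applies a discount
    return p - p * r


# After: the names carry the meaning, so there is no comment to go stale.
def apply_discount(price, discount_rate):
    return price - price * discount_rate


print(calc(100.0, 0.25))            # 75.0
print(apply_discount(100.0, 0.25))  # 75.0
```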

And don’t be lazy: always update the comments as your code changes. If you come across a useless one, throw it out… like a leper!

Because I’ve been getting some weird feedback on this post (my fault most likely, for being a professional programmer and not a writer), let me reiterate my point here…comments, when used appropriately, are a good thing! The key word is appropriately.  Don’t diarrhea all over your code.  It’s stinky.  And too many people do it.  When a developer sees a piece of code, he tries to understand.  When he reads a comment, he is inclined to believe it right off the bat.  So don’t mess with his head.  Get it?

Oh and hey, leave some “comments” below if you’ve had bad experiences too :)

Django: A Profanity Filter

The reason for it:

There are often times when you would like to display content on your page that was actually submitted by another user, such as displaying a list of recent posts on your homepage or something. The problem is that you don’t want to post any offensive material on such a prominent page. Without real live human moderation, the best we can do is strip out things we know are offensive (to most people anyway), such as bad words. Here’s a profanity filter for Django I wrote using code mostly borrowed from django.core.validators.

And here she is:

Here’s what the filter looks like: (If you don’t know how to make a filter in Django, read the documentation)

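from django import template
from django.conf import settings

register = template.Library()


@register.filter("replace_bad_words")
def replace_bad_words(value):
    """ Replaces profanities in strings with safe words
    For instance, "shit" becomes "s--t"
    """
    # Only look at the profanities that actually appear in the value
    words_seen = [w for w in settings.PROFANITIES_LIST if w in value]
    for word in words_seen:
        # Keep the first and last letters and dash out everything in between
        value = value.replace(word, "%s%s%s" % (word[0], '-'*(len(word)-2), word[-1]))
    return value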
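
If you want to poke at the masking logic outside a template, here’s a plain-Python sketch of it. The function name replace_bad_words_plain and the bad_words parameter are mine for illustration; the word list is passed in instead of being read from settings. It also shows two things to be aware of: the match is a case-sensitive substring match, so words inside other words get masked and uppercase variants slip through.

```python
def replace_bad_words_plain(value, bad_words):
    """Same masking logic as the filter, with the word list passed in."""
    words_seen = [w for w in bad_words if w in value]
    for word in words_seen:
        # Keep the first and last letters and dash out everything in between
        masked = "%s%s%s" % (word[0], "-" * (len(word) - 2), word[-1])
        value = value.replace(word, masked)
    return value


print(replace_bad_words_plain("well, shit", ["shit"]))   # well, s--t
print(replace_bad_words_plain("a class act", ["ass"]))   # a cla-s act
print(replace_bad_words_plain("well, SHIT", ["shit"]))   # well, SHIT
```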

Some other things:

Just throw that on a Django template variable and it will replace words like “shit” with “s--t.” It won’t change words like “ass” and “dick” since they are technically not bad words… but if you think they are, you can do something like this:

...
extra_bad_words = ['ass', 'dick']

# extend() modifies a list in place and returns None, so build a new list instead
bad_words = list(settings.PROFANITIES_LIST) + extra_bad_words

words_seen = [w for w in bad_words if w in value]
...

Pretty useful huh?

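Oh, one more thing to add: this filter depends on the profanities list that is included in Django, which it reads from settings.PROFANITIES_LIST (if your version of Django doesn’t provide one, you can define PROFANITIES_LIST yourself in settings.py). To get this, make sure you import settings: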


from django.conf import settings

That’s it.

I know there are a lot of details missing.  I apologize.  If you make a comment here, I’ll be happy to help you out.