programming task!

masud · Post by **masud** » Sun Mar 25, 2007 11:39 am

well you give us the free hand to write code in any language of our own choice. So I prefer coding in C (part of job) other then C++ or any scripting language. I didn't even give it a second thought to look for any other lang.
And the second thing which counts is the environment in which you are going to implement this task. Of course if I'm sys admin and have to do such task, I definitely would choose perl.

kragen · Post by **kragen** » Sun Mar 25, 2007 2:31 pm

Here's my Perl solution:

Code: Select all

#!/usr/bin/perl -w
use strict;

my %alias;
# recursively expand an alias, returning a list of email addresses
sub _expand_alias {
  my ($alias) = @_;
  return $alias unless exists $alias{$alias};
  return map { expand_alias($_) } @{$alias{$alias}};
}
# recursively expand an alias, catching cycles
my %expanding;
sub expand_alias {
  my ($alias) = @_;
  die "cycle on $alias" if $expanding{$alias};
  $expanding{$alias} = 1;
  my @rv = _expand_alias $alias;
  delete $expanding{$alias};
  return @rv;
}
# return a (possibly reordered) list of the unique elements passed as arguments
sub uniq {
  my (@things) = @_;
  my %rv;
  @rv{@things} = ();
  return keys %rv;
}

# main program: expand all aliases in the input file, which is in the form
# admin1: zahra@server5, mani
while (<>) {
  chomp;
  my ($name, $values) = split /:\s*/, $_, 2;
  $alias{$name} = [split /,\s*/, $values];
}
for my $alias (keys %alias) {
  print "$alias: ", join(', ', uniq expand_alias($alias)), "\n";
}

I note that this is considerably shorter than the other Perl solution posted, but it appears that the other Perl solution includes a general library for handling relations, as in the relational algebra, and the part that's specific to this problem is only a few lines at the bottom --- considerably shorter than what I've written.

Here's my C version:

Code: Select all

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>

/*
 * from http://www.linuxpakistan.net/forum2x/viewtopic.php?p=34491 :
 * here's a task for you. i had to write a program for someone to do this, and
 * thought i'd see how you'd solve it.
 *
 * you have a file in this format:
 * 
 * root: admin1, admin2
 * postmaster: root
 * news: admin
 * admin: mani
 * admin1: zahra@server5, mani
 * admin2: admin1, manager@server5
 * mani: mani@server5
 *
 * you want to flatten the aliases, and output a file like this:
 * 
 * root: zahra@server5, mani@server5, manager@server5
 * postmaster: zahra@server5, mani@server5, manager@server5
 * news: mani@server5
 * admin: mani@server5
 * admin1: zahra@server5, mani@server5
 * admin2: zahra@server5, mani@server5, manager@server5
 * mani: mani@server5
 *
 * This solution is in the public domain.
 */

/* My general strategy: read in the entire file as a string; use the string as
 * the dictionary data structure; use a linear search on a linked list to catch
 * duplicates and loops. */

#define MYBUF 1024*1024
char buf[MYBUF];

/* advance a pointer to the next line or end of string */
char *next_line(char *pos) {
        while (*pos && *pos != '\n') pos++;
        if (*pos) pos++;
        return pos;
}

/* is a pointer looking at a certain substring followed by a :? */
int looking_at_definition(char *pos, char *alias, int len) {
        for (;;pos++, alias++, len--) {
                if (!len) return *pos == ':';
                if (!*pos || *pos != *alias) return 0;
        }
}

/* return a pointer to the line defining a particular alias */
/* (end of string on failure) */
char *find_definition(char *alias, int len) {
        char *rv = buf;
        while (*rv && !looking_at_definition(rv, alias, len))
                rv = next_line(rv);
        return rv;
}

/* a stack of strings to test membership in, to avoid cycles and duplicates */
typedef struct _strstack { char *s; int len; struct _strstack *next; } strstack;
strstack *push(strstack *stack, char *s, int len) {
        strstack *rv = malloc(sizeof(*rv));
        rv->s = s;
        rv->len = len;
        rv->next = stack;
        return rv;
}
strstack *pop(strstack *stack) {
        strstack *rv = stack->next;
        free(stack);
        return rv;
}
int contains(strstack *stack, char *s, int len) {
        while (stack) {
                if (len == stack->len && !memcmp(stack->s, s, len)) return 1;
                stack = stack->next;
        }
        return 0;
}

void output(char *s, int len) { fwrite(s, len, 1, stdout); }

/* a sane strchr, returns end of string if not found */
char *find(char *s, int c) {
        char *rv = strchr(s, c);
        if (rv) return rv;
        return s + strlen(s);
}

char *find_addresses(char *s) {
        while (*s && (*s == ':' || isspace(*s))) s++;
        return s;
}
#define min(x, y) (((x)<(y)) ? (x) : (y))
char *find_address_end(char *address) {
        char *comma = find(address, ',');
        char *nl = find(address, '\n');
        return min(comma, nl);
}
char *next_address(char *address) {
        char *end = find_address_end(address);
        while (*end && *end != '\n' && (*end == ',' || isspace(*end))) end++;
        if (!*end) return NULL;
        if (*end == '\n') return NULL;
        return end;
}

/* ensure the addresses on alias_line are in address_list, adding if needed */
/* returns updated address_list */
strstack *expand_aliases(char *alias, strstack *expanding, 
                strstack *address_list);
/* ensure the addresses that 'address' means are in address_list, 
 * adding if needed */
strstack *find_expansion(char *address, strstack *expanding, 
                strstack *address_list) {
        char *end = find_address_end(address);
        char *definition = find_definition(address, end-address);
        if (*definition) {
                address_list = expand_aliases(definition, expanding, address_list);
        } else {
                if (!contains(address_list, address, end-address))
                        address_list = push(address_list, address, end-address);
        }
        return address_list;
}

/* see comment above forward declaration */
strstack *expand_aliases(char *alias_line, strstack *expanding, 
                strstack *address_list) {
        char *address;
        char *colon = find(alias_line, ':');
        int len = colon - alias_line;
        if (contains(expanding, alias_line, len)) {
                fprintf(stderr, "loop on ");
                fwrite(alias_line, len, 1, stderr);
                fprintf(stderr, "\n");
                exit(-1);
        }
        expanding = push(expanding, alias_line, len);
        address = find_addresses(colon);
        while (address) {
                address_list = find_expansion(address, expanding, address_list);
                address = next_address(address);
        }
        pop(expanding);
        return address_list;
}

void output_address_list(strstack *s) {
        if (!s) return;
        output(s->s, s->len);
        s = pop(s);
        while (s) {
                printf(", ");
                output(s->s, s->len);
                s = pop(s);
        }
}

void output_aliases(char *line) {
        char *colon = find(line, ':');
        strstack *address_list = expand_aliases(line, NULL, NULL);
        output(line, colon-line+1); /* +1 to include : */
        printf(" ");
        output_address_list(address_list);
        printf("\n");
}

/* read a file in the specified format from stdin; results go to stdout. */
int main(int argc, char **argv) {
        /* XXX note that this strategy for reading the whole file only works
         * for disk files */
        int len = read(0, buf, MYBUF);
        char *line = buf;
        if (len == -1) { perror("read"); return 1; }
        if (len == MYBUF) {
                fprintf(stderr, "File too big; change MYBUF\n");
                return 1; 
        }
        buf[len] = '\0';
        while (*line) {
                output_aliases(line);
                line = next_line(line);
        }
        return 0;
}

You will also note that this C version, at 193 lines (134 lines of real code) is a bit shorter than the other C version, but is still four or five times longer than the Perl version. This means that it probably has four or five times as many bugs, it probably took me four or five times as long to write, and it will take someone else at least four or five times as long to understand it well enough to add features to it. (Say, skipping comment and blank lines, like in Faried's Python version, or not taking O(N^2) time on long files, a problem the Perl version doesn't have.)

I agree with masud's comment that the environment you're working in is important, but I think it's actually the primary, most important thing to consider; if you need to run a program on 5000 machines running various old versions of Windows, or if you need to embed it in a microcontroller or a WRT54G wireless router, or if you need to run at speeds Perl can't touch, or if the guy coming after you knows C but not Perl, Perl may not even be an option.

(This particular C solution is much slower than the Perl version for large files, which is when it matters, and suffers from an arbitrary file-size limit, but these are fixable in the C version with probably less than another 20 lines of code, while the corresponding problems aren't fixable in Perl.)

But if Perl is an option, clearly in this case it is a far superior option, just by virtue of the overwhelming ratio in code volume. My C version took more time to write, probably contains more bugs, will take more time to add features to, will tend to accumulate more bugs for each feature addition, than my Perl version.

kragen · Post by **kragen** » Sun Mar 25, 2007 3:22 pm

I read through masud's C solution. I noticed that it looked like it would crash if you gave it a file containing just

Code: Select all

foo:

I downloaded and compiled it; sure enough, it does. That's because there's a do-while loop instead of a while-loop on the data-node pointers, which dereferences a null pointer if the list is zero elements long:

Code: Select all

                data_ptr = head_ptr->data_node;
                do {
                        if (hunt (head, head_ptr, data_ptr) == 1 ) goto out;
                } while ((data_ptr = data_ptr->next));

One of the first thing hunt() does is to get ->data from its third argument:

Code: Select all

int hunt (struct head_node *head, struct head_node *cur_head, struct data_node *node)
{
...
        while (head_ptr) {
                if(strcmp(head_ptr->name,node->data) == 0) {

Generally do-while loops are a bad idea, since it's an unusual case where you really want to handle the zero-element case (or whatever the while-condition-never-true case means) with the same number of iterations as the one-element case (or while-condition-true-exactly-once). There are occasional exceptions. In this case, I think

Code: Select all

for (data_ptr = head_ptr->data_node; data_ptr; data_ptr = data_ptr->next)

would have been both clearer and correct.

Additionally, it produces incorrect results and may crash if any of the names are over 255 characters long, or any of the lines are over 1023 characters long. I've abbreviated the following so they aren't 256 chars wide; given this input:

Code: Select all

(256 letter Bs): (256 letter As), (256 letter Cs)

it produces this output on my machine:

Code: Select all

(256 letter Bs): (256 letter As)pa    , (256 letter Cs)

And given this input, it crashes:

Code: Select all

(256 letter Bs): (256 letter As), (256 letter Cs), (256 letter Ds)

Here "(4 letter As)" is supposed to mean "AAAA".

None of these bugs are present in either of the Perl implementations or in Faried's Python. My own C probably has different bugs. (The only one I've noticed so far is that it will crash if it runs out of memory, but it's always harder to find your own bugs than someone else's.)

masud's C version, like mine, is O(N^2) in the size of input files, so speed is probably not a valid reason for preferring either C implementation over the Perl or Python ones, which will be much faster than either C version when the files are larger.

So this is why it's worthwhile to give the choice of language a second thought.

kragen · Post by **kragen** » Sun Mar 25, 2007 7:51 pm

Code: Select all

#!/usr/bin/perl
use strict;
use warnings;
 
my %alias;
 
# recursively expand an alias, returning a list of email addresses
sub expand_alias {
    my ( @expanded ) = @_;
    my ( $i, %seen ) = ( -1, map { $_ => 1 } @expanded );
    while( ++$i < @expanded ) {
        push @expanded, grep !$seen{$_}++, @{ $alias{ $expanded[ $i ] } || [] };
    }
    return grep !$alias{ $_ }, @expanded;
}
 
# main program: expand all aliases in the input file, which is in the form
# admin1: zahra@server5, mani
while (<>) {
    chomp;
    my ( $name, $values ) = split /:\s*/, $_, 2;
    $alias{ $name } = [ split /,\s*/, $values ];
}
 
for my $alias ( keys %alias ) {
    print "$alias: ", join( ', ', expand_alias $alias ), "\n";
}

Handles loops rather than reporting them. A bit tricksy for my taste...

masud · Post by **masud** » Sun Mar 25, 2007 7:55 pm

Yes, these are critical programming mistakes. Thanks for you nice comments... This is true it takes much more time coding in C as compare to perl for a specific task. But In C we have more freedom to implement as many algorithms as we can to overcome any limitation, such as big file.

Post by **LinuxFreaK** » Mon Mar 26, 2007 12:07 pm

Dear kragen,
Salam,

Your C Code does not work.

Code: Select all

root@mbox [/home/farrukh]# ./alias1 alias.txt



root@mbox [/home/farrukh]#

Best Regards.

Post by **lambda** » Tue Mar 27, 2007 12:10 am

it reads from stdin. run it as ./alias1 < aliases

kragen · Post by **kragen** » Tue Mar 27, 2007 5:35 am

Oops, sorry, I should have documented that --- and probably given an error message if you passed it an argument.

linuxpakistan.net

programming task!

my solutions in Perl and C

masud's C solution

Aristotle Pagaltzis's version

Re: