[Perldl] Mysterious slow down from repeated inner calls

Jim Magnuson james.magnuson at uconn.edu
Mon Feb 27 03:46:10 HST 2012


Yes, a typo -- I changed the variable name for the example code and lost
the $ somehow...

Currently trying to get timing tests going as suggested by chm...

thanks,

jim

On Mon, Feb 27, 2012 at 2:39 PM, Clifford Sobchuk <
clifford.sobchuk at ericsson.com> wrote:

> I don't know if this is a typo or not, but in the code for inner loop you
> have the following:
> >      $sim = inner($kernel{$w1},$kernel{w2});
> Where $kernel{w2} should be $kernel{$w2}.
>
>
>
>
> CLIFF SOBCHUK
> Core RF Engineering
> Phone 613-667-1974   ecn: 8109-71974
> mobile 403-819-9233
> yahoo: sobchuk
> www.ericsson.com
>
> "The author works for Telefonaktiebolaget L M Ericsson ("Ericsson"), who
> is solely responsible for this email and its contents. All inquiries
> regarding this email should be addressed to Ericsson. The web site for
> Ericsson is www.ericsson.com."
>
> This Communication is Confidential. We only send and receive email on the
> basis of the terms set out at www.ericsson.com/email_disclaimer
>
>
> -----Original Message-----
> From: chm [mailto:devel.chm.01 at gmail.com]
> Sent: Monday, February 27, 2012 5:46 AM
> To: Jim Magnuson
> Cc: perldl
> Subject: Re: [Perldl] Mysterious slow down from repeated inner calls
>
> I don't know of any reason why inner() would slow down---have you tried
> using NYTProf or some such tool to track time in inner and number of calls
> to inner?  One oddity is that the first loop appears to skip all calls to
> inner which would be *very* fast.  Maybe something is going on with the
> loop structure?
>
> --Chris
>
> On 2/27/2012 2:50 AM, Jim Magnuson wrote:
> > Hello,
> >
> > I have a set of about 30,000 words, and I am using string kernels as a
> > metric of word similarity. The goal is to see whether different
> > kernels are better at predicting how quickly human subjects are able
> > to process words. I have calculated the string kernels for each word.
> > So now I have a file with 30,000 lines. The first field in each line
> > is a word, and this is followed by a 676-element vector representing the
> kernel representation.
> >
> > Once I read this in, I need to step through and calculate the
> > similarity of each word to every other word using vector cosine, as
> > well as track the highest similarity value (excluding the word
> > itself), and the set of X-most similar items (there are reasons to
> > believe these are good predictors of human performance).
> >
> > Here's the problem: when I start running the code below, it is very fast.
> > It takes 5 msecs to process the first word (that is, to do the
> > necessary 30,000 cosines), but by the time it reaches the 100th it is
> > taking 37 msecs, and by the 1,000th it is taking 398 msecs -- with
> > 29,000 to go, and constant slowing...
> >
> > Memory use by perl stays constant, and I cannot figure out what would
> > make the program slow down so much. I posted a query at Perl Monks and
> > I got advice about how to speed up each step (the first word used to
> > take 38 msecs), and they pointed out that it is indeed the call to
> > inner that is the culprit (replace it with a non-pdl calculation, and
> > the slowing goes away). They suggested I should look for advice from PDL
> experts.
> >
> > So if anyone can give me pointers as to what is slowing things down
> > and whether there is a way to avoid it, I would be most grateful.
> > Apologies in advance for any offensively inefficient/awkward use of PDL!
> >
> > Thanks!
> >
> > jim
> > #!/usr/bin/perl -s
> > use PDL;
> > use Time::HiRes qw ( time ) ;
> > $|=1;
> > $top = 20;
> >
> > while(<>){
> >      chomp;
> >      ($wrd, @data) = split;
> >      $kernel{$wrd} = norm(pdl(@data));
> >      # EXAMPLE LINE
> >      # word 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0
> > 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> >
> > }
> > $nrecs = keys %kernel;
> > @kernelKeys = sort( keys %kernel );
> >
> > $startAll = time();
> >
> > $at1 = 0;
> > foreach $w1 (@kernelKeys) {
> >    $totalsim = $maxsim = 0;
> >    $startWord = time();
> >    @topX = ();
> >    $at2 = 0;
> >    foreach $w2 (@kernelKeys) {
> >      next if($at1 == $at2); # skip identical item, but not homophones
> >      $at2++;
> >      $sim = inner($kernel{$w1},$kernel{w2});
> >      $totalsim+=$sim;
> >      if($sim>  $maxsim){      $maxsim = $sim;    }
> >      # keep the top 20
> >      if($#topX<  $top){
> >        push @topX, $sim;
> >      } else {
> >        @topX = sort { $a<=>  $b } @topX;
> >        if($sim>  $topX[0]){ $topX[0] = $sim;      }
> >      }
> >    }
> >    $at1++;
> >    $topXtotal = sum(pdl(@topX));
> >    printf "$at1\t$w1\t$totalsim\t$maxsim\t$topXtotal\n";
> >    unless($at1 % 10){
> >      $now = time();
> >      $elapsed = $now - $startAll;
> >      $thisWord = $now - $startWord;
> >      $perWord = $elapsed / $at1;
> >      $hoursRemaining = (($nrecs - $at1) * $perWord)/3600;
> >      printf STDERR "#$at1\t$w1\t$totalsim\t$maxsim\t$topXtotal\t";
> >      printf STDERR "ELAPSED %.3f THISWORD %.3f PERWORD %.3f HOURStoGO
> > %.3f\n",
> >        $elapsed, $thisWord, $perWord, $hoursRemaining;
> >    }
> > }
> >
> >
> >
> >
> > _______________________________________________
> > Perldl mailing list
> > Perldl at jach.hawaii.edu
> > http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>
>
> _______________________________________________
> Perldl mailing list
> Perldl at jach.hawaii.edu
> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.jach.hawaii.edu/pipermail/perldl/attachments/20120227/5d1affb7/attachment.html>


More information about the Perldl mailing list