[Perldl] Mysterious slow down from repeated inner calls

chm devel.chm.01 at gmail.com
Mon Feb 27 02:46:07 HST 2012


I don't know of any reason why inner() would slow
down---have you tried using NYTProf or some such
tool to track time in inner and number of calls to
inner?  One oddity is that the first loop appears
to skip all calls to inner which would be *very*
fast.  Maybe something is going on with the loop
structure?

--Chris

On 2/27/2012 2:50 AM, Jim Magnuson wrote:
> Hello,
>
> I have a set of about 30,000 words, and I am using string kernels as a
> metric of word similarity. The goal is to see whether different kernels are
> better at predicting how quickly human subjects are able to process
> words. I have calculated the string kernels for each word. So now I have a
> file with 30,000 lines. The first field in each line is a word, and this is
> followed by a 676-element vector representing the kernel representation.
>
> Once I read this in, I need to step through and calculate the similarity of
> each word to every other word using vector cosine, as well as track the
> highest similarity value (excluding the word itself), and the set of X-most
> similar items (there are reasons to believe these are good predictors of
> human performance).
>
> Here's the problem: when I start running the code below, it is very fast.
> It takes 5 msecs to process the first word (that is, to do the necessary
> 30,000 cosines), but by the time it reaches the 100th it is taking 37
> msecs, and by the 1,000th it is taking 398 msecs -- with 29,000 to go, and
> constant slowing...
>
> Memory use by perl stays constant, and I cannot figure out what would make
> the program slow down so much. I posted a query at Perl Monks and I got
> advice about how to speed up each step (the first word used to take 38
> msecs), and they pointed out that it is indeed the call to inner that is
> the culprit (replace it with a non-pdl calculation, and the slowing goes
> away). They suggested I should look for advice from PDL experts.
>
> So if anyone can give me pointers as to what is slowing things down and
> whether there is a way to avoid it, I would be most grateful. Apologies in
> advance for any offensively inefficient/awkward use of PDL!
>
> Thanks!
>
> jim
> #!/usr/bin/perl -s
> use PDL;
> use Time::HiRes qw ( time ) ;
> $|=1;
> $top = 20;
>
> while(<>){
>      chomp;
>      ($wrd, @data) = split;
>      $kernel{$wrd} = norm(pdl(@data));
>      # EXAMPLE LINE
>      # word 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
> }
> $nrecs = keys %kernel;
> @kernelKeys = sort( keys %kernel );
>
> $startAll = time();
>
> $at1 = 0;
> foreach $w1 (@kernelKeys) {
>    $totalsim = $maxsim = 0;
>    $startWord = time();
>    @topX = ();
>    $at2 = 0;
>    foreach $w2 (@kernelKeys) {
>      next if($at1 == $at2); # skip identical item, but not homophones
>      $at2++;
>      $sim = inner($kernel{$w1},$kernel{w2});
>      $totalsim+=$sim;
>      if($sim>  $maxsim){      $maxsim = $sim;    }
>      # keep the top 20
>      if($#topX<  $top){
>        push @topX, $sim;
>      } else {
>        @topX = sort { $a<=>  $b } @topX;
>        if($sim>  $topX[0]){ $topX[0] = $sim;      }
>      }
>    }
>    $at1++;
>    $topXtotal = sum(pdl(@topX));
>    printf "$at1\t$w1\t$totalsim\t$maxsim\t$topXtotal\n";
>    unless($at1 % 10){
>      $now = time();
>      $elapsed = $now - $startAll;
>      $thisWord = $now - $startWord;
>      $perWord = $elapsed / $at1;
>      $hoursRemaining = (($nrecs - $at1) * $perWord)/3600;
>      printf STDERR "#$at1\t$w1\t$totalsim\t$maxsim\t$topXtotal\t";
>      printf STDERR "ELAPSED %.3f THISWORD %.3f PERWORD %.3f HOURStoGO
> %.3f\n",
>        $elapsed, $thisWord, $perWord, $hoursRemaining;
>    }
> }
>
>
>
>
> _______________________________________________
> Perldl mailing list
> Perldl at jach.hawaii.edu
> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl





More information about the Perldl mailing list