[Perldl] Mysterious slow down from repeated inner calls

Clifford Sobchuk clifford.sobchuk at ericsson.com
Mon Feb 27 03:39:20 HST 2012


I don't know if this is a typo or not, but in the code for the inner loop you have the following:
>      $sim = inner($kernel{$w1},$kernel{w2});
Here $kernel{w2} looks up the literal key 'w2', so it should presumably be $kernel{$w2}.
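
To illustrate the point, here is a minimal sketch in plain Perl (no PDL needed). The two-word %kernel table and its array values are made up for the example and are not the poster's data. It shows only that a bareword inside hash braces is auto-quoted, so the two lookups address different keys; what inner() then does with the missing value is not covered here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical two-word table standing in for the %kernel hash in the script.
my %kernel = (cat => [1, 0], cot => [0, 1]);
my $w2 = 'cot';

# Inside hash braces a bareword is auto-quoted, so w2 means the string 'w2'.
# 'use strict' does not complain about this form.
my $by_bareword = exists $kernel{w2}  ? 'found' : 'missing';
my $by_variable = exists $kernel{$w2} ? 'found' : 'missing';

print "\$kernel{w2}:  $by_bareword\n";
print "\$kernel{\$w2}: $by_variable\n";
```

With this table, the bareword lookup reports "missing" and the variable lookup reports "found".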




CLIFF SOBCHUK
Core RF Engineering
Phone 613-667-1974   ecn: 8109-71974
mobile 403-819-9233
yahoo: sobchuk
www.ericsson.com 

"The author works for Telefonaktiebolaget L M Ericsson ("Ericsson"), who is solely responsible for this email and its contents. All inquiries regarding this email should be addressed to Ericsson. The web site for Ericsson is www.ericsson.com."



-----Original Message-----
From: chm [mailto:devel.chm.01 at gmail.com] 
Sent: Monday, February 27, 2012 5:46 AM
To: Jim Magnuson
Cc: perldl
Subject: Re: [Perldl] Mysterious slow down from repeated inner calls

I don't know of any reason why inner() would slow down. Have you tried using Devel::NYTProf or a similar profiler to track the time spent in inner and the number of calls to it?  One oddity is that the first pass through the outer loop appears to skip every call to inner, which would make it *very* fast.  Maybe something is going on with the loop structure?
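
One possible reading of the loop-structure remark, sketched in plain Perl: in the posted script the "next if" test comes before $at2++, so once the test fires $at2 stops advancing and the rest of the inner loop is skipped. The helper calls_per_word and the six-word key list are made up for this illustration, and the "variant" branch is just one way to keep the counter moving, not a fix confirmed in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for @kernelKeys; the real list has about 30,000 words.
my @keys = map { "w$_" } 1 .. 6;

# Count how often the loop body is reached, given where $at2 is incremented.
sub calls_per_word {
    my ($increment_first) = @_;
    my @calls;
    my $at1 = 0;
    for my $w1 (@keys) {
        my ($at2, $n) = (0, 0);
        for my $w2 (@keys) {
            if ($increment_first) {
                # Variant: advance the counter on every pass while testing.
                next if $at1 == $at2++;
            } else {
                # As posted: once the test fires, $at2 never advances again.
                next if $at1 == $at2;
                $at2++;
            }
            $n++;    # this is where inner() would be called
        }
        push @calls, $n;
        $at1++;
    }
    return @calls;
}

my @as_posted = calls_per_word(0);
my @variant   = calls_per_word(1);
print "as posted: @as_posted\n";
print "variant:   @variant\n";
```

With six keys the posted ordering reaches the body 0, 1, 2, 3, 4, 5 times for successive words, while the variant reaches it 5 times for every word. A per-word cost that grows with the word's position would be consistent with the timings reported below (5 msecs, then 37, then 398), though that is an inference and a profiler run would be needed to confirm it.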

--Chris

On 2/27/2012 2:50 AM, Jim Magnuson wrote:
> Hello,
>
> I have a set of about 30,000 words, and I am using string kernels as a 
> metric of word similarity. The goal is to see whether different 
> kernels are better at predicting how quickly human subjects are able 
> to process words. I have calculated the string kernels for each word. 
> So now I have a file with 30,000 lines. The first field in each line 
> is a word, and this is followed by a 676-element vector representing the kernel representation.
>
> Once I read this in, I need to step through and calculate the 
> similarity of each word to every other word using vector cosine, as 
> well as track the highest similarity value (excluding the word 
> itself), and the set of X-most similar items (there are reasons to 
> believe these are good predictors of human performance).
>
> Here's the problem: when I start running the code below, it is very fast.
> It takes 5 msecs to process the first word (that is, to do the 
> necessary 30,000 cosines), but by the time it reaches the 100th it is 
> taking 37 msecs, and by the 1,000th it is taking 398 msecs -- with 
> 29,000 to go, and constant slowing...
>
> Memory use by perl stays constant, and I cannot figure out what would 
> make the program slow down so much. I posted a query at Perl Monks and 
> I got advice about how to speed up each step (the first word used to 
> take 38 msecs), and they pointed out that it is indeed the call to 
> inner that is the culprit (replace it with a non-pdl calculation, and 
> the slowing goes away). They suggested I should look for advice from PDL experts.
>
> So if anyone can give me pointers as to what is slowing things down 
> and whether there is a way to avoid it, I would be most grateful. 
> Apologies in advance for any offensively inefficient/awkward use of PDL!
>
> Thanks!
>
> jim
> #!/usr/bin/perl -s
> use PDL;
> use Time::HiRes qw ( time ) ;
> $|=1;
> $top = 20;
>
> while(<>){
>      chomp;
>      ($wrd, @data) = split;
>      $kernel{$wrd} = norm(pdl(@data));
>      # EXAMPLE LINE
>      # word 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 
> 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
> }
> $nrecs = keys %kernel;
> @kernelKeys = sort( keys %kernel );
>
> $startAll = time();
>
> $at1 = 0;
> foreach $w1 (@kernelKeys) {
>    $totalsim = $maxsim = 0;
>    $startWord = time();
>    @topX = ();
>    $at2 = 0;
>    foreach $w2 (@kernelKeys) {
>      next if($at1 == $at2); # skip identical item, but not homophones
>      $at2++;
>      $sim = inner($kernel{$w1},$kernel{w2});
>      $totalsim+=$sim;
>      if($sim > $maxsim){ $maxsim = $sim; }
>      # keep the top 20
>      if($#topX < $top){
>        push @topX, $sim;
>      } else {
>        @topX = sort { $a <=> $b } @topX;
>        if($sim > $topX[0]){ $topX[0] = $sim; }
>      }
>    }
>    $at1++;
>    $topXtotal = sum(pdl(@topX));
>    printf "$at1\t$w1\t$totalsim\t$maxsim\t$topXtotal\n";
>    unless($at1 % 10){
>      $now = time();
>      $elapsed = $now - $startAll;
>      $thisWord = $now - $startWord;
>      $perWord = $elapsed / $at1;
>      $hoursRemaining = (($nrecs - $at1) * $perWord)/3600;
>      printf STDERR "#$at1\t$w1\t$totalsim\t$maxsim\t$topXtotal\t";
>      printf STDERR "ELAPSED %.3f THISWORD %.3f PERWORD %.3f HOURStoGO %.3f\n",
>        $elapsed, $thisWord, $perWord, $hoursRemaining;
>    }
> }
>
>
>
>
> _______________________________________________
> Perldl mailing list
> Perldl at jach.hawaii.edu
> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl





