[Perldl] Mysterious slow down from repeated inner calls

Chris Marshall devel.chm.01 at gmail.com
Mon Feb 27 05:04:30 HST 2012


To get the best performance, you'll need to use what we
call vectorized PDL operations.  Here is an example of a
pdl-iomatic way to do some of your computation:

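# load PDL; NiceSlice enables the $kern(,(0)) slicing syntax used below
# (the perldl shell turns it on for you, a script does not)
use PDL;
use PDL::NiceSlice;

# use rcols to read data directly into a pdl and perl array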
($inword, $grid) = rcols 'jmtest.data',0,[], { perlcols=>[0] };

# rearrange dimensions since rcols puts columns in dim(0)
$kern = $grid->mv(1,0)->norm;

# number of records is now length of dim(1)
$nrecs = $kern->dim(1);

# calculate the inner products of word 0 with every word in one call
$sim = inner($kern(,(0)),$kern);

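# allocate space for the indices of the top 20 values
$topXind = zeros(long,20);

# don't forget to skip the diagonal element (word 0 against itself)
$sim->(1:-1)->maximum_n_ind($topXind);

# the indices are relative to the $sim(1:-1) slice, so index that same slice
$topX = $sim(1:-1)->index($topXind);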

# how much wt in top 20?
print $topX->sum . "\n";
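
One possible reading of the timings, sketched in plain Perl without PDL: in the
quoted script below, $at2++ comes after the `next` test, so once $at1 == $at2
the counter never advances and every remaining iteration is skipped. If that is
right, word number k makes only k calls to inner(), and the per-word time grows
with k rather than inner() itself slowing down. The two subs here are
hypothetical helpers written for illustration (they are not from the thread),
and a call counter stands in for inner().

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count how many inner-loop
# bodies actually run for outer index $at1 when the counter is
# incremented AFTER the `next` test, as in the quoted script.
sub calls_with_late_increment {
    my ($at1, $n) = @_;
    my ($at2, $calls) = (0, 0);
    for my $j (1 .. $n) {
        next if $at1 == $at2;   # once true, $at2 is never bumped again
        $at2++;
        $calls++;               # stands in for the call to inner()
    }
    return $calls;
}

# Hypothetical helper: the same loop with the counter incremented BEFORE
# the test, so only the single matching position is skipped.
sub calls_with_early_increment {
    my ($at1, $n) = @_;
    my ($at2, $calls) = (-1, 0);
    for my $j (1 .. $n) {
        $at2++;
        next if $at1 == $at2;
        $calls++;
    }
    return $calls;
}

# Compare the two orderings for the word positions mentioned in the thread.
for my $at1 (0, 100, 1000) {
    printf "word %4d: late=%d early=%d\n",
        $at1,
        calls_with_late_increment($at1, 30_000),
        calls_with_early_increment($at1, 30_000);
}
```

With 30,000 records this reports 0, 100 and 1000 calls for words 0, 100 and
1000 under the original ordering, against 29,999 for every word once the
increment is moved ahead of the test. That roughly linear growth is at least
consistent with the 5, 37 and 398 msec figures reported, though only a profiler
run on the real script would confirm it.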


Cheers,
Chris

On Mon, Feb 27, 2012 at 8:46 AM, Jim Magnuson <james.magnuson at uconn.edu> wrote:
> Yes, a typo -- I changed the variable name for the example code and lost the
> $ somehow...
>
> Currently trying to get timing tests going as suggested by chm...
>
> thanks,
>
> jim
>
>
> On Mon, Feb 27, 2012 at 2:39 PM, Clifford Sobchuk
> <clifford.sobchuk at ericsson.com> wrote:
>>
>> I don't know if this is a typo or not, but in the code for inner loop you
>> have the following:
>> >      $sim = inner($kernel{$w1},$kernel{w2});
>> Where $kernel{w2} should be $kernel{$w2}.
>>
>>
>>
>>
>> CLIFF SOBCHUK
>>
>> -----Original Message-----
>> From: chm [mailto:devel.chm.01 at gmail.com]
>> Sent: Monday, February 27, 2012 5:46 AM
>> To: Jim Magnuson
>> Cc: perldl
>> Subject: Re: [Perldl] Mysterious slow down from repeated inner calls
>>
>> I don't know of any reason why inner() would slow down---have you tried
>> using NYTProf or some such tool to track time in inner and number of calls
>> to inner?  One oddity is that the first loop appears to skip all calls to
>> inner which would be *very* fast.  Maybe something is going on with the loop
>> structure?
>>
>> --Chris
>>
>> On 2/27/2012 2:50 AM, Jim Magnuson wrote:
>> > Hello,
>> >
>> > I have a set of about 30,000 words, and I am using string kernels as a
>> > metric of word similarity. The goal is to see whether different
>> > kernels are better at predicting how quickly human subjects are able
>> > to process words. I have calculated the string kernels for each word.
>> > So now I have a file with 30,000 lines. The first field in each line
>> > is a word, and this is followed by a 676-element vector representing the
>> > kernel representation.
>> >
>> > Once I read this in, I need to step through and calculate the
>> > similarity of each word to every other word using vector cosine, as
>> > well as track the highest similarity value (excluding the word
>> > itself), and the set of X-most similar items (there are reasons to
>> > believe these are good predictors of human performance).
>> >
>> > Here's the problem: when I start running the code below, it is very fast.
>> > It takes 5 msecs to process the first word (that is, to do the
>> > necessary 30,000 cosines), but by the time it reaches the 100th it is
>> > taking 37 msecs, and by the 1,000th it is taking 398 msecs -- with
>> > 29,000 to go, and constant slowing...
>> >
>> > Memory use by perl stays constant, and I cannot figure out what would
>> > make the program slow down so much. I posted a query at Perl Monks and
>> > I got advice about how to speed up each step (the first word used to
>> > take 38 msecs), and they pointed out that it is indeed the call to
>> > inner that is the culprit (replace it with a non-pdl calculation, and
>> > the slowing goes away). They suggested I should look for advice from PDL
>> > experts.
>> >
>> > So if anyone can give me pointers as to what is slowing things down
>> > and whether there is a way to avoid it, I would be most grateful.
>> > Apologies in advance for any offensively inefficient/awkward use of PDL!
>> >
>> > Thanks!
>> >
>> > jim
>> > #!/usr/bin/perl -s
>> > use PDL;
>> > use Time::HiRes qw ( time ) ;
>> > $|=1;
>> > $top = 20;
>> >
>> > while(<>){
>> >      chomp;
>> >      ($wrd, @data) = split;
>> >      $kernel{$wrd} = norm(pdl(@data));
>> >      # EXAMPLE LINE
>> >      # word 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0
>> > 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >
>> > }
>> > $nrecs = keys %kernel;
>> > @kernelKeys = sort( keys %kernel );
>> >
>> > $startAll = time();
>> >
>> > $at1 = 0;
>> > foreach $w1 (@kernelKeys) {
>> >    $totalsim = $maxsim = 0;
>> >    $startWord = time();
>> >    @topX = ();
>> >    $at2 = 0;
>> >    foreach $w2 (@kernelKeys) {
>> >      next if($at1 == $at2); # skip identical item, but not homophones
>> >      $at2++;
>> >      $sim = inner($kernel{$w1},$kernel{w2});
>> >      $totalsim+=$sim;
>> >      if($sim>  $maxsim){      $maxsim = $sim;    }
>> >      # keep the top 20
>> >      if($#topX<  $top){
>> >        push @topX, $sim;
>> >      } else {
>> >        @topX = sort { $a<=>  $b } @topX;
>> >        if($sim>  $topX[0]){ $topX[0] = $sim;      }
>> >      }
>> >    }
>> >    $at1++;
>> >    $topXtotal = sum(pdl(@topX));
>> >    printf "$at1\t$w1\t$totalsim\t$maxsim\t$topXtotal\n";
>> >    unless($at1 % 10){
>> >      $now = time();
>> >      $elapsed = $now - $startAll;
>> >      $thisWord = $now - $startWord;
>> >      $perWord = $elapsed / $at1;
>> >      $hoursRemaining = (($nrecs - $at1) * $perWord)/3600;
>> >      printf STDERR "#$at1\t$w1\t$totalsim\t$maxsim\t$topXtotal\t";
>> >      printf STDERR "ELAPSED %.3f THISWORD %.3f PERWORD %.3f HOURStoGO %.3f\n",
>> >        $elapsed, $thisWord, $perWord, $hoursRemaining;
>> >    }
>> > }
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > Perldl mailing list
>> > Perldl at jach.hawaii.edu
>> > http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>>
>>
>
>



