[Perldl] How to find out cause of out of memory

MARK BAKER mrbaker_mark at yahoo.com
Tue Feb 14 11:36:33 HST 2012


Hey Clifford,

I was talking with David along much the same lines as your problem here,
and I would like to share something with you that I think might make its
way into the PDL core in one way or another.

I found a way to load 3 gigabytes of data in only 40 megabytes of RAM,
like this ...
##################################################

open(my $data, '<', 'large_file_path_here') or die "can't open: $!";

# Record the byte offset where every line starts, so we can seek to it later.
my @offset;
$offset[1] = tell $data;                 # line 1 starts at offset 0
my $line_num = 2;
while (<$data>) {
    $offset[$line_num++] = tell $data;   # offset where the next line starts
}

print "DONE - please enter a line number\n";
while (my $entered = <STDIN>) {
    chomp $entered;
    seek $data, $offset[$entered], 0;
    my $line = <$data>;
    print $line, "\n";
}

###################################################

Try this out with a large file: enter a line number and it will bring up
the information on that line very fast.

The trick here is to convert your file data to decimal, then pack it
like this
##################################################

$data = pack "w*", $_;
#################################################
then just use unpack to view the numerical data.

What this does is, instead of pulling the file back line by line, it lets
you pull in each block of packed numerical data, which saves a lot of RAM
because the information comes in chunks instead of individual lines.
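
For example, just as a sketch (the numbers here are made up, and note that
the "w" template only handles unsigned whole numbers):

##################################################

my @chunk  = (1024, 7, 300000, 42);
my $packed = pack 'w*', @chunk;     # compact BER-compressed string
my @again  = unpack 'w*', $packed;  # recover the numbers when needed
print "@again\n";                   # prints: 1024 7 300000 42

##################################################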


hope that helps 


-Mark 






________________________________
 From: Clifford Sobchuk <clifford.sobchuk at ericsson.com>
To: Chris Marshall <devel.chm.01 at gmail.com> 
Cc: "Perldl at jach.hawaii.edu" <perldl at jach.hawaii.edu> 
Sent: Tuesday, February 14, 2012 12:36 PM
Subject: Re: [Perldl] How to find out cause of out of memory
 
Thanks all. Pre-allocating isn't obvious (to me) as the file, and hence the data, is highly variable with no easy way to determine the size.
I do think it is the conversion from Perl array to pdl, as I am guessing the entire Perl array has to be loaded, which likely causes the out-of-memory error. In the "Whirlwind Tour" book there was an example showing how to assign an image to a hash with the elements being pdls.

I am unsure how to do this with rgrep or rcols. I have tried:

open ($in, "<$ARGV[0]") or die "can't open $ARGV[0]: $!\n";
$fwdGain = rgrep {/\s\d\s+\d+\s+\d\d\d\s+\w+\s+(\d+)\s+\-\d+/} $in;
open ($in, "<$ARGV[0]") or die "can't open $ARGV[0]: $!\n";
%snr = rgrep {/\s\d\s+\d+\s+\d\d\d\s+(\w+)\s+\d+\s+(\-\d+)/} $in;

To import the data - but it doesn't work as the $1 is a word. I made a map for it as 
my %rate = ( "Full"=>1, "Half"=>0.5, "Quarter"=>0.25, "Eighth"=>0.125 );

And tried to use:
$snr{$rate{$1}} = rgrep {/\s\d\s+\d+\s+\d\d\d\s+(\w+)\s+\d+\s+(\-\d+)/} $in;

But this doesn't work either as it seems that rgrep is looking for a numeric value.
Argument "Eighth" isn't numeric in multiplication (*) at C:\strawberry\perl\site\...

Is there a way to use rgrep to put the mapped numeric and the data in to a hash?
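
The only fallback I can think of is to skip rgrep entirely and do the matching and rate mapping in plain Perl, collecting the values into ordinary arrays keyed by rate and only calling pdl() once per rate at the end. A rough sketch (assuming the column layout matches the regex above):

use PDL;

my %rate = ( Full => 1, Half => 0.5, Quarter => 0.25, Eighth => 0.125 );

open my $in, '<', $ARGV[0] or die "can't open $ARGV[0]: $!\n";

my %snr_raw;    # rate value => Perl array of SNR readings
while (<$in>) {
    if (/\s\d\s+\d+\s+\d\d\d\s+(\w+)\s+\d+\s+(-\d+)/) {
        push @{ $snr_raw{ $rate{$1} } }, $2;
    }
}
close $in;

# Convert each per-rate array to a piddle only once, at the end.
my %snr = map { $_ => pdl( @{ $snr_raw{$_} } ) } keys %snr_raw;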

Thanks,

CLIFF SOBCHUK
Core RF Engineering
Phone 613-667-1974   ecn: 8109-71974
mobile 403-819-9233
yahoo: sobchuk
www.ericsson.com 

"The author works for Telefonaktiebolaget L M Ericsson ("Ericsson"), who is solely responsible for this email and its contents. All inquiries regarding this email should be addressed to Ericsson. The web site for Ericsson is www.ericsson.com."

This Communication is Confidential. We only send and receive email on the basis of the terms set out at www.ericsson.com/email_disclaimer


-----Original Message-----
From: Chris Marshall [mailto:devel.chm.01 at gmail.com] 
Sent: Tuesday, February 14, 2012 10:42 AM
To: Clifford Sobchuk
Cc: David Mertens; Perldl at jach.hawaii.edu
Subject: Re: [Perldl] How to find out cause of out of memory

Another angle: I can't tell how much of the data you collect in the Perl hash structures, but they are *much* more memory intensive than the pdl data arrays.

Your best chance would be to allocate the destination pdl and then use slice assignments to put the hash data into its correct place.
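
Roughly like this, as an untested sketch (the @chunks data and the datatype are just stand-ins for whatever your parser produces):

use PDL;

# Stand-in data: in your case these would be the per-section Perl arrays
# you build while parsing the log.
my @chunks = ( [1, 2, 3], [4, 5], [6, 7, 8, 9] );

# Allocate the full destination piddle once (here sized to fit the chunks).
my $total = 0;
$total += @$_ for @chunks;
my $dest = zeroes(double, $total);

# Copy each chunk into place with a slice assignment, so no single
# giant Perl list is ever built.
my $pos = 0;
for my $chunk_ref (@chunks) {
    my $n = @$chunk_ref;
    $dest->slice("$pos:" . ($pos + $n - 1)) .= pdl(@$chunk_ref);
    $pos += $n;
}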

Beware: one issue with Perl is that it simply dies if it runs out of memory, which is a pain.  If you preallocate the big piddle, then maybe you'll get the crash in the Perl code instead, which could give you an idea of where the memory is being used.

--Chris

On Tue, Feb 14, 2012 at 11:22 AM, David Mertens <dcmertens.perl at gmail.com> wrote:
> Cliff -
>
> Has your client given you some sample data so that you can try to
> reproduce the error on your own machine? If so, a collection of 
> warnings dumped to a logfile might at least tell you which line of code is croaking.
>
> Allocation of large piddles (many hundreds of megabytes) has been 
> reported to be a problem elsewhere. One thing I have done on Linux to 
> work around this problem is to build a FastRaw file piece-by-piece, 
> and then memory-map the file. Although this is not a possibility on
> Windows (no PDL support for memory mapping on Windows yet), it might
> provide a means for a solution. You could build a piddle into a 
> FastRaw file with one script, then have a different script try to 
> readfraw that file. If you pull in this file early in your (second) 
> Perl process, you have a higher likelihood of getting the contiguous memory request that PDL needs for the large data array.
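>
> Very roughly, the writing side could look like this (an untested
> sketch; the chunks, dimensions, and datatype are all stand-ins):
>
> use PDL;
> use PDL::IO::FastRaw;
>
> # Stand-in chunks; the Dims and Datatype below are made up too.
> my @chunks = ( [1, 2, 3], [4, 5, 6] );
>
> # Create a disk-backed, memory-mapped piddle.
> my $big = mapfraw('bigdata.raw',
>                   { Creat => 1, Datatype => double, Dims => [6] });
>
> # Fill it chunk by chunk so the whole data set never sits in one Perl array.
> my $pos = 0;
> for my $c (@chunks) {
>     my $n = @$c;
>     $big->slice("$pos:" . ($pos + $n - 1)) .= pdl(@$c);
>     $pos += $n;
> }
>
> # A second script can then call readfraw('bigdata.raw') (or mapfraw again
> # where memory mapping is available) to get the piddle back.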
>
> I know, it's not ideal, but I hope that helps. I should probably try 
> to figure out how to add memory mapping support to Windows and then 
> document this technique so that others can use it.
>
> For building the FastRaw file, I can dig up some sample code and send 
> it along if that would help, but I won't be able to get to it until 
> tonight at the earliest (and I make no guarantees as it's Valentine's 
> day :-)
>
> David
>
>
> On Tue, Feb 14, 2012 at 9:26 AM, Clifford Sobchuk 
> <clifford.sobchuk at ericsson.com> wrote:
>>
>> Hi Folks,
>>
>> I am running into a problem where I am reading in a large amount of
>> data (variable, depending on log size). The data is being pushed into
>> a Perl array and then converted into a piddle. I think that it
>> might be the conversion from Perl array to piddle, but I am not sure.
>> How can I find out where the issue exists and correct it? The end
>> user's computer (laptop) will often be in this situation, apparently.
>> Since the data is intermixed with text that needs to be used to hash
>> each specific attribute, I can't simply use an rgrep or rcols import.
>> I could use rcols for each section, but this would result in using glue to
>> build up the piddle slowly (groups of 20 to 100, depending on the datum for that attribute).
>>
>> Example pseudo code.
>> foreach line {
>>        $index1 = $1 if (/index1:\s(\d+)\w+/);
>>        $index2 ...
>>        if ($datastart && !$dataend) {
>>                push @{$myhash{$index1}{$index2}{datum1}}, $1 if (/mydata/);
>>                $dataend = 1 if (/$eod/);
>>        }
>> }
>> foreach sort(keys %myhash) {
>>        ....for each index
>>                $data1 = pdl(@{$myhash{$index1}{$index2}{datum1}});
>> }
>>
>> The raw text files are on the order of 0.5 to 14 GB and are being processed
>> on win32 (Vista, which I know has a 2 GB limit for applications). Hope
>> this provides enough information to scope the issue.
>>
>> Thanks,
>>
>>
>> CLIFF SOBCHUK
>> Ericsson
>> Core RF Engineering
>> Calgary, AB, Canada
>> Phone 613-667-1974  ECN 8109 x71974
>> Mobile 403-819-9233
>> clifford.sobchuk at ericsson.com
>> yahoo: sobchuk
>> http://www.ericsson.com/
>>
>> "The author works for Telefonaktiebolaget L M Ericsson ("Ericsson"), 
>> who is solely responsible for this email and its contents. All 
>> inquiries regarding this email should be addressed to Ericsson. The 
>> web site for Ericsson is www.ericsson.com."
>>
>> This Communication is Confidential. We only send and receive email on
>> the basis of the terms set out at www.ericsson.com/email_disclaimer
>>
>>
>>
>
>
>
>
> --
>  "Debugging is twice as hard as writing the code in the first place.
>   Therefore, if you write the code as cleverly as possible, you are,
>   by definition, not smart enough to debug it." -- Brian Kernighan
>
>
>

_______________________________________________
Perldl mailing list
Perldl at jach.hawaii.edu
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl