
Re: Reading in text data



> Brian Reardon (reardonb@my-deja.com) writes:
> 
> > I am reading in text data (columns and rows of numbers) and I would
> > like to know if there is a more elegant way of doing it.  Currently, the
> > user must specify how many columns there are.  In my case the number of
> > columns is manually inserted into the first line of the file like this:
> >
> > 3
> > 0 1 2
> > 1 2 3
> > 2 3 4
> > 3 4 5
> > 4 5 6
> > 5 6 7
> > 6 7 8
> > 7 8 9
> > 8 9 10
> > 9 10 11
> >
> > The attached procedure reads in the data.  Is there a way to read in the
> > data such that the user does not have to a priori know how many columns
> > there are and such that IDL does not have to reserve a large amount of
> > memory for the number of rows?
> 

Wot about DDREAD.PRO (and associated routines) by F.K.Knight? I use it
all the time. It lets you skip rows and columns, so the first line being
a single number shouldn't matter.
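
As a rough, untested sketch of what that might look like for the file in
the question: it assumes ddread.pro and the helper routines it needs are
on your IDL path, and 'mydata.txt' is a placeholder file name. The
keywords come from the header quoted below. Whether the leading count
line is treated as a data row or as a comment depends on how ddread
senses the columns, so check the result with help.

```idl
; Let ddread sense the number of columns; no column count is passed in.
array = ddread('mydata.txt', /verbose)   ; /verbose echoes lines treated as comments
help, array                              ; check the dimensions actually read

; If the leading count line ends up in the data, start at the second row instead.
array = ddread('mydata.txt', offset=1)
help, array
```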


Check out

http://www.astro.washington.edu/deutsch/idl/htmlhelp/library38.html

where you'll find:




Routine Descriptions

DDREAD


 Name:
        ddread
 Purpose:
        This routine reads data in formatted (or unformatted) rows and
        columns.  The name stands for Data Dump Read.  By default,
        comments are skipped and the number of columns is sensed.  Many
        options exist, e.g., selecting rows and columns, reading binary
        data, and selecting non-default data type and delimiters.

 Examples:
        junk = ddread(/help)                    ; get information only
        array = ddread(file)                    ; read ASCII data
        array = ddread(file,/formatted)         ; ditto
        array = ddread(file,object=object)      ; read binary data
        array = ddread(file,columns=[0,3])      ; get only 1st & 4th columns
        array = ddread(file,rows=lindgen(10)+10); get only 2nd 10 rows
        array = ddread(file,offset=10,last=19)  ; get rows (10,19)
        array = ddread(file,/countall)          ; count comment lines
        array = ddread(file,/verbose)           ; echo comment lines
        array = ddread(file,type=1)             ; return bytes, not floats or longs
        array = ddread(file,range=['start text','stop text'])   ; text delimiters

        ; Place the detailed output from a Lowtran run in a 2-D array---wow!
        output = ddread('lowtran.out',range=['(CM-1) (MICRN)','0INTEGRATED ABSORPTION'])
        % DDREAD: Read 69 data lines selecting 14 of 14 columns; skipped 395 comment lines.
 Usage:
        array = ddread([file][,options][,/help])
 Optional Inputs:
        file = file with data; if omitted, then call pickfile.
 Keywords:
        /formatted, /unformatted = flags to tell IDL whether data format is
                ASCII or binary.  ddread tries to determine the type
                of data but it's not foolproof.
        object = a string containing the IDL declaration for one instance
                of the object in an unformatted file, e.g.,
                        'fltarr(4)'
                or
                        '{struct,dwell:0.,pitch:0.,yaw:0.,roll:0.}'
        rows = an array to select a subset of the rows in a formatted file
                Does not count comment lines, unless /countallrows is set!
        columns = likewise for columns
        type = data type of the output D=float (if '.' appears) or long
        delimiter = column separator, D=whitespace
        /help = flag to print header
        range = start and stop row or strings,
                e.g. range = ['substring in 1st line','substring in last line']
        offset = start row (read to end of file, unless last set)
        last = stop row (read from start of file, unless offset set)
        /countallrows = flag to count comment rows as well as data rows (D=0)
        /verbose = flag to echo comments to screen
 Outputs:
        array = array of data from the lines (ASCII) or objects (binary)
 Common blocks:
        none
 Procedure:
        After deciding on ASCII or binary, read file and return array.

 Restrictions:
        - Comments can be either an entire line or else an end of a line, e.g.,
                /* C comment. */
                ; IDL comment
                Arbitrary text as a comment
                Comment in Fortran
                The next line establishes # of columns (4) & data type (float):
                6. 7 8 9
                This line and the next are both considered comments.
                6 comment because only one of 4 columns appears
                1 2 3 4 but this line has valid data and will be read as data

        - Even if a range of lines is selected with offset, range or last,
          all lines are read.  This could be avoided.

        - Other routines needed:
          pickfile.pro  - to choose file if none is given
          nlines.pro    - to count lines in a file
          nbytes.pro    - to count bytes in a variable
          replicas.pro  - to replicate arrays (not scalars as in replicate.pro)
          typeof.pro    - to obtain the type of a variable

 Modification history:
        write, 22-26 Feb 92, F.K.Knight (knight@ll.mit.edu)
        allow reading with arbitrary delimiter using reads, 23 Mar 92, FKK
        add countallrows keyword and modify loop to read as little
          data as possible, 20 May 92, FKK
        correct bug if /formatted set, 6 Jul 92, FKK
        add verbose keyword to print comments, 6 July 92, FKK
        correct bug if /rows=...,/countall set, 6 July 92, FKK & EJA
        add a guard against a blank line being converted to a 
          number, 21 Aug 92, FKK
        allow partial line just before the EOF.  Possibly this isn't the
          right thing to do, but I decided to allow it.  If the final line
          is incomplete, the values are still read and the remainder of
          the line is filled with zeroes. 26 Oct 92, FKK
        allow range keyword to be a string array, 2 Dec 92, FKK
        make default for countallrows be true if range is present,
          2 Dec 92, FKK
        add new function (typeof); called in a few places, 2 Dec 92, FKK
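
If pulling in ddread and its helper routines is more than is needed, the
core idea in the header above (sense the column count from the first data
line, and size the array from a line count rather than a guess) can be
sketched directly. This is only a sketch under stated assumptions:
read_table and its skip keyword are hypothetical names, not part of
ddread; it needs an IDL version that provides FILE_LINES and STRSPLIT; and
it assumes purely numeric, whitespace-delimited rows with no comment or
blank lines after the skipped header.

```idl
; Hypothetical helper, not part of ddread: read a whitespace-delimited
; numeric text file, sensing the column count from the first data line.
function read_table, file, skip=skip
  if n_elements(skip) eq 0 then skip = 0      ; number of header lines to ignore
  nrows = file_lines(file) - skip             ; data rows, counted up front
  if nrows le 0 then message, 'No data rows in ' + file

  openr, lun, file, /get_lun
  line = ''
  for i = 0L, skip - 1 do readf, lun, line    ; discard header lines
  readf, lun, line                            ; first data line, as a string
  ncols = n_elements(strsplit(strtrim(line, 2), /extract))

  data = fltarr(ncols, nrows)                 ; exactly the size needed
  row = fltarr(ncols)
  reads, line, row                            ; parse the line already read
  data[*, 0] = row
  for i = 1L, nrows - 1 do begin
    readf, lun, row                           ; free-format read of one row
    data[*, i] = row
  endfor
  free_lun, lun
  return, data
end
```

For the file in the question, data = read_table('mydata.txt', skip=1)
would skip the leading count line, so that line no longer has to be
maintained by hand. The file is scanned twice (once by FILE_LINES, once
to read), which trades a little I/O for not reserving more rows than the
file contains.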

-- 
Paul van Delst           Ph:  (301) 763-8000 x7274
CIMSS @ NOAA/NCEP        Fax: (301) 763-8545
Rm.202, 5200 Auth Rd.    Email: pvandelst@ncep.noaa.gov
Camp Springs MD 20746