Re: Comma separators



Paul van Delst wrote:
> 
> Ben Tupper wrote:
> >
> > Paul van Delst wrote:
> >
> > > Simon de Vet wrote:
> > > >
> > > > I am reading in data that looks like the following:
> > > >
> > > > CHATHAM ISLAND - NEW ZEALAND (DOE),,,,,,,,,,
> > > > 43.92°S,176.50°W,,,,,,,,,
> > > > 16-Sep-1983,11-Oct-1996,,,,,,,,,
> > > > Mon,Stat,Cl,NO3,SO4,Na ,SeaSalt,nssSO4,MSA,Dust,NH4
> > > > of,Param,Air,Air,Air,Air,Air,Air,Air,Air,Air
> > > > Yr,*,µg/m3,µg/m3,µg/m3,µg/m3,µg/m3,µg/m3,µg/m3,µg/m3,µg/m3
> > > > Jan,N,58,58,58,58,58,57,0,0,58
> > > > Jan,Mean,7.330,0.120,1.572,4.233,13.766,0.508,#N/A,#N/A,0.103
> > > > Jan,StdDev,2.788,0.055,0.412,1.479,4.811,0.249,#N/A,#N/A,0.051
> > > >
> > > > Which continues until the end of the year, and then another observation
> > > > station follows the same general format.
> > > >
> > > > I want to be able to read in the data into an array. I can already take
> > > > out the header, but I cannot read in the data.
> > >
> > > What do you consider the header?
> > >
> > > > By default, IDL is
> > > > treating each line as one entry, not recognizing the commas as entry
> > > > separators. I've read the help extensively, but as a non-Fortran user,
> > > > the input format documentation makes my brane hurt.
> > >
> > > Let's say you have:
> > >
> > > Jan,N,58,58,58,58,58,57,0,0,58
> > > Jan,Mean,7.330,0.120,1.572,4.233,13.766,0.508,#N/A,#N/A,0.103
> > > Jan,StdDev,2.788,0.055,0.412,1.479,4.811,0.249,#N/A,#N/A,0.051
> > > Feb,N,58,58,58,58,58,57,0,0,58
> > > Feb,Mean,7.330,0.120,1.572,4.233,13.766,0.508,#N/A,#N/A,0.103
> > > Feb,StdDev,2.788,0.055,0.412,1.479,4.811,0.249,#N/A,#N/A,0.051
> > > ..etc..
> > >
> > > How about:
> > >
> > > char_buffer = ' '
> > >
> > > REPEAT BEGIN
> > >   READF, lun, char_buffer
> > >
> > >   input_data = STR_SEP( char_buffer, ',' )
> > >
> > >   ; ...here split up the data how you want by, say, testing
> > >   ;     input_data[0] for the month (Jan, Feb, Mar, ...)
> > >   ;     input_data[1] for the data type (N, Mean, StdDev)
> > >   ; ...and checking for invalid data, e.g. the #N/A thingoes
> > >
> > > ENDREP UNTIL EOF( lun )
> > >
> > >
> >
> > Hello,
> >
> > I'd like to add that on occasion, I have found it useful to add the /TRIM
> > keyword to the STR_SEP() function.
> > Once in a while the last element in input_data will become something
> > unexpected, such as the expected value padded with blanks. I think
> > the problem is in how the file was written, not in how it is read by IDL.
> 
> You know, the same thought occurred to me when I used this method to
> read *space*-separated data - I kept getting extra "fields" at the
> beginning of my string. I stuck the /TRIM keyword in the STR_SEP call and
> nothing changed!!?? Weird.
> 
> So instead of doing a
> 
> result = STR_SEP( string, ' ', /TRIM )
> 
> I do a
> 
> result = STR_SEP( STRTRIM( string, 2 ), ' ' )
> 
> Mind you this was one of those cases where something didn't work
> straight up and I spent precisely 0.1 seconds figuring out why not before
> going on to something else.. :o)
> 
> BTW, is there some sequence of layered string function calls one can use
> to trim and "collapse" a string with multiple delimiters between items
> to a single delimiter? e.g. to convert
> 
> ,,,this,,,is,,,,a,,multiple,,,,,delimited,,,,,,,,string,,,,
> 
> to
> 
> this,is,a,multiple,delimited,string
> 
> I wrote a function to do it but it has a loop in it and a bunch of logic
> checking that looks horrendous. It does the job, but no reason why it
> can't look pretty....right?
> 

res=strsplit(str,',',/EXTRACT)

will do it.  The reason is that null-length fields are *not* returned unless you
use PRESERVE_NULL.  You can also split on regular expressions.  So, e.g., if your
fields could be delimited by one or more spaces or commas, you could use:

res=strsplit(str,'[ ,]+',/REGEX,/EXTRACT)

Note that this is mostly v5.3 specific: STRSPLIT is not available in earlier
versions of IDL.
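
For what it's worth, here is a minimal sketch of the whole round trip on the
example string above.  It assumes v5.3; STRJOIN is my own addition for gluing
the pieces back together with a single delimiter, and the variable names are
just placeholders:

```idl
; Test string taken from the example quoted above
str = ',,,this,,,is,,,,a,,multiple,,,,,delimited,,,,,,,,string,,,,'

; Null-length fields are dropped by default, so only the words come back
res = STRSPLIT(str, ',', /EXTRACT)
PRINT, N_ELEMENTS(res)

; Rejoin the pieces with a single comma between items
collapsed = STRJOIN(res, ',')
PRINT, collapsed

; For comparison, /PRESERVE_NULL keeps the empty fields between delimiters
res_all = STRSPLIT(str, ',', /EXTRACT, /PRESERVE_NULL)
PRINT, N_ELEMENTS(res_all)
```

The first PRINT should report far fewer elements than the last one, since
only the /PRESERVE_NULL call keeps the empty strings.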

JD

-- 
 J.D. Smith                             |*|      WORK: (607) 255-5842    
 Cornell University Dept. of Astronomy  |*|            (607) 255-6263
 304 Space Sciences Bldg.               |*|       FAX: (607) 255-5875 
 Ithaca, NY 14853                       |*|