Lazy sequence for reading file

 
   Computer Help Archives Forum Index -> LISP
View previous topic :: View next topic  
AuthorMessage
grackle
Guest





PostPosted: Thu Nov 16, 2006 2:18 pm    Post subject: Lazy sequence for reading file

I'm writing yet another flat file parser, and considering how often I
do this I would very much like to write:

(defun parse-test-case (stream)
(map 'vector #'parse-line (read-lines stream)))

I can do this easily myself if I write write read-lines to simply
accumulate the lines in a sequence. I would rather do it lazily,
though, to avoid reading the entire file into memory. As far as I can
tell, a list is a list is a list. It's a concrete thing, so I can't
create my own list type. That means I need to use a sequence type.

Is there an existing sequence type that does this? If not, I need to
implement a sequence type. Section 17.1 of the Hyperspec listst
sequence functions; are these generic functions that I need to
specialize for my sequence type? Is there a smaller subset that is
sufficient?

Also, I often want to throw away the map results, so I want X where
map : X :: mapcar : mapc. I will need to write this myself, correct?

Thanks,
David
Back to top
Andrew Philpot
Guest





PostPosted: Thu Nov 16, 2006 3:19 pm    Post subject: Re: Lazy sequence for reading file

In article <1163701117.305692.82890@f16g2000cwb.googlegroups.com>,
grackle wrote:

Quote:
I'm writing yet another flat file parser, and considering how often I
do this I would very much like to write:

(defun parse-test-case (stream)
(map 'vector #'parse-line (read-lines stream)))

I can do this easily myself if I write write read-lines to simply
accumulate the lines in a sequence. I would rather do it lazily,
though, to avoid reading the entire file into memory. As far as I can
tell, a list is a list is a list. It's a concrete thing, so I can't
create my own list type. That means I need to use a sequence type.

You could implement a lazy list, and give it an API that mirrors that
of built-on CONS-constructed lists, starting with MAP, etc. Since you
don't use the lazy list as a data structure (assign, mutate, etc.),
one might argue this would be of little practical benefit.

In Lisp, "lazy" sometimes ends up as "implemented with a closure."
Another way to do this would be do forgo the lazy list directly and
define a function that embodies the closure:

(map-over-lines-of-file #'parse-line stream)

or a macro wrapping of something like the above

(with-lines-from-file (stream file)
(parse-line stream))

Of course you could implement the lazy list, then the mapper on top of
it, then the macro on top of it!

When I recapitulate this kind of idiom, my solutions look like the
WITH-LINES-FROM-FILE version. Not surprisingly, these tend to take
quite a few keyword arguments like WITH-OPEN-FILE (often but not
always defaulted). Including all of these might tend to make your
READ-LINES lazy list specification a lot less perspicuous. With
either but particularly well with the WITH-LINES-FROM-FILE syntax, you
can use either functional filters or control flow to limit what gets
collected...

Quote:

Is there an existing sequence type that does this? If not, I need to
implement a sequence type. Section 17.1 of the Hyperspec listst
sequence functions; are these generic functions that I need to
specialize for my sequence type? Is there a smaller subset that is
sufficient?

Also, I often want to throw away the map results, so I want X where
map : X :: mapcar : mapc. I will need to write this myself, correct?

For this, I think you want MAP with first argument output-type-spec
NIL.

Quote:

Thanks,
David


#A
Back to top
Guest






PostPosted: Thu Nov 16, 2006 3:44 pm    Post subject: Re: Lazy sequence for reading file

grackle wrote:
Quote:
I'm writing yet another flat file parser, and considering how often I
do this I would very much like to write:

(defun parse-test-case (stream)
(map 'vector #'parse-line (read-lines stream)))

I can do this easily myself if I write write read-lines to simply
accumulate the lines in a sequence. I would rather do it lazily,
though, to avoid reading the entire file into memory. As far as I can
tell, a list is a list is a list. It's a concrete thing, so I can't
create my own list type. That means I need to use a sequence type.

Is there an existing sequence type that does this? If not, I need to
implement a sequence type. Section 17.1 of the Hyperspec listst
sequence functions; are these generic functions that I need to
specialize for my sequence type? Is there a smaller subset that is
sufficient?

Also, I often want to throw away the map results, so I want X where
map : X :: mapcar : mapc. I will need to write this myself, correct?

I'm confused as to what you're trying to do... you want to be able to
call PARSE-LINE on the result of READ-LINE for each line of the file
and collect the results in a vector? Do you know the length of the
vector beforehand?

(defun parse-test-case (stream)
(let ((done (gensym))
xx)
(do ((line (read-line stream nil done)
(read-line stream nil done))
((eq line done) (coerce (nreverse xx) 'vector))
(push (parse-line line) xx))))

or is your problem not being able to use a generic MAPC/MACPAR in this
situation?

(defun do-lines ((var stream &optional return) &body body)
(let ((done (gensym))
(s (gensym)))
`(let ((,s ,stream))
(do ((var (read-line ,s nil ,done)
(read-line ,s nil ,done)))
((eq ,var ,done) ,return)
,@body))))

(defun mapc-lines (fn stream)
(do-lines (l stream)
(funcall fn l)))

(defun mapcar-lines (fn stream)
(let (xx)
(do-lines (l stream (nreverse xx))
(push (funcall fn l) xx))))

(defun parse-test-case (stream)
(coerce (mapcar-lines #'parse-line stream) 'vector))

hth

Nick

Quote:

Thanks,
David
Back to top
grackle
Guest





PostPosted: Thu Nov 16, 2006 4:39 pm    Post subject: Re: Lazy sequence for reading file

nallen05@gmail.com wrote:
Quote:
I'm confused as to what you're trying to do... you want to be able to
call PARSE-LINE on the result of READ-LINE for each line of the file
and collect the results in a vector? Do you know the length of the
vector beforehand?

Sometimes I want to collect, sometimes not. I usually don't know the
length of the input beforehand.

Quote:
or is your problem not being able to use a generic MAPC/MACPAR in this
situation?

Yes. There are many macros and functions that accept sequence types,
and I don't want to reimplement every one I use for every sequence-like
type I create! I would like to create a sequence object that allows me
to use any macro or function that expects a sequence.

Andrew Philpot wrote:
Quote:
You could implement a lazy list, and give it an API that mirrors that
of built-on CONS-constructed lists, starting with MAP, etc.

The list and sequence APIs are there and they work. I think all I need
to know is how to create a subtype of list or sequence. Then only a
single function make-lazy-readline-list or make-lazy-readline-sequence
is needed to take advantage of map, mapc, mapcar, dolist, etc.

Quote:
Also, I often want to throw away the map results, so I want X where
map : X :: mapcar : mapc. I will need to write this myself, correct?

For this, I think you want MAP with first argument output-type-spec
NIL.

Thanks!

-David
Back to top
Barry Margolin
Guest





PostPosted: Thu Nov 16, 2006 10:01 pm    Post subject: Re: Lazy sequence for reading file

In article <1163709551.403279.58060@m73g2000cwd.googlegroups.com>,
"grackle" <davidhuebel@gmail.com> wrote:

Quote:
Yes. There are many macros and functions that accept sequence types,
and I don't want to reimplement every one I use for every sequence-like
type I create! I would like to create a sequence object that allows me
to use any macro or function that expects a sequence.

Sequences are not object-oriented, so there's no standard way to extend
the sequence functions.

You also can't define new arithmetic types -- for instance, you can't
make the + function operate on matrices.

--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
Back to top
Lars Rune Nøstdal
Guest





PostPosted: Fri Nov 17, 2006 5:05 am    Post subject: Re: Lazy sequence for reading file

On Thu, 16 Nov 2006 21:01:54 -0500, Barry Margolin wrote:

Quote:
In article <1163709551.403279.58060@m73g2000cwd.googlegroups.com>,
"grackle" <davidhuebel@gmail.com> wrote:

Yes. There are many macros and functions that accept sequence types,
and I don't want to reimplement every one I use for every sequence-like
type I create! I would like to create a sequence object that allows me
to use any macro or function that expects a sequence.

Sequences are not object-oriented, so there's no standard way to extend
the sequence functions.

You also can't define new arithmetic types -- for instance, you can't
make the + function operate on matrices.

cl:+ isn't a method, but you could create a my-cl:+ that is:


(defpackage :my-cl
(:use :cl)
(:shadow :+))
(in-package :my-cl)


(defclass not-really-a-matrix ()
((a :accessor a-of :initarg :a :initform 0)
(b :accessor b-of :initarg :b :initform 0)))


(defmethod + ((x number) &rest args)
(apply #'cl:+ x args))


(defmethod + ((x not-really-a-matrix) &rest args)
(apply #'+ (a-of x) (b-of x) args))


...or something..then `:use my-cl' in your application-package.

--
Lars Rune Nøstdal
http://lars.nostdal.org/
Back to top
Joel Ray Holveck
Guest





PostPosted: Fri Nov 17, 2006 6:30 am    Post subject: Re: Lazy sequence for reading file

On Nov 16, 12:39 pm, "grackle" <davidhue...@gmail.com> wrote:

Quote:
situation?Yes. There are many macros and functions that accept sequence types,
and I don't want to reimplement every one I use for every sequence-like
type I create! I would like to create a sequence object that allows me
to use any macro or function that expects a sequence.

First: There are no facilities to write your own sequence types and
have them work with the Common Lisp sequence functions. Those aren't
generic functions. As others have mentioned, you could easily write
your own package that re-exports the bulk of the CL symbols, and
shadows the CL sequence functions with generic replacements.
Presumably, the default primary method of the generic replacements
would call the CL version.

Second: All of the built-in sequence functions are written with the
assumption that the entire contents of the sequence are available
immediately. For example, if you fed MAP a lazy sequence, you would
probably like to be able to get a lazy sequence back from it... but the
MAP function was written to apply itself to everything immediately.
Reusing the Common Lisp sequence functions on lazy sequences is futile,
since they're written with different assumptions in mind.

Dick Waters did some work regarding lazy value generation. Some
implementations might support his proposals, or there might be
libraries out there.

joelh
Back to top
Christophe Rhodes
Guest





PostPosted: Fri Nov 17, 2006 7:42 am    Post subject: Re: Lazy sequence for reading file

"Joel Ray Holveck" <joelh@piquan.org> writes:

Quote:
First: There are no facilities to write your own sequence types and
have them work with the Common Lisp sequence functions. Those aren't
generic functions.

This is true at the moment, though nothing (or almost nothing: see CDR
3) in the standard prevents an implementation from allowing this.

Quote:
Second: All of the built-in sequence functions are written with the
assumption that the entire contents of the sequence are available
immediately. For example, if you fed MAP a lazy sequence, you would
probably like to be able to get a lazy sequence back from it... but the
MAP function was written to apply itself to everything immediately.
Reusing the Common Lisp sequence functions on lazy sequences is futile,
since they're written with different assumptions in mind.

The MAP example is a good one, because it's not amenable to
specialization in the ordinary CLOS manner. However, for many of the
CL sequence functions it is in fact possible to get the lazy -> lazy
behaviour that one might want from sequence functions, so one could
imagine (say) REMOVE implemented lazily on a lazy sequence.

Cheers,

Christophe
Back to top
Wade Humeniuk
Guest





PostPosted: Fri Nov 17, 2006 10:32 am    Post subject: Re: Lazy sequence for reading file

grackle wrote:
Quote:
I'm writing yet another flat file parser, and considering how often I
do this I would very much like to write:

(defun parse-test-case (stream)
(map 'vector #'parse-line (read-lines stream)))

I can do this easily myself if I write write read-lines to simply
accumulate the lines in a sequence. I would rather do it lazily,
though, to avoid reading the entire file into memory. As far as I can
tell, a list is a list is a list. It's a concrete thing, so I can't
create my own list type. That means I need to use a sequence type.

Is there an existing sequence type that does this? If not, I need to
implement a sequence type. Section 17.1 of the Hyperspec listst
sequence functions; are these generic functions that I need to
specialize for my sequence type? Is there a smaller subset that is
sufficient?

Also, I often want to throw away the map results, so I want X where
map : X :: mapcar : mapc. I will need to write this myself, correct?


It's been done before. Look at the SERIES package.

http://series.sourceforge.net/

Wade
Back to top
grackle
Guest





PostPosted: Fri Nov 17, 2006 10:50 am    Post subject: Re: Lazy sequence for reading file

Joel Ray Holveck wrote:
Quote:
First: There are no facilities to write your own sequence types and
have them work with the Common Lisp sequence functions.

Thanks; this is the answer I needed. I would be curious to hear why.

Quote:
Second: All of the built-in sequence functions are written with the
assumption that the entire contents of the sequence are available
immediately. For example, if you fed MAP a lazy sequence, you would
probably like to be able to get a lazy sequence back from it... but the
MAP function was written to apply itself to everything immediately.
Reusing the Common Lisp sequence functions on lazy sequences is futile,
since they're written with different assumptions in mind.

It would work in the case where you want to immediately process as much
of the sequence as is available, which would include reading from a
file or a socket. Potentially infinite sequences could be used as long
as one of the sequences being mapped could be depended on to block or
terminate when appropriate:

(mapc (lambda (line number) (format t "Line ~a: ~a~%" number line))
(make-readlines-sequence my-input-stream)
(make-integer-sequence :start 1 :step 1))

I'm just looking for the most carpal-friendly way to parse flat files,
so Nick's suggestion is the way to go, I think.

-David
Back to top
Barry Margolin
Guest





PostPosted: Fri Nov 17, 2006 11:52 am    Post subject: Re: Lazy sequence for reading file

In article <pan.2006.11.17.09.05.11.720229@gmail.com>,
Lars Rune Nøstdal <larsnostdal@gmail.com> wrote:

Quote:
On Thu, 16 Nov 2006 21:01:54 -0500, Barry Margolin wrote:

In article <1163709551.403279.58060@m73g2000cwd.googlegroups.com>,
"grackle" <davidhuebel@gmail.com> wrote:

Yes. There are many macros and functions that accept sequence types,
and I don't want to reimplement every one I use for every sequence-like
type I create! I would like to create a sequence object that allows me
to use any macro or function that expects a sequence.

Sequences are not object-oriented, so there's no standard way to extend
the sequence functions.

You also can't define new arithmetic types -- for instance, you can't
make the + function operate on matrices.

cl:+ isn't a method, but you could create a my-cl:+ that is:
....
..or something..then `:use my-cl' in your application-package.

That won't make all the functions not written by him use it.

Relating this back to the OP's problem, he wants built-in functions like
MAPCAR to be able to iterate over his lazy sequences. To make use of
your suggestion he would have to create MY-CL:MAPCAR, MY-CL:LOOP,
MY-CL:DO, etc.

Compare this will languages like CLU and C++, which have a standard
generic protocol for iterators, so that programmers can define their own
sequences and make use of the built-in for-loop.

--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
Back to top
grackle
Guest





PostPosted: Fri Nov 17, 2006 2:58 pm    Post subject: Re: Lazy sequence for reading file

Barry Margolin wrote:
Quote:
Compare this will languages like CLU and C++, which have a standard
generic protocol for iterators, so that programmers can define their own
sequences and make use of the built-in for-loop.

This is what I'm accustomed to and was expecting. It's very un-Lispy
to make a fundamental part of the language off-limits to users.

-David
Back to top
Joel Ray Holveck
Guest





PostPosted: Fri Nov 17, 2006 6:49 pm    Post subject: Re: Lazy sequence for reading file

Christophe Rhodes wrote:

Quote:
The MAP example is a good one, because it's not amenable to
specialization in the ordinary CLOS manner. However, for many of the
CL sequence functions it is in fact possible to get the lazy -> lazy
behaviour that one might want from sequence functions, so one could
imagine (say) REMOVE implemented lazily on a lazy sequence.

Oh, I think you misunderstand me. I'm just saying that the only thing
that he could reuse would be the names... the implementations wouldn't
be reusable. Sure you could implement REMOVE against a lazy sequence,
but it would have to be an all-new implementation.

joelh
Back to top
Display posts from previous:   
   Computer Help Archives Forum Index -> LISPAll times are GMT - 5 Hours
Page 1 of 1

 
Jump to: