One-to-one SQL join from among the many?

Waldo Jaquith June 9, 2006 13 Comments

I have two tables in MySQL. One of them (let’s say, for the purpose of simplicity) is a listing of books. The other is a listing of reviews of those books along with a 1-5 ranking. Each book may have 0 or more reviews. It’s your standard one-to-many relationship.

However, I want to have a one-to-one left join in order to generate a listing of books along with their most recent ranking, like such:

Da Vinci Code: 1
The Omnivore’s Dilemma: 5
The Half-Blood Prince: N/A
Freakonomics: 3
Cesar’s Way: 3

Again, this is not the average ranking, simply the one accompanying the most recent review, assuming there is one.

Here’s the question: What’s the simplest method of joining the two tables to include only the ranking from that most recent review? I just know I’m overlooking something simple.

Published by Waldo Jaquith

Waldo Jaquith (JAKE-with) is an open government technologist who lives near Charlottesville, VA, USA. more » View more posts

13 replies on “One-to-one SQL join from among the many?”

Duane Gran says:

June 9, 2006 at 3:21 pm

I believe you want:

SELECT LAST(b.book, r.rank) from book AS b, rank AS r WHERE b.id = r.fid ORDER BY r.date

At least, that is my take without seeing the schema and my limited SQL knowledge. I’m curious if others have a better approach. Note that r.fid assumes that it is a foreign key.
Waldo Jaquith says:

June 9, 2006 at 3:40 pm

MySQL does not, to my knowledge, have a SELECT LAST function. Though I wish it did. :)
Tom Herrick says:

June 9, 2006 at 4:00 pm

I can’t think of a way to do this with an inner join, but it’s easy to do with a subselect. The following example works under SQL Server, but it’s pretty basic SQL, so I think it’ll all work the same way in MySQL as well:

create table book (book_name varchar(32))
create table rank (book_name varchar(32), rank int, date datetime)

insert book (book_name) values (‘a’)
insert book (book_name) values (‘b’)
insert book (book_name) values (‘c’)

insert rank (book_name, rank, date) values (‘a’, 1, ‘2006-01-01’)
insert rank (book_name, rank, date) values (‘a’, 2, ‘2006-02-01’)
insert rank (book_name, rank, date) values (‘a’, 3, ‘2006-03-01’)
insert rank (book_name, rank, date) values (‘a’, 4, ‘2006-04-01’)

insert rank (book_name, rank, date) values (‘b’, 4, ‘2006-01-01’)
insert rank (book_name, rank, date) values (‘b’, 3, ‘2006-02-01’)
insert rank (book_name, rank, date) values (‘b’, 2, ‘2006-03-01’)
insert rank (book_name, rank, date) values (‘b’, 1, ‘2006-04-01’)

select b.book_name,
(select top 1 rank r from rank r where r.book_name=b.book_name order by r.date)
from book b
group by b.book_name
Craig says:

June 9, 2006 at 4:00 pm

Here is a question for you… why not get the average ranking for the book as opposed to only the last ranking?

For example:

Select
BookId,
BookName,
BookPrice,
BookEtc,
(Select AVG(Ranking) From Ranking Where fkBookId = BookId) As BookRank
From Book

Otherwise you could use a similiar subquery to get only the one rank, i.e.
(Select Top 1 Ranking From Ranking Where fkBookId = BookId Order By RankingDate DESC)

Just a thought. Good luck. Having written literally dozens of stored procedures in the last few weeks I sympathize…

Craig
Tom Herrick says:

June 9, 2006 at 4:02 pm

Actually, as written, that’ll give the oldest ranking. Change the subquery to use ‘order by r.date desc’ to get the newest ranking.
Waldo Jaquith says:

June 9, 2006 at 4:19 pm

Here is a question for you… why not get the average ranking for the book as opposed to only the last ranking?

I’ve presented a simplified version of what I’m actually trying to do here—this is actually a peer-review system for submitted papers, and the editor’s review system needs to be able to see only the last ranking, and then assign the paper to another reviewer to get his opinion. (Wash, rinse repeat.) Suffice it to say, the average ranking is useless to me.

Tom, your subselect suggestion is a good one — I’m going to give it a whirl. Thanks for taking the time to put that together. I’m still scratching my head over the problem of not being able to do this with a simple join, but that may well indicate my shortcomings more than MySQL’s. :)
Kevin Druff says:

June 9, 2006 at 4:39 pm

Waldo — Create a date stamp and pick based upon MAX(Date).
Justin says:

June 9, 2006 at 4:40 pm

How malleable is the current schema? I’m not sure what to tell you if you’re stuck with specifically that which you presented (i.e. no date column), but if you can manipulate it, the date column would solve all your problems. Make it a datetime, and it’s as easy as:

SELECT book.name, review.rank FROM book, review WHERE book.id = review.book_id ORDER BY review.date DESC LIMIT 1

I’m confident I’ve seen similar statements in your code many times before, so maybe I misunderstood your question?
Justin says:

June 9, 2006 at 4:42 pm

^^ Yeah, if you can add columns, add an id or date column and Max() it out. LAST_INSERT_ID() would also work if you go with an auto-increment column.
Kevin Druff says:

June 9, 2006 at 4:43 pm

You could also create a primary key on the review table and you can choose SELECT MAX(review.ID), review.score from review WHERE book.book_id = review.book_ID…

That’s probably the easiest solution, I think.
I'm Not Emeril says:

June 9, 2006 at 4:56 pm

Waldo,
I’ve done something similar to that. I needed to extract the last value from a flowmeter…I think, or maybe it was a scale, I can’t remember.
Let me look through some old projects and I’ll get back to you. The final solution was pretty simple, as I recall.
Alton
Waldo Jaquith says:

June 9, 2006 at 5:05 pm

The schema is actually quite malleable. I’ve got a primary key (an auto_increment int), and a timestamp.

The trick is all of the joins:

SELECT submission.id, submission.title, submission_reader.reader, submission_reader.recommendation, submission.genre, submission_author.name AS author, DATE_FORMAT(submission_reader.date_modified, "%m/%d/%y") AS date_modified, MAX(submission_reader.date_created) AS date_created FROM submission_reader LEFT JOIN submission ON submission.id = submission_reader.submission_id LEFT JOIN submission_author ON submission.author_id = submission_author.id WHERE submission_reader.recommendation IS NOT NULL AND submission.status = "under review" GROUP BY submission.id ORDER BY submission_reader.date_created ASC, submission.title ASC

The whole query makes my head spin. :)
Tim McCormack says:

June 9, 2006 at 11:30 pm

MAX(date) is better than MAX(id), semantically speaking. Also, there is no guarantee of id being monotonically increasing, whereas that does hold true for date.

Comments are closed.