YA Literature Data Project

Like many librarians, I am a bit of a data enthusiast. Over the last couple years, I’ve noticed a preponderance of articles looking at data surrounding books, especially children’s and YA books. Most of these analyses focus on the representation of traditionally underrepresented groups in media. There have also been numerous individuals tracking things like starred book reviews and best-of lists. Some of these include:

Rather than take one facet of the YA literature landscape and examine it, I thought it would be useful to build an open source of YA book data that tracks multiple criteria and can then be used to perform any number of analyses. As far as I know, there is not a comprehensive source of this kind of information. Several of these analyses have mentioned the lack of a clear number for how many YA books are published each year. There are several sources that maintain some form of bibliographic information, though I’m not certain that any are tracking the kind of metadata that would be helpful to those researching trends in YA literature.

Some potential existing sources of data:

Last year I embarked upon a project to track all YA fiction releases. I built an online database using the free Zoho Creator and from January 1 to April 29 I entered information about all new YA releases (the ones I know about at least, more on that later). I intended for this to be a year-long project, but it became overwhelming and difficult to maintain alongside some increasing work and life responsibilities. I also came up against a lot more questions and issues that would need to be dealt with before a project like this could be a viable and useful source for analysis.

What I tracked:

  • Title and author
  • Publisher at the imprint level
  • Publication date
  • Reviews including starred reviews*
  • NY Times and USA Today Bestseller status*
  • Author gender and race/ethnicity
  • Main character(s) gender and race/ethnicity
  • Country originally published in
  • Genre(s)
  • Debut author status
  • Setting (urban/suburban/rural)*
  • Whether the book featured LGBT main character(s) or dealt with LGBT issues
  • Whether the book is stand-alone, first in a series, or a second+ title in a series
  • Which YALSA list the book appears on*

(*these categories were harder to track, so I didn’t actually use these categories for the most part)
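
As a rough illustration, the fields above could be modeled as a single record type along these lines. This is a hypothetical sketch, not the structure of the Zoho Creator database described in this post: every class name, field name, and the placeholder values in the example are assumptions made for illustration. Unknown demographic values are stored as None rather than guessed.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class SeriesStatus(Enum):
    """Stand-alone, first in a series, or a later title in a series."""
    STANDALONE = "stand-alone"
    FIRST_IN_SERIES = "first in a series"
    LATER_IN_SERIES = "second+ title in a series"


@dataclass
class Person:
    """An author or main character. None means 'not stated', not a guess."""
    name: str
    gender: Optional[str] = None
    race_ethnicity: Optional[str] = None


@dataclass
class BookRecord:
    """One YA release with the criteria listed above (names are assumptions)."""
    title: str
    authors: List[Person]
    publisher: str
    imprint: str
    publication_date: date
    country_first_published: str
    series_status: SeriesStatus
    genres: List[str] = field(default_factory=list)
    main_characters: List[Person] = field(default_factory=list)
    first_time_author: bool = False
    first_time_ya_author: bool = False
    lgbt_main_character_or_issues: bool = False
    # The starred (*) categories were harder to track, so they stay optional.
    setting: Optional[str] = None  # "urban", "suburban", or "rural"
    starred_reviews: List[str] = field(default_factory=list)
    bestseller_lists: List[str] = field(default_factory=list)
    yalsa_lists: List[str] = field(default_factory=list)


# Placeholder data only; this is not a real book.
example = BookRecord(
    title="Example Title",
    authors=[Person(name="Example Author")],
    publisher="Example Publisher",
    imprint="Example Imprint",
    publication_date=date(2000, 1, 1),
    country_first_published="US",
    series_status=SeriesStatus.FIRST_IN_SERIES,
    genres=["fantasy"],
    main_characters=[Person(name="Example Protagonist", gender="female")],
    first_time_ya_author=True,
)
```

Keeping authors and main characters as lists avoids hard-coding a limit on how many can be recorded, and leaving the starred categories optional means a record can be entered before reviews and list placements are known.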

These were the major questions that arose for me as I thought about this project, and the ones I would want to address before continuing a project like this. I’d love any guidance from other librarians, readers, or authors. Realistically, I would also need additional support populating the database, especially for tracking reviews, bestseller lists, and other criteria, and I would need a plan for the longevity and maintenance of the database.

  • What counts as published? No one would quibble with including books from the long list of traditional print publishers starting with the Big 6 (Big 5?) and moving on through the other classic publishing houses like Candlewick, etc. But we are entering a new publishing era and the line between traditional and self-publishing is blurring. So far I am not including books that appear to be self-published. But what about publishers that do exclusively ebooks? I have yet to find a reliable, comprehensive source of newly published books. I have been relying on Edelweiss book catalogs, Goodreads, reviewing sources, especially Kirkus, and the masterposts of monthly YA books from Paperbackd.
  • What counts as YA? For most books, it’s usually pretty clear and indicated by the publisher. If a book indicates ages 8-12, for me that is clearly middle grade. Something like ages 10-14 is a little dicier, but I think that’s still more middle grade. Books marked something like 12 and up, or even 10 and up, are more challenging to place, and honestly some of it is just gut feeling. The other end of the age range, New Adult, hasn’t been much of a problem yet, but could become one as the genre becomes more popular.
  • Publication Date: I’ve come across a couple of books that have different publication dates in different places; notably, book review sources often differ from Goodreads or Amazon. I’m trying to do a month at a time and not get too far ahead, since I know pub dates do change. If possible, I try to go with the official publisher info (though publisher sites are often not up-to-date, which alarms me a bit!).
  • Publisher: I’m trying to be accurate with publisher and imprint, but sometimes the inter-connectedness of publishing houses and distributors still baffles me!
  • Author gender and race/ethnicity: Gender is usually easy to come by, especially with Goodreads profiles and author websites. I don’t want to presume anyone’s preferred gender, though, so I’m still treading lightly here. Race/ethnicity is something that most people don’t declare in a bio or public profile, so some of it is presumption based on photos. A few authors of color do explicitly state their ethnic backgrounds in their bios, so I use that as a source. I honestly don’t know how to deal with this issue short of sending out a demographic survey to all YA authors (kidding!)
  • Main character gender and race/ethnicity: Like authors, gender of main characters is usually easy to determine. For better or worse, when a character is transgender or chooses to identify in some other way, this tends to be called out in the book description. There are also a few books that have multiple main characters–I allow for two in my tracking, which accounts for most books, but there are still several books with large casts of characters that cannot be tracked in this way. Race/ethnicity is very rarely explicitly called out in a book summary. Again, it seems to only be mentioned when the character is ‘other’ in some way. For YA books, ‘other’ is non-white. Book covers are not necessarily a reliable source for this, since there have been notable cases of white-washing characters on book covers (see the Book Smugglers post on this topic). Should we assume a default white character if it’s not explicitly called out in the text? Again, this area is highly problematic in many ways. I’m not sure how other analyses of race in literature have handled assigning a race to a main character, but it’s definitely an issue to consider.
  • Debut authors: I think it’s important to track debut authors to see how many new voices we are hearing in literature. But how do we count debut status? First novel ever? First YA novel? Do self-published titles count? What about authors who have published in other countries first, but this is their first American publication? I have allowed for two debut categories–first-time author and first-time YA author–and rely on Goodreads to determine which applies. If an author has only published short stories in collections, I do not count that as being published. I will likely go with the criteria used for the Morris Award (though even that is not cut and dried, as Kelly Jensen has pointed out at Stacked).
  • Tracking reviews: Ideally I would like to track when a book gets reviewed in one of the major reviewing sources, including when it gets a starred review, and eventually, which year-end best-of lists it ends up on. Kirkus puts all of its reviews online, so those are easy to track. I also have access to Booklist Online and they have a great search feature, so I can be sure to track all of those. Horn Book, SLJ, and BCCB put their starred lists online, but not all of their reviews, so I am lacking those. I do have online Horn Book, SLJ, and PW access through a database, but it’s not easy to read or search (text-only, no PDFs). VOYA is even harder to track down. So I do not have comprehensive access to these reviews. This is where more contributors would be super helpful!
  • Genre: I’m trying to keep genre minimal, but useful. My categories may not be as faceted as I would like, but it’s hard to determine when I haven’t read every book. In a limitless world, this would have an extensive tagging inventory to track all kinds of subject matter, but that’s probably beyond the scope of this at the moment.
  • Format: Most books tracked will be standard novels, but I am tracking various formats like short story collections, graphic novels, novels in verse, mixed media, etc.
  • The Non-Fiction question: I have only been including fiction. But what about more novel-like non-fiction? I’m thinking of something like Bomb or The Notorious Benedict Arnold. What about memoirs or fictionalized accounts of real people?
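
To make the "What counts as YA?" question above a little more concrete, here is a minimal sketch of how the age-range rule of thumb might be written down. It is only an illustration: the function name, the labels, and especially the cutoff of 13 for "clearly YA" are assumptions, not rules stated in this post. Anything the rule of thumb cannot settle is flagged for a human decision rather than guessed.

```python
from typing import Optional


def classify_audience(min_age: int, max_age: Optional[int] = None) -> str:
    """Rough audience label from a publisher's stated age range.

    max_age=None means an open-ended range such as "12 and up".
    """
    # A bounded range ending at 12 or below (e.g. ages 8-12) is middle grade.
    if max_age is not None and max_age <= 12:
        return "middle grade"
    # Ranges like ages 10-14 are dicier but lean middle grade.
    if max_age is not None and max_age <= 14 and min_age <= 10:
        return "probably middle grade"
    # Assumption: a range starting at 13 or older is treated as YA.
    if min_age >= 13:
        return "YA"
    # "10 and up" or "12 and up" come down to a judgment call.
    return "needs review"


for age_range in [(8, 12), (10, 14), (12, None), (10, None), (14, None)]:
    print(age_range, "->", classify_audience(*age_range))
```

Returning "needs review" instead of forcing a label keeps the gut-feeling cases visible, so they can be decided consistently later or by another contributor.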

As you can see it’s very daunting! I’ve also realized how much I really don’t know about database creation and maintenance, and especially about data manipulation and presentation. All of that said, I am still majorly interested in working on something like this. Please let me know if you have any thoughts about continuing some sort of open YA book data project!
