I'm looking for some help regarding my trance book (Discogs API, SQL, programming)

Aug 23, 2022
110 Posts
103 Thanked
I would like to optimize my workflow, so I can progress faster with the research that is required to finish my book. I would need some help from someone who has some basic SQL and programming knowledge, this way grabbing some data from Discogs. This would be a one-time thing.
 

Manofearth

Senior Member
Sep 24, 2020
436 Posts
242 Thanked
Fiver.com is an excellent resource for sourcing cheap technical labour like this.
 
Aug 23, 2022
110 Posts
103 Thanked
@TheTranceHistorian could you share more details?

Sure!

  1. I need a custom script that can grab/extract all TRACK NAMES from all 12" Trance releases/albums from 1988-2009. Because Discogs doesn't recognize Trance as a genre but as a style instead, this process would have to be done with four different styles: Trance, Progressive Trance, Hard Trance, and Tech Trance, to ensure that nothing will be missed (Neo Trance, Goa Trance, and Psy Trance are not needed for reasons I explained in my post about the book).
  2. It's important to note that the following data would all need to be grabbed along with track names: album name, artist name, and the year of release.
  3. When all this compiled data gets inserted into a table/chart (SQL/Excel/Google Sheets/whatever), all duplicates should be removed. So if the artist name + track name is the same in ROW1, ROW44, and ROW456, then ROW44 and ROW456 should be removed in order to ensure that no track appears more than once on this list. Furthermore, if the given track has multiple release years associated with it (which year is obviously based on its album's release year), then always the row with the earliest release year should stay (to make sure that the data always shows the given track's actual first release date).
  4. The script should generate a YouTube hyperlink for all track names, so I could initiate a search for the given track by simply clicking on its name.
    All YouTube searches start with this: "https://www.youtube.com/results?search_query=" and after that comes that part you want to search for, where the "+" sign marks the spot of spaces. So, for example, if you want to search for "Age Of Love - The Age Of Love (Watch Out For Stella Club Mix)", it will look like this "https://www.youtube.com/results?sea...e+age+of+love+(watch+out+for+stella+club+mix)". It would also be preferable (although not mandatory) if album names would also be clickable (and would direct me to their corresponding Discogs page).
To make my points easier to understand, here's a picture I made.


1664818450463.png


I tried to explain this as simply as I could, without going into any details on how and why it could help me, but the point is that automatizing certain things this way could actually cut my work time in half (or, to put it in another way, I could listen to twice as many tracks during the same amount of time as I can right now).
 
Aug 23, 2022
110 Posts
103 Thanked
This would be a very interesting project to do. Any reason you're requesting SQL specifically and not Excel + VBA?
I mean, I work exclusively in Google Sheets; I just saw that SQL was mentioned in a lot of places where people discussed the Discogs API and Data Dump.
 

Archon

Gagi
TranceFix Crew
Jun 27, 2020
3,912 Posts
2,808 Thanked
Ah yes, makes sense. Discogs has an API, I think it would work well.

SQL does make sense especially since you're gonna have tons of data. Even though it's only 1 table, there's gonna be an incredible amount of rows. Besides that, data can always be exported to CSV or XLS easily.
 
Aug 23, 2022
110 Posts
103 Thanked
Ah yes, makes sense. Discogs has an API, I think it would work well.

SQL does make sense especially since you're gonna have tons of data. Even though it's only 1 table, there's gonna be an incredible amount of rows. Besides that, data can always be exported to CSV or XLS easily.

This should be around 250K rows max (after deleting duplicates), based on my estimation. Yeah, that's still a lot of rows, but as far as I know, Google Sheets can handle up to 1 million rows, so that should be fine. It would be amazing if this script could be done somehow, as I would be able to work at least twice as fast. This way, I wouldn't have to type in every single track/artist name, click on individual albums, compare album contents and track names to each other, or check whether I checked an album or not after switching from one filter style (e.g., Trance) to another (e.g., Progressive Trance), etc.
 

Archon

Gagi
TranceFix Crew
Jun 27, 2020
3,912 Posts
2,808 Thanked
This should be around 250K rows max (after deleting duplicates), based on my estimation. Yeah, that's still a lot of rows, but as far as I know, Google Sheets can handle up to 1 million rows, so that should be fine. It would be amazing if this script could be done somehow, as I would be able to work at least twice as fast. This way, I wouldn't have to type in every single track/artist name, click on individual albums, compare album contents and track names to each other, or check whether I checked an album or not after switching from one filter style (e.g., Trance) to another (e.g., Progressive Trance), etc.
There's a couple of edge cases I envision though which might result in loss or excess of data, depending on the situation. But it is doable. I can't really commit to finishing it one of these days though, got busy times ahead, but maybe I can dedicate snippets of time to experimenting with it.

Just wondering, where would you store the SQL data? Is this the program you would run locally or? Would it be a one-time run or would you customize the parameters, split by year etc?
 
Aug 23, 2022
110 Posts
103 Thanked
There's a couple of edge cases I envision though which might result in loss or excess of data, depending on the situation. But it is doable. I can't really commit to finishing it one of these days though, got busy times ahead, but maybe I can dedicate snippets of time to experimenting with it.

Just wondering, where would you store the SQL data? Is this the program you would run locally or? Would it be a one-time run or would you customize the parameters, split by year etc?

Thank you in advance! This would be a one-time thing, and then I would do everything else needed in Google Sheets with the collected data (add personal notes to them, color code them, etc.). Naturally, I would have to do some cleanup manually to delete a lot of duplicates (this would be the only major and time-consuming thing I'd have to do, aside from actually listening to those tracks), as tons of tracks have been released with slightly different names (but they are still the same tracks).

Edit: just curious, what could result in loss or excess of data?
 

dmgtz96

Elite Member
Jul 13, 2020
2,640 Posts
1,499 Thanked
Thank you in advance! This would be a one-time thing, and then I would do everything else needed in Google Sheets with the collected data (add personal notes to them, color code them, etc.). Naturally, I would have to do some cleanup manually to delete a lot of duplicates (this would be the only major and time-consuming thing I'd have to do, aside from actually listening to those tracks), as tons of tracks have been released with slightly different names (but they are still the same tracks).

Edit: just curious, what could result in loss or excess of data?

Probably not what Gagi had in mind, but I imagine the Youtube links will eventually point to videos that have been taken down. My suggestion is to run a script that downloads all the videos once you only have unique videos.
 

Archon

Gagi
TranceFix Crew
Jun 27, 2020
3,912 Posts
2,808 Thanked
Probably not what Gagi had in mind, but I imagine the Youtube links will eventually point to videos that have been taken down. My suggestion is to run a script that downloads all the videos once you only have unique videos.
The links will lead to queries for YouTube videos. Also, downloading ~250k videos must take a shit ton of hard drive space.
Edit: just curious, what could result in loss or excess of data?
Tracks that don't have mix names in them. Can't remember if I've seen some releases where all mixes are listed without the mix name. Obscure vinyls from the 90s also have tracks that might be labelled incorrectly. That's for the loss. For the excess, just think of all the radio edits you might find. And imagine a vinyl which is listed as trance, but also as techno and ambient. How do you differ between trance, techno and ambient programmatically, without other metadata?
 
  • Thanks
Reactions: dmgtz96
Aug 23, 2022
110 Posts
103 Thanked
Tracks that don't have mix names in them. Can't remember if I've seen some releases where all mixes are listed without the mix name. Obscure vinyls from the 90s also have tracks that might be labelled incorrectly. That's for the loss.

So if I understand it correctly, if the script comes across a vinyl that has tracks on it without mix names like this one, it will simply ignore/delete that data? Why is that? I can't really understand. As far as I'm aware, mix names (e.g., Original Mix) are just simply part of the track name anyway and not handled individually by Discogs.

For the excess, just think of all the radio edits you might find. And imagine a vinyl which is listed as trance, but also as techno and ambient. How do you differ between trance, techno and ambient programmatically, without other metadata?

Well, I can't do much about it, but usually, it's pretty easy to spot after a few seconds of listening to certain points of a given track if it's not trance (or if it's a track that only has some trance elements). For example, before 1993, most tracks labeled as trance on Discogs had nothing to do with trance or were tracks that were 10% trance and 90% from another genre, which in a way, allowed me to go through those years quite quickly.

Furthermore, there's a reason why I only focus on 12" releases (along with a few key CD examples and radio shows, like the D.Trance series, Trance Europe Express, A State Of Trance, etc.). It was the main platform for producers in the classic era / for those who produced with old production techniques, so most noteworthy tracks are released on 12". And 12" records usually only focus on a singular track (with various mixes of it) or multiple tracks from the same artist, so there's a high chance in general that all those tracks will be from the same genre. However, CDs focus on many artists a lot of times, and it could easily happen that a CD which features 30 tracks and has a trance label on Discogs only has two trance songs on it in actuality, and the rest is techno, acid, etc., and I want to avoid that.
 
Aug 23, 2022
110 Posts
103 Thanked
Just a little update:

  1. I found this Python Discogs API Client that is still being developed. The latest version (v2.5) came out less than a day ago. Maybe it is useful?
  2. I installed SyncWith, which is an add-on for Google Sheets that lets me access thousands of APIs, including Discogs'. I thought maybe this could simplify the process somewhat as I work in Google Sheets anyway. The only problem is that it gives me an error when I try to log in to Discogs via the add-on. I have no idea what I'm doing wrong.
Edit: I also forgot to add to comment #6 that aside from album name, artist name, track name, and year, track length should also be listed.
 
Last edited:

drush

New Member
Oct 13, 2022
3 Posts
3 Thanked
Syncwith founder here, I just tried out querying discogs and it works, these steps should work:
  1. Request an access token here: https://www.discogs.com/settings/developers
  2. open up the syncwith addon, search for discogs
  3. discogs has a bunch of endpoints you could use the search one for instance - https://api.discogs.com/database/search
  4. add the query parameter token, enter your token
  5. enter a value for the q query parameter - this is your search, eg
  6. preview the connection
  7. setup pagination - use url based pagination where the next url is in the request response and set it to use pagination.urls.next - i set a limit of 10 pages as the discogs api is not very fast
That should be enough to get your working - if there were enough users interested we write a more thorough tutorial on syncwith.com.
1665674531023.png
 
  • Thanks
Reactions: TheTranceHistorian
Aug 23, 2022
110 Posts
103 Thanked
Syncwith founder here, I just tried out querying discogs and it works, these steps should work:

I never in a million years would have imagined that a founder of a large app like Syncwith would register to a site just to give me an answer. :D Thank you for your explanation; I'llmanaged to get the add-on working, although it displays things differently than on your picture.
1665678119820.png
 

drush

New Member
Oct 13, 2022
3 Posts
3 Thanked
I never in a million years would have imagined that a founder of a large app like Syncwith would register to a site just to give me an answer. :D Thank you for your explanation; I'llmanaged to get the add-on working, although it displays things differently than on your picture. View attachment 1117
ha, well we try, when there is nested json you need to hit the expand buttons to expand the json (basically the preview tool lets you turn json to tables without needing to understand jmespaths), all the pagination json you can delete - it's just useful to inspect to see where the next url is when setting up pagination. Hopefully that helps.

I did DJ drum and bass in the 90s, grew up on early trance and my grandparents are from the netharlands - although admittedly never been to trancefix.nl before but I do have a discogs account and will probably play with the api a bit and post a tutorial once I get some time.
 
  • Thanks
Reactions: TheTranceHistorian
Aug 23, 2022
110 Posts
103 Thanked
I did DJ drum and bass in the 90s, grew up on early trance and my grandparents are from the netharlands - although admittedly never been to trancefix.nl before but I do have a discogs account and will probably play with the api a bit and post a tutorial once I get some time.
Thank you, I would really appreciate that! Using the API could potentially cut my work time in half. Just out of curiosity, what are some of your favorite early trance tracks?