Hi, had some issues getting all the Ocean’s x films to scrape via MrMC using the TMDb scraper. I have 4 of them in total - Clooney’s 3 and the newest version “8” from last year. But only Thirteen scanned automatically.
I’ve had issues with these films in the past but always fixed them manually. I figured it had to do with the numerals 11/12/13/8 vs. the written form (Eleven etc.), so I matched the way they’re written on TVDb, but no luck. It now looks like it’s an issue with curly and straight apostrophes, and with inconsistency between different apps/websites on the iPhone.
For example if I type an apostrophe in the Safari address bar - it’s straight. If I type the same key on the Duck Duck Go search engine it’s curly. If I type it in TMDb’s search bar it’s curly.
If you search for “ocean’s” (curly apostrophe) on TMDb, it returns 1 film, Ocean’s Thirteen:
https://www.themoviedb.org/search?query=Ocean’s&language=en-US
If you search with a straight apostrophe, “Ocean's”, you get 12 movies returned, including all the Ocean’s films as well as Ocean’s Thirteen again:
https://www.themoviedb.org/search?query=Ocean%27s&language=en-US
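To make the difference between the two URLs above concrete, here is a minimal Python sketch of how each apostrophe is percent-encoded in a query string. It only shows that the two characters are different code points with different UTF-8 bytes; it says nothing about how TMDb's search handles them internally.

```python
from urllib.parse import quote

straight = "Ocean's"       # U+0027 APOSTROPHE
curly = "Ocean\u2019s"     # U+2019 RIGHT SINGLE QUOTATION MARK

# The two titles look alike on screen but are different strings.
print(straight == curly)   # False

# Percent-encoded forms as they would appear in a query string.
print(quote(straight))     # Ocean%27s
print(quote(curly))        # Ocean%E2%80%99s
```

A search backend that compares these strings byte for byte will treat them as two different queries unless it normalizes one form to the other first.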
I checked my local files, and Ocean’s Thirteen was the only one named with a straight apostrophe, which is why it scanned OK. The rest were named with curly apostrophes. I renamed them all with straight apostrophes and now MrMC scrapes them fine.
Is this a bug of some sort? Can it be fixed so TMDb search returns results for either curly or straight apostrophes?
Reply by genplant29
on February 22, 2019 at 10:38 PM
I've been noticing for some time that all instances of curly apostrophes on TMDb seem to be as copy-and-pasted in from texts found elsewhere. I'm not clear why they don't automatically convert to straight apostrophes (which are TMDb's norm) when pasted in and saved.
Reply by ticao2 🇧🇷 pt-BR
on February 23, 2019 at 9:04 AM
This damn apostrophe has made my research much more difficult.
What I found http://snowball.tartarus.org/texts/apostrophe.html
There are 4 of them:
U+0027 Unicode Character 'APOSTROPHE' https://www.fileformat.info/info/unicode/char/27/index.htm
U+2019 Unicode Character 'RIGHT SINGLE QUOTATION MARK' https://www.fileformat.info/info/unicode/char/2019/index.htm
U+2018 Unicode Character 'LEFT SINGLE QUOTATION MARK' https://www.fileformat.info/info/unicode/char/2018/index.htm
U+201B Unicode Character 'SINGLE HIGH-REVERSED-9 QUOTATION MARK' https://www.fileformat.info/info/unicode/char/201b/index.htm
And I think there are others.
U+2032 Unicode Character 'PRIME' https://www.fileformat.info/info/unicode/char/2032/index.htm
U+2035 Unicode Character 'REVERSED PRIME' https://www.fileformat.info/info/unicode/char/2035/index.htm
U+0060 Unicode Character 'GRAVE ACCENT' https://www.fileformat.info/info/unicode/char/0060/index.htm
and some more https://www.fileformat.info/info/unicode/block/combining_diacritical_marks/utf8test.htm
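As a rough sketch of how the look-alike characters listed above could be folded to the plain apostrophe before searching or before naming files, assuming a simple one-to-one mapping is good enough. The names `APOSTROPHE_LIKE` and `normalize_apostrophes` are made up for this example, and treating the grave accent and the primes as apostrophes is an assumption that may not suit every title.

```python
# Hypothetical mapping: each look-alike character from the list above
# is replaced with U+0027 APOSTROPHE.
APOSTROPHE_LIKE = {
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u201B": "'",  # SINGLE HIGH-REVERSED-9 QUOTATION MARK
    "\u2032": "'",  # PRIME
    "\u2035": "'",  # REVERSED PRIME
    "\u0060": "'",  # GRAVE ACCENT (assumption: treated as an apostrophe)
}
_TABLE = str.maketrans(APOSTROPHE_LIKE)


def normalize_apostrophes(title: str) -> str:
    """Return the title with apostrophe look-alikes replaced by U+0027."""
    return title.translate(_TABLE)


print(normalize_apostrophes("Ocean\u2019s Eleven"))  # Ocean's Eleven
```

Running file names through something like this before scraping would have the same effect as the manual renaming described in the first post.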
Reply by Travis Bell
on February 24, 2019 at 11:01 AM
I should be handling these properly, it's something that I remember looking at years and years ago. Here's the relevant ticket. Thanks.
Reply by Banana
on March 9, 2019 at 12:13 AM
@travisbell Is it okay to suggest new tickets? I don't know if you remember, but there are two somewhat similar search issues with a) all diacritics and b) the Turkish capital letter "İ" (regular i works fine, but the Turkish letter always returns no result).
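For illustration, a small Python sketch of why "İ" (U+0130) can trip up a naive case-insensitive match, along with one common way to fold diacritics. The helper name `fold` is hypothetical, and this is only an assumption about the kind of normalization a search index might apply, not a description of TMDb's code.

```python
import unicodedata

# Lowercasing U+0130 gives "i" plus U+0307 COMBINING DOT ABOVE,
# so it does not compare equal to a plain "i".
print("\u0130".lower() == "i")  # False
print(len("\u0130".lower()))    # 2


def fold(text: str) -> str:
    """Hypothetical helper: strip combining marks, then case-fold."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


print(fold("\u0130stanbul"))  # istanbul
print(fold("Am\u00e9lie"))    # amelie
```

Decomposing first and dropping the combining marks before case-folding is what lets both "İstanbul" and "istanbul" end up as the same search key.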
Reply by Travis Bell
on September 6, 2019 at 12:08 PM
This issue has been fixed and pushed live. Curly and regular apostrophes should be returning the same results now.
@banana_girl I haven't looked at the Turkish issue yet.