Invelos Forums->DVD Profiler: Contribution Discussion
Parsing unknown names
T!M (DVD Profiler Unlimited Registrant, Star Contributor)
Profiling since Dec. 2000
Registered: March 13, 2007
Reputation: Highest Rating
Netherlands Posts: 8,738
Let me tell you something else: back when the middle name field was introduced, I actually used what you're now proposing: I put everything between the first and last words of the name in the middle name field (articles excluded, of course). But the Helena Bonham Carter debate opened my eyes. Since we've all agreed to parse her as Helena/Bonham Carter, I've been applying that same kind of logic to lots of other names. As I said before: "correct" parsing is basically impossible to document, but if we parse Helena Bonham Carter as H/B C, then I can't help parsing, say, Viola Kates Stimpson as V/K S, too. It's just trying to be consistent... To me, and to lots of others, that is a logical extension of the decision we made on Helena Bonham Carter.

The result of this decision, however, is a mess in the online database. For each and every name consisting of more than two words, there are multiple entries. After all this time, there STILL are for Helena Bonham Carter. To solve this mess, we need a very clear solution - not something that still leaves room for (cultural) interpretation or anything else. Something that will get all users, throughout the various regions/localities, on the same page, with no room for any difference of opinion. Any system that will let you enter H/B C but not V/K S will never work perfectly, because once you're convinced Helena/Bonham Carter is "correct", you can't help but look differently at a name like Viola Kates Stimpson. The Bonham Carter debate taught us that not everything that's in the middle is necessarily a "middle name". And asking for documentation won't help, as proof of "correct" parsing can almost never be found.

Going back to a single name field would be the easiest solution - with an improved search feature, I see no disadvantage with this whatsoever. For anything else, we should take any flexibility and interpretation out of the process entirely. If we keep calling it "first name", "middle name" and "last name", people from various parts of the world will keep applying their own definition of these terms to it - like I do myself. If we want to take all interpretation and various cultural backgrounds out of it, we need to let go of names and just look at the words. If we just look at the words instead of interpreting them, and simply parse First Word/Middle Word(s)/Last Word, with no interpretation whatsoever, then we could all agree. It wouldn't be pretty, but it would work.
 Last edited: by T!M
White Pongo, Jr. (DVD Profiler Unlimited Registrant)
No, I iz no Cheshire Cat!
Registered: August 22, 2007
Reputation: High Rating
Posts: 1,807
Quoting Gadgeteer:
Quote:
Are you suggesting that we check every cast member's nationality before we parse the name? It may be possible on the 'bigger' names but impossible for most others.

This won't work.


Well, you can default to the CoO of the movie.
-- Enry
Gadgeteer (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Registered: March 13, 2007
United Kingdom Posts: 519
Quoting T!M:
Quote:
Let me tell you something else: some time back, I actually followed what you're now proposing: I put everything between the first and last words of the name in the middle name field (articles excluded, of course). But the Helena Bonham Carter debate opened my eyes. Since we've all agreed to parse her as Helena/Bonham Carter, I've been applying that same kind of logic to lots of other names. As I said before: "correct" parsing is basically impossible to document, but if we parse Helena Bonham Carter as H/B C, then I can't help parsing, say, Viola Kates Stimpson as V/K S, too. It's just trying to be consistent... To me, and to lots of others, that is a logical extension of the decision we made on Helena Bonham Carter.

The result of this decision, however, is a mess in the online database. For each and every name consisting of more than two words, there are multiple entries. After all this time, there STILL are for Helena Bonham Carter. To solve this mess, we need a very clear solution - not something that still leaves room for (cultural) interpretation or anything else. Something that will get all users, throughout the various regions/localities, on the same page, with no room for any difference of opinion. Any system that will let you enter H/B C but not V/K S will never work perfectly, because once you're convinced Helena/Bonham Carter is "correct", you can't help but look differently at a name like Viola Kates Stimpson. The Bonham Carter debate taught us that not everything that's in the middle is necessarily a "middle name". And asking for documentation won't help, as proof of "correct" parsing can almost never be found.

Going back to a simple name field would be the easiest solution - with an improved search feature, I see no disadvantage with this whatsoever. For anything else, we should take any flexibility and interpretation out of the process entirely. So yes, agreeing to leave any interpretation out of it, and just parse on what you see - First Word/Middle Word(s)/Last Word - would work. As long as we're still hung up on what part of the name we see before us is either the first, middle or last name, there will always be people applying their own definition of those terms to it - like I would do myself. If we only look at the words and simply parse First Word/Middle Word(s)/Last Word, with no interpretation whatsoever, then we could all agree. It wouldn't be pretty, but it would work.


Personally I agree with you and would rather see a very simplistic parsing with no exceptions. However, I don't think you'll get agreement across the board.

I would like to hear reasons, other than being culturally correct, why it's useful to have H//B C instead of H/B/C.
Is it important to have the names in the 'correct' field?
Stuart
 Last edited: by Gadgeteer
Gadgeteer (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Registered: March 13, 2007
United Kingdom Posts: 519
Quoting EnryWiki:
Quote:
Quoting Gadgeteer:
Quote:
Are you suggesting that we check every cast member's nationality before we parse the name? It may be possible on the 'bigger' names but impossible for most others.

This won't work.


Well, you can default to the CoO of the movie.


How would that work with a multinational cast list?
What about an actor that appears in films from various CoOs?
Stuart
 Last edited: by Gadgeteer
Winston Smith (DVD Profiler Unlimited Registrant, Star Contributor)
Don't be discommodious
Registered: March 13, 2007
United States Posts: 21,610
I REALLY do believe that none of the arguments I have seen make any rational sense; it truly does seem to come down to being lazy and not wanting to research and document the way a user wants it. I have seen this from numerous users, who use very lame and general documentation for a number of things. The answer seems to revolve around "if the user can put up a big enough stink about it and get his way", he has saved himself work because he got his way and doesn't have to document that; it makes no difference whether his way is correct or even accurate. I hope I am wrong, but I see the attitude being shown in so many of these threads being carried over into contributions; it's the only conclusion that makes sense.

And in that regard you will get no sympathy from me: do the work and provide the documentation, or don't make the changes.

Skip
ASSUME NOTHING!!!!!!
CBE, MBE, MoA and proud of it.
Outta here

Billy Video
T!M (DVD Profiler Unlimited Registrant, Star Contributor)
Profiling since Dec. 2000
Registered: March 13, 2007
Reputation: Highest Rating
Netherlands Posts: 8,738
Quoting Gadgeteer:
Quote:
Personally I agree with you and would rather see a very simplistic parsing with no exceptions. However, I don't think you'll get agreement across the board.

Glad to hear that. I agree that some of the more vocal users will probably protest loudly, while in fact such a change would lead to fewer disagreements than the current situation does. Right now, Skip and I probably disagree about 200 names, of which probably fewer than five can be documented in any way. A change like this would result in Skip having to let the "culturally correct" parsing of these five go, while we would then suddenly agree on the other 195 names. All in all, that seems like a good result for everybody. So I don't really understand the resistance. Let's just hope that Ken makes a wise decision...

Quote:
I would like to hear reasons as to why it's useful to have H//B C instead of H/B/C? Other than to be culturally correct.

I don't think there are. As I said before: the Helena Bonham Carter debate is the single cause for my change in handling three-piece names. Before Helena Bonham Carter, I parsed them all alike. Now I try to assess for every name whether the middle word is part of the last name or part of the middle name. Documentation on parsing is basically nonexistent, so you usually have to decide by yourself. That's the whole story: with Helena Bonham Carter we opened a can of worms that cannot be easily closed. If we do want to end the confusion, the solution had better not leave anything open to interpretation - culturally correct or not.

Edit: to Skip, once more, documentation on what is (culturally) "correct" parsing is basically nonexistent. Sure, someone managed to dig up a complete family history on Helena Bonham Carter, but I assure you this is an exception. Can you show me a dozen three-piece names that have been properly documented? For argument's sake, let's say there are indeed a dozen. Wouldn't you say that parsing those few "wrongly" would be a small price to pay to get agreement all across the board on ALL hundreds, maybe thousands of other names consisting of more than two words?
 Last edited: by T!M
White Pongo, Jr. (DVD Profiler Unlimited Registrant)
No, I iz no Cheshire Cat!
Registered: August 22, 2007
Reputation: High Rating
Posts: 1,807
Quoting T!M:
Quote:
Let me tell you something else: some time back, I actually followed what you're now proposing: I put everything between the first and last words of the name in the middle name field (articles excluded, of course). But the Helena Bonham Carter debate opened my eyes. Since we've all agreed to parse her as Helena/Bonham Carter, I've been applying that same kind of logic to lots of other names. As I said before: "correct" parsing is basically impossible to document, but if we parse Helena Bonham Carter as H/B C, then I can't help parsing, say, Viola Kates Stimpson as V/K S, too. It's just trying to be consistent... To me, and to lots of others, that is a logical extension of the decision we made on Helena Bonham Carter.

The result of this decision, however, is a mess in the online database. For each and every name consisting of more than two words, there are multiple entries. After all this time, there STILL are for Helena Bonham Carter. To solve this mess, we need a very clear solution - not something that still leaves room for (cultural) interpretation or anything else. Something that will get all users, throughout the various regions/localities, on the same page, with no room for any personal preference or interpretation. Any system that will let you enter H/B C but not V/K S will never work perfectly, 'cause once you're convinced H/B C is "correct", you can't help but look differently at a name like V K S. The Bonham Carter debate taught us that not everything that's in the middle is necessarily a "middle name". And asking for documentation won't help, as proof of "correct" parsing can almost never be found.


You are probably right, except for one thing: using the expression "Middle Name" *is* a cultural interpretation of names.
You say "no room for any personal preference or interpretation". But calling those words "Middle Name" is just a matter of personal (national) preference.

Having said that, I actually don't care much about the principle, as long as a rule works.
But it won't work.
No matter what you write in the Rules, many users won't stick to a rule that asks them to call a name a Middle Name even if it's not.


Quote:

Going back to a simple name field would be the easiest solution - with an improved search feature, I see no disadvantage with this whatsoever.


That would be the easiest solution: just one input field where you can write the complete name.
-- Enry
T!M (DVD Profiler Unlimited Registrant, Star Contributor)
Profiling since Dec. 2000
Registered: March 13, 2007
Reputation: Highest Rating
Netherlands Posts: 8,738
Quoting EnryWiki:
Quote:
You are probably right, except for one thing: using the expression "Middle Name" *is* a cultural interpretation of names.
You say "no room for any personal preference or interpretation". But calling those words "Middle Name" is just a matter of personal (national) preference.

That's what I've been saying all along: that we need to stop calling it a middle name. You're absolutely right: as long as we call it that, people from various parts of the world are going to interpret its intended use differently.
 Last edited: by T!M
Bad Father (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Registered: July 23, 2001
Registered: March 13, 2007
Posts: 4,596
Quoting T!M:
Quote:
That's what I've been saying all along: that we need to stop calling it a middle name. You're absolutely right: as long as we call it that, people from various parts of the world are going to interpret its intended use differently.


I believe you meant to highlight middle name. After all, they are names we are referring to, right?
My WebGenDVD online Collection
T!M (DVD Profiler Unlimited Registrant, Star Contributor)
Profiling since Dec. 2000
Registered: March 13, 2007
Reputation: Highest Rating
Netherlands Posts: 8,738
Quoting 8ballMax:
Quote:
Quoting T!M:
Quote:
That's what I've been saying all along: that we need to stop calling it a middle name. You're absolutely right: as long as we call it that, people from various parts of the world are going to interpret its intended use differently.


I believe you meant to highlight middle name. After all, they are names we are referring to, right?

Read the preceding four pages, and you'll see that I wasn't mistaken. The fact that the fields are called names, like "middle name", is the main reason that different people parse names in different ways. For instance, in my part of the world, a "middle name" is like a second given name. In the example of the poll of this thread: "Henry" could be considered a middle name. The "Penrose" part would be considered part of his last name by millions and millions of people over here. So yes, the fact that they're called "names" is a big part of the problem. If we were just told to enter the first word of what you see in field A, the last word of what you see in field C, and everything in between in field B, then we wouldn't be having these debates. Instead, they're called names, and different users from different regions/localities interpret their intended use differently.
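
To make the purely mechanical rule concrete, here is a minimal Python sketch of the "first word in field A, last word in field C, everything in between in field B" idea. The names `ParsedName` and `split_by_words` are made up for illustration and are not part of DVD Profiler; putting a single-word name into field A is also just an assumption, and the sketch makes no exceptions for articles or anything else.

```python
from typing import NamedTuple


class ParsedName(NamedTuple):
    """Hypothetical container for the three neutral fields A/B/C."""
    field_a: str  # first word as credited
    field_b: str  # everything in between (may be empty)
    field_c: str  # last word as credited (empty for single-word names)


def split_by_words(full_name: str) -> ParsedName:
    """Split a credited name purely by word position, with no interpretation."""
    words = full_name.split()  # splits on any run of whitespace
    if not words:
        return ParsedName("", "", "")
    if len(words) == 1:
        # Assumption: a single-word name goes into field A.
        return ParsedName(words[0], "", "")
    return ParsedName(words[0], " ".join(words[1:-1]), words[-1])


if __name__ == "__main__":
    for name in ("Helena Bonham Carter", "Viola Kates Stimpson"):
        print(split_by_words(name))
```

Under this rule both example names are split the same way, one word per field, so there is nothing left to interpret or document; the price is that a double-barrelled surname like "Bonham Carter" ends up spread over fields B and C.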

Interesting reading:
- http://en.wikipedia.org/wiki/Middle_name
 Last edited: by T!M
RHo (DVD Profiler Unlimited Registrant, Star Contributor)
Registered: March 13, 2007
Posts: 2,759
Quoting Gadgeteer:
Quote:
How about we suggest renaming it to simply:

First/Middle/Last

this along with something in the rules on default parsing positions should suffice.

We had that in 2.4. It didn't work either, because a lot of people (including myself) are not willing to split a name into 3 fields by words only. But back then a lot of people voiced the suggestion to get rid of the multi-field names in favour of a single field. This would avoid all parsing problems.
White Pongo, Jr. (DVD Profiler Unlimited Registrant)
No, I iz no Cheshire Cat!
Registered: August 22, 2007
Reputation: High Rating
Posts: 1,807
Quoting Gadgeteer:
Quote:

How would that work with a multinational cast list?
What about an actor that appears in films from various CoOs?


In that case, you would have to deal with each name individually, look it up, or just guess! Not an error-proof system, I know! 

If you want an easy, standardized system, rename the three fields in a neutral way that won't cause "cultural" confusion. Or just use one field.
-- Enry
Darxon (DVD Profiler Unlimited Registrant, Star Contributor)
Vescere bracis meis
Registered: March 14, 2007
Germany Posts: 742
@Skip

I dunno, but didn't you always maintain that parsing was intended to go by counting words unless documentation proves otherwise? If I recall your stance correctly, there are in fact several users that expressed exactly that approach in this thread, so your blanket dismissal of all arguments in this thread as senseless is plain wrong.

And this thread's concern isn't any specific change or the lack of documentation, but the general approach to parsing names and the way it should be set out in the rules to minimize or even avoid further misconceptions and discussions. It's a task we've tried several times already, but to no avail.

As you might recall, it was pointed out to you several times that the current rules do not provide ANYTHING on this subject, but definitely should do so.

And btw, I think we should stop blaming cultural differences in naming conventions for the problems we have here, and especially should refrain from belittling other users because of their respective native background. This sort of behavior won't support the discussion or help find a solution; it only leads to the development of harsh and stubborn attitudes, effectively eliminating the chances for a sensible discussion.

This is not about self-proclaimed naming / parsing experts applying their own cultural background to actors' names of different origin, as much as some would like to make us believe in order to steer the discussion towards animosity. BTW, those do exist on both sides of the pond, and they all have made themselves read and heard in these forums in the past.

@8ballMax

Nope, highlighting "name" is correct. The point of his post, which referred to my earlier posting, is to abandon the term "name" for the sake of easier data entry. Read what I wrote and I think the general idea is obvious.
Lutz
 Last edited: by Darxon
RHo (DVD Profiler Unlimited Registrant, Star Contributor)
Registered: March 13, 2007
Posts: 2,759
Quoting Gadgeteer:
Quote:
Is it important to have the names in the 'correct' field?

Why would you want different fields at all if you are willing to parse the name by a simple word rule?
RHo (DVD Profiler Unlimited Registrant, Star Contributor)
Registered: March 13, 2007
Posts: 2,759
Quoting skipnet50:
Quote:
... it truly does seem to come down to being lazy and not wanting to research and document the way a user wants it. ...

It truly does seem to come down to being lazy and only wanting to count words instead of trying to parse the name properly.
Gadgeteer (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Registered: March 13, 2007
United Kingdom Posts: 519
Quoting RHo:
Quote:
Quoting Gadgeteer:
Quote:
Is it important to have the names in the 'correct' field?

Why would you want different fields at all if you are willing to parse the name by a simple word rule?


I don't. I could happily live with the 1 field.
Stuart