-
09-10-2018, 11:50 AM
#3781
Originally Posted by myles
data set is really a snapshot in time - as a whole, it can't be built on because it will potentially contain 'stale' loan data. To try to explain: If someone uploads a particular loan that no one else has in the initial data set, and then never uploads again, then that particular loan will just sit with no further detail updated - it will become 'stale' and impact on the data set.
The dates stored with the data i.e. LAST_PAYMENT_DATE, don't allow, with confidence, for these 'stale' records to be removed (they may just be in arrears). ...
I agree. Myles.
Humvee, I think that while more data and more people sharing data is definitely better, Myles probably already has over 75% of the unique loan records in the Harmoney population between his own dataset, yours, mine and InTheRearWithTheGear's. If Cool Bear chooses to share too, I suspect we might end up with over 80%. No statistician could quibble about sample quality at that level of coverage.
Originally Posted by myles
the detail of loans that are 'Paid Off', 'Charged Off', 'Debt Sold', will not change and could be built on..
Also agree. These don't become stale - as a final outcome has been already achieved. The only place this gets stuffed is where Harmoney has noted different outcomes in different portfolios for the same LAI. I came across this when we were pooling some Harmoney data earlier.
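To make the 'stale' idea concrete, here's a rough Python sketch of how you might flag the suspect records: loans still showing an open status but not seen in any upload for months. The field names and the 180-day cutoff are my own assumptions, not anything from Harmoney's export - and as Myles says, a flagged loan may just be in arrears, so this can only flag, not confirm:

```python
from datetime import date

# Statuses with a final outcome - these records can never go stale.
FINAL_STATUSES = {"Paid Off", "Charged Off", "Debt Sold"}

def flag_stale(records, as_of, max_age_days=180):
    """Flag loans whose newest snapshot is old and whose status is still open.

    records: {lai: (status, snapshot_date)} - latest upload seen per loan.
    Returns the set of LAI ids that look stale (possibly just in arrears).
    """
    stale = set()
    for lai, (status, snapped) in records.items():
        if status not in FINAL_STATUSES and (as_of - snapped).days > max_age_days:
            stale.add(lai)
    return stale
```

A final-status record is skipped no matter how old its snapshot is, which matches the point that 'Paid Off' / 'Charged Off' / 'Debt Sold' detail can safely be built on.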
-
09-10-2018, 11:58 AM
#3782
yeah, nah
Originally Posted by beacon
Also agree. These don't become stale - as a final outcome has been already achieved. The only place this gets stuffed is where Harmoney has noted different outcomes in different portfolios for the same LAI. I came across this when we were pooling some Harmoney data earlier.
Yep, will just have to 'fix' data errors that are obvious and hope they all come out in the wash...
The 'jump' in defaults that showed up in the last time-lapse chart I did surprised me - I likely wouldn't have found it if I hadn't done that time lapse... It appears Harmoney updated the LAST_PAYMENT_DATE field when a debt was sold, which broke the data - why they did that I don't know. I would have liked to develop a 'hazard' curve from the real data, but can't because of this
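For anyone curious what that 'hazard' curve is computationally: it's just the conditional chance of default at each month of loan age, among loans that survived that long. A rough Python sketch (the tuple layout is illustrative, not Harmoney's format - and note it relies on trusting when each loan was last observed, which is exactly what the LAST_PAYMENT_DATE overwrite breaks):

```python
from datetime import date

def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed between two dates."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def hazard_curve(loans, horizon=36):
    """Hazard at month m = defaults at age m / loans still at risk at age m.

    Each loan is a tuple (orig_date, default_date_or_None, observed_to_date).
    Loans not yet observed to age m are excluded from that month's pool.
    """
    hazard = []
    for m in range(1, horizon + 1):
        at_risk = defaults = 0
        for orig, dflt, seen_to in loans:
            age_default = months_between(orig, dflt) if dflt else None
            age_observed = months_between(orig, seen_to)
            # At risk at month m: survived past month m-1, observed to month m.
            if (age_default is None or age_default >= m) and age_observed >= m:
                at_risk += 1
                if age_default == m:
                    defaults += 1
        hazard.append(defaults / at_risk if at_risk else 0.0)
    return hazard
```

With clean data you'd expect the curve to hump around the 12-13 month mark and tail off, like the one Harmoney published.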
-
09-10-2018, 12:19 PM
#3783
yeah, nah
Is there any additional data that anyone can think of that might be of value when pulling this together?
An example is that I could create a 'number of lenders' column, which would be a count of how many of the data sets included the loan - not overly meaningful but might be interesting to see which loans are taken more frequently? Hmm, might be easier to just release the 'raw' data and the 'clean/unique' data sets. Then anyone can do their own thing.
I'll just keep the main merge simple by taking the latest unique loan record as uploaded from the data sets - I'll add mine as a fresh set at the end.
Any suggestions or thoughts on this welcome.
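For what it's worth, the 'latest record wins, plus a lender count' merge could look something like this in Python - a sketch only, keying on the LAI id and assuming each upload has been read into a dict of records, with the uploads listed in order so later ones overwrite earlier ones:

```python
def merge_datasets(datasets):
    """Merge several lenders' exports keyed on the LAI id.

    datasets: list of {lai: record_dict}, in upload order (later wins).
    Keeps the latest record per loan and adds a NUM_LENDERS column
    counting how many of the uploaded sets contained that loan.
    """
    merged, lender_count = {}, {}
    for ds in datasets:
        for lai, rec in ds.items():
            merged[lai] = dict(rec)  # copy; later upload overwrites earlier
            lender_count[lai] = lender_count.get(lai, 0) + 1
    for lai, rec in merged.items():
        rec["NUM_LENDERS"] = lender_count[lai]
    return merged
```

Releasing both the raw pool and this deduplicated output, as suggested above, lets anyone redo the merge their own way.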
-
09-10-2018, 12:47 PM
#3784
Originally Posted by myles
... It appears harmoney updated the LAST_PAYMENT_DATE field when the debt was sold which broke the data, why they did that I don't know. I would have liked to have developed a 'hazard' curve based on real data, but can't because of this
The least Harmoney could do is update the hazard curve annually, especially as it is still evolving. It still has the 15-month-old one up (hazard-curve-jul-2017-583x260)
-
09-10-2018, 12:48 PM
#3785
Member
Originally Posted by myles
Is there any additional data that anyone can think of that might be of value when pulling this together?
An example is that I could create a 'number of lenders' column, which would be a count of how many of the data sets included the loan - not overly meaningful but might be interesting to see which loans are taken more frequently? Hmm, might be easier to just release the 'raw' data and the 'clean/unique' data sets. Then anyone can do their own thing.
I'll just keep the main merge simple by taking the latest unique loan record as uploaded from the data sets - I'll add mine as a fresh set at the end.
Any suggestions or thoughts on this welcome.
I think your proposal to clean the data, then release a 'tidy' version of unique loans, is the best option. Oftentimes with data, you don't develop your question until you've mucked around with it.
I think keeping the 'LAI-' as a key is useful, I'd prefer that over InTheRearWithTheGear's suggestion.
-
09-10-2018, 12:53 PM
#3786
Originally Posted by alundracloud
I think keeping the 'LAI-' as a key is useful, I'd prefer that over InTheRearWithTheGear's suggestion.
My vote for keeping LAI too. It is easier, cleaner, portable, scalable, discrete ...
-
09-10-2018, 12:56 PM
#3787
Member
Add a record creation date/time - it could be used as a tie breaker for determining freshness of duplicate rows.
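Something like this, assuming a CREATED_AT column gets stamped on each row at upload time (a hypothetical column name, not in the current exports):

```python
from datetime import datetime

def dedupe(rows):
    """Collapse duplicate rows per LAI, keeping the newest CREATED_AT.

    CREATED_AT is an assumed upload-time stamp; the loan fields alone
    can't break ties, since duplicate rows may be byte-identical.
    """
    best = {}
    for row in rows:
        lai = row["LAI"]
        if lai not in best or row["CREATED_AT"] > best[lai]["CREATED_AT"]:
            best[lai] = row
    return list(best.values())
```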
-
09-10-2018, 01:17 PM
#3788
Originally Posted by RMJH
I have about 2500 active loans ... With 50% annual churn/repayment ...!
Data pool would benefit from contributions by early lenders like RMJH, 777, harvey specter, or Halebop, if they are still around.
-
09-10-2018, 01:36 PM
#3789
Member
Originally Posted by beacon
My vote for keeping LAI too. It is easier, cleaner, portable, scalable, discrete ...
agreed too
-
09-10-2018, 01:53 PM
#3790
Member
I find that when analysing defaults, it is better to leave out all recent loans, as they drag the apparent default rate down. Based on my loans, the average time from loan date to default is just over 13 months; the median is just short of 12 months. So my suggestion to Myles is, in addition to the main set of graphs, to also do one set (relating to defaults) that ignores all loans less than (say) 12 months old.
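That seasoning filter is simple enough to sketch in Python (the tuple layout is illustrative; the 12-month cutoff is the one suggested above):

```python
from datetime import date

def seasoned_default_rate(loans, as_of, min_age_months=12):
    """Default rate over loans at least min_age_months old.

    Younger loans haven't had time to default (average time-to-default
    here is ~13 months), so including them understates the true rate.
    loans: list of (orig_date, defaulted_bool) pairs.
    """
    def age_months(orig):
        return (as_of.year - orig.year) * 12 + (as_of.month - orig.month)

    seasoned = [(o, d) for o, d in loans if age_months(o) >= min_age_months]
    if not seasoned:
        return 0.0
    return sum(d for _, d in seasoned) / len(seasoned)
```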