-
09-10-2018, 11:50 AM
#3781
Originally Posted by myles
data set is really a snapshot in time - as a whole, it can't be built on because it will potentially contain 'stale' loan data. To try to explain: If someone uploads a particular loan that no one else has in the initial data set, and then never uploads again, then that particular loan will just sit with no further detail updated - it will become 'stale' and impact on the data set.
The dates stored with the data i.e. LAST_PAYMENT_DATE, don't allow, with confidence, for these 'stale' records to be removed (they may just be in arrears). ...
I agree. Myles.
Humvee, I think that while more data and more people sharing data is definitely better, Myles probably already has over 75% of the unique loan records in the Harmoney population between his own dataset, yours, mine and InTheRearWithTheGear's. If Cool Bear chooses to share too, I suspect we might end up with over 80%. No statistician could quibble about sample quality at that level of coverage.
Originally Posted by myles
the detail of loans that are 'Paid Off', 'Charged Off', 'Debt Sold', will not change and could be built on..
Also agree. These don't become stale - as a final outcome has been already achieved. The only place this gets stuffed is where Harmoney has noted different outcomes in different portfolios for the same LAI. I came across this when we were pooling some Harmoney data earlier.
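To make the 'stale' idea concrete, here's a rough Python sketch of how you might flag the suspect records: loans still showing an open status but not seen in any upload for months. The field names and the 180-day cutoff are my own assumptions, not anything from Harmoney's export - and as Myles says, a flagged loan may just be in arrears, so this can only flag, not confirm:

```python
from datetime import date

# Statuses with a final outcome - these records can never go stale.
FINAL_STATUSES = {"Paid Off", "Charged Off", "Debt Sold"}

def flag_stale(records, as_of, max_age_days=180):
    """Flag loans whose newest snapshot is old and whose status is still open.

    records: {lai: (status, snapshot_date)} - latest upload seen per loan.
    Returns the set of LAI ids that look stale (possibly just in arrears).
    """
    stale = set()
    for lai, (status, snapped) in records.items():
        if status not in FINAL_STATUSES and (as_of - snapped).days > max_age_days:
            stale.add(lai)
    return stale
```

A final-status record is skipped no matter how old its snapshot is, which matches the point that 'Paid Off' / 'Charged Off' / 'Debt Sold' detail can safely be built on.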
-
09-10-2018, 11:58 AM
#3782
yeah, nah
Originally Posted by beacon
Also agree. These don't become stale - as a final outcome has been already achieved. The only place this gets stuffed is where Harmoney has noted different outcomes in different portfolios for the same LAI. I came across this when we were pooling some Harmoney data earlier.
Yep, will just have to 'fix' data errors that are obvious and hope they all come out in the wash...
The 'jump' in defaults that showed up in the last time-lapse chart I did surprised me - I likely wouldn't have found it if I hadn't done that time lapse... It appears Harmoney updated the LAST_PAYMENT_DATE field when a debt was sold, which broke the data - why they did that I don't know. I would have liked to develop a 'hazard' curve from the real data, but can't because of this
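For anyone curious what that 'hazard' curve is computationally: it's just the conditional chance of default at each month of loan age, among loans that survived that long. A rough Python sketch (the tuple layout is illustrative, not Harmoney's format - and note it relies on trusting when each loan was last observed, which is exactly what the LAST_PAYMENT_DATE overwrite breaks):

```python
from datetime import date

def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed between two dates."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def hazard_curve(loans, horizon=36):
    """Hazard at month m = defaults at age m / loans still at risk at age m.

    Each loan is a tuple (orig_date, default_date_or_None, observed_to_date).
    Loans not yet observed to age m are excluded from that month's pool.
    """
    hazard = []
    for m in range(1, horizon + 1):
        at_risk = defaults = 0
        for orig, dflt, seen_to in loans:
            age_default = months_between(orig, dflt) if dflt else None
            age_observed = months_between(orig, seen_to)
            # At risk at month m: survived past month m-1, observed to month m.
            if (age_default is None or age_default >= m) and age_observed >= m:
                at_risk += 1
                if age_default == m:
                    defaults += 1
        hazard.append(defaults / at_risk if at_risk else 0.0)
    return hazard
```

With clean data you'd expect the curve to hump around the 12-13 month mark and tail off, like the one Harmoney published.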
-
09-10-2018, 12:19 PM
#3783
yeah, nah
Is there any additional data that anyone can think of that might be of value when pulling this together?
An example is that I could create a 'number of lenders' column, which would be a count of how many of the data sets included the loan - not overly meaningful but might be interesting to see which loans are taken more frequently? Hmm, might be easier to just release the 'raw' data and the 'clean/unique' data sets. Then anyone can do their own thing.
I'll just keep the main merge simple by taking the latest unique loan record as uploaded from the data sets - I'll add mine as a fresh set at the end.
Any suggestions or thoughts on this welcome.
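For what it's worth, the 'latest record wins, plus a lender count' merge could look something like this in Python - a sketch only, keying on the LAI id and assuming each upload has been read into a dict of records, with the uploads listed in order so later ones overwrite earlier ones:

```python
def merge_datasets(datasets):
    """Merge several lenders' exports keyed on the LAI id.

    datasets: list of {lai: record_dict}, in upload order (later wins).
    Keeps the latest record per loan and adds a NUM_LENDERS column
    counting how many of the uploaded sets contained that loan.
    """
    merged, lender_count = {}, {}
    for ds in datasets:
        for lai, rec in ds.items():
            merged[lai] = dict(rec)  # copy; later upload overwrites earlier
            lender_count[lai] = lender_count.get(lai, 0) + 1
    for lai, rec in merged.items():
        rec["NUM_LENDERS"] = lender_count[lai]
    return merged
```

Releasing both the raw pool and this deduplicated output, as suggested above, lets anyone redo the merge their own way.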
-
09-10-2018, 12:47 PM
#3784
Originally Posted by myles
... It appears harmoney updated the LAST_PAYMENT_DATE field when the debt was sold which broke the data, why they did that I don't know. I would have liked to have developed a 'hazard' curve based on real data, but can't because of this
The least Harmoney could do is update the hazard curve annually, especially as it is still evolving. It still has the 15-month-old one up (hazard-curve-jul-2017-583x260)
-
09-10-2018, 12:48 PM
#3785
Member
Originally Posted by myles
Is there any additional data that anyone can think of that might be of value when pulling this together?
An example is that I could create a 'number of lenders' column, which would be a count of how many of the data sets included the loan - not overly meaningful but might be interesting to see which loans are taken more frequently? Hmm, might be easier to just release the 'raw' data and the 'clean/unique' data sets. Then anyone can do their own thing.
I'll just keep the main merge simple by taking the latest unique loan record as uploaded from the data sets - I'll add mine as a fresh set at the end.
Any suggestions or thoughts on this welcome.
I think your proposal to clean the data, then release a 'tidy' version of unique loans, is the best option. Oftentimes with data, you don't develop your question until you've mucked around with it.
I think keeping the 'LAI-' as a key is useful, I'd prefer that over InTheRearWithTheGear's suggestion.
-
09-10-2018, 12:53 PM
#3786
Originally Posted by alundracloud
I think keeping the 'LAI-' as a key is useful, I'd prefer that over InTheRearWithTheGear's suggestion.
My vote for keeping LAI too. It is easier, cleaner, portable, scalable, discrete ...
-
09-10-2018, 12:56 PM
#3787
Member
Add a record creation date/time - it could be used as a tie breaker for determining freshness of duplicate rows.
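Something like this, assuming a CREATED_AT column gets stamped on each row at upload time (a hypothetical column name, not in the current exports):

```python
from datetime import datetime

def dedupe(rows):
    """Collapse duplicate rows per LAI, keeping the newest CREATED_AT.

    CREATED_AT is an assumed upload-time stamp; the loan fields alone
    can't break ties, since duplicate rows may be byte-identical.
    """
    best = {}
    for row in rows:
        lai = row["LAI"]
        if lai not in best or row["CREATED_AT"] > best[lai]["CREATED_AT"]:
            best[lai] = row
    return list(best.values())
```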
-
09-10-2018, 01:17 PM
#3788
Originally Posted by RMJH
I have about 2500 active loans ... With 50% annual churn/repayment ...!
Data pool would benefit from contributions by early lenders like RMJH, 777, harvey specter, or Halebop, if they are still around.
-
09-10-2018, 01:36 PM
#3789
Member
Originally Posted by beacon
My vote for keeping LAI too. It is easier, cleaner, portable, scalable, discrete ...
agreed too
-
09-10-2018, 01:53 PM
#3790
Member
I find that when analysing defaults, it is better to leave out all recent loans, as they drag the apparent default rate down. Based on my loans, the average time from loan date to default is just over 13 months; the median is just short of 12 months. So my suggestion to Myles is, in addition to the main set of graphs, to also do one set (relating to defaults) that ignores all loans less than (say) 12 months old.
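That seasoning filter is simple enough to sketch in Python (the tuple layout is illustrative; the 12-month cutoff is the one suggested above):

```python
from datetime import date

def seasoned_default_rate(loans, as_of, min_age_months=12):
    """Default rate over loans at least min_age_months old.

    Younger loans haven't had time to default (average time-to-default
    here is ~13 months), so including them understates the true rate.
    loans: list of (orig_date, defaulted_bool) pairs.
    """
    def age_months(orig):
        return (as_of.year - orig.year) * 12 + (as_of.month - orig.month)

    seasoned = [(o, d) for o, d in loans if age_months(o) >= min_age_months]
    if not seasoned:
        return 0.0
    return sum(d for _, d in seasoned) / len(seasoned)
```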