Sunday, February 19, 2012

Identifying unused columns in multiple databases

Hi - we have a situation where we have 23 identical databases on
servers across the country. We know that there are fields within this
db that no one is using and would like to be able to identify all of
them so they can be removed across the board. (There are multiple
tables and columns) We would of course have to verify that none of the
23 offices are using a particular field before it's removed as the db's
need to stay identical. (data in each db varies).
Anyway - I came across this posting which sounds like what we're
looking for, but it specifically references varchar and nvarchar fields
- I can't figure out what to modify in it to make this work for any
type of field. Can anyone point me in the right direction?
This is the posting:
> How can I find all the varchar columns in a table?

> I'm looking for an existing empty or null column rather than add a column.
This will generate a script for the current database that will tell you
all
of the varchar or nvarchar columns, and what percentage of them are
"unused" -- either NULL or empty strings.
SELECT 'SELECT '''+TABLE_NAME+''','''+COLUMN_NAME+''','
''
+DATA_TYPE+'('+RTRIM(CHARACTER_MAXIMUM_L
ENGTH)
+')'',''empty'',COUNT(*) FROM '+TABLE_SCHEMA+'.'+TABLE_NAME
+' WITH (NOLOCK) WHERE
LTRIM(RTRIM(COALESCE('+COLUMN_NAME+','''
')))=''''
UNION
SELECT '''+TABLE_NAME+''','''+COLUMN_NAME+''','
''
+DATA_TYPE+'('+RTRIM(CHARACTER_MAXIMUM_L
ENGTH)
+')'',''total'',COUNT(*) FROM '+TABLE_SCHEMA+'.'+TABLE_NAME
+' WITH (NOLOCK)'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE LIKE '%varchar'
You can run this in Query Analyzer, and it will generate a script in
the
bottom pane. Copy to a new query window and let 'er rip. Should take
a
while on larger databases. I use WITH (NOLOCK) for this kind of task
to
avoid concurrency and blocking issues; if you need an
up-to-the-millisecond
guaranteed read on the whole table, you may opt to leave that out (but
I do
not recommend it on a production server).
Thank you in advance.Please do not take offense, but if are not familiar enough with SQL Server
and SQL in general to make the necessary change to this query, then you
really are not qualified to be working on the changes you are talking about.
I am not trying to be mean, but your question is much simpler than the task
you intend to eventually perform. If you need help at this point, you will
get disastrous results later on. It is best to get someone who is more of
an expert in such things to handle this project.
"Debbie" <deborah.young@.mail.va.gov> wrote in message
news:1140104286.351275.153030@.g43g2000cwa.googlegroups.com...
> Hi - we have a situation where we have 23 identical databases on
> servers across the country. We know that there are fields within this
> db that no one is using and would like to be able to identify all of
> them so they can be removed across the board. (There are multiple
> tables and columns) We would of course have to verify that none of the
> 23 offices are using a particular field before it's removed as the db's
> need to stay identical. (data in each db varies).
> Anyway - I came across this posting which sounds like what we're
> looking for, but it specifically references varchar and nvarchar fields
> - I can't figure out what to modify in it to make this work for any
> type of field. Can anyone point me in the right direction?
> This is the posting:
>
column.
>
> This will generate a script for the current database that will tell you
> all
> of the varchar or nvarchar columns, and what percentage of them are
> "unused" -- either NULL or empty strings.
> SELECT 'SELECT '''+TABLE_NAME+''','''+COLUMN_NAME+''','
''
> +DATA_TYPE+'('+RTRIM(CHARACTER_MAXIMUM_L
ENGTH)
> +')'',''empty'',COUNT(*) FROM '+TABLE_SCHEMA+'.'+TABLE_NAME
> +' WITH (NOLOCK) WHERE
> LTRIM(RTRIM(COALESCE('+COLUMN_NAME+','''
')))=''''
> UNION
> SELECT '''+TABLE_NAME+''','''+COLUMN_NAME+''','
''
> +DATA_TYPE+'('+RTRIM(CHARACTER_MAXIMUM_L
ENGTH)
> +')'',''total'',COUNT(*) FROM '+TABLE_SCHEMA+'.'+TABLE_NAME
> +' WITH (NOLOCK)'
> FROM INFORMATION_SCHEMA.COLUMNS
> WHERE DATA_TYPE LIKE '%varchar'
>
> You can run this in Query Analyzer, and it will generate a script in
> the
> bottom pane. Copy to a new query window and let 'er rip. Should take
> a
> while on larger databases. I use WITH (NOLOCK) for this kind of task
> to
> avoid concurrency and blocking issues; if you need an
> up-to-the-millisecond
> guaranteed read on the whole table, you may opt to leave that out (but
> I do
> not recommend it on a production server).
>
> Thank you in advance.
>|||By "using the column", do you mean:
#1 if the column is populated with at least some data
#2 if there are any queries or procedures that actually reference the
column
The programming below seems to be querying for any columns that are all NULL
or zero length.
Also, a column may have been populated at some point in the past, but is
currently obsolete and not used. If you really need to audit which tables /
columns are in use by the application during normal usage, then you can
setup an object trace in SQL Server Profiler and let it run the background
for a complete business cycle.
http://msdn.microsoft.com/library/d...
ethowto15.asp
"Debbie" <deborah.young@.mail.va.gov> wrote in message
news:1140104286.351275.153030@.g43g2000cwa.googlegroups.com...
> Hi - we have a situation where we have 23 identical databases on
> servers across the country. We know that there are fields within this
> db that no one is using and would like to be able to identify all of
> them so they can be removed across the board. (There are multiple
> tables and columns) We would of course have to verify that none of the
> 23 offices are using a particular field before it's removed as the db's
> need to stay identical. (data in each db varies).
> Anyway - I came across this posting which sounds like what we're
> looking for, but it specifically references varchar and nvarchar fields
> - I can't figure out what to modify in it to make this work for any
> type of field. Can anyone point me in the right direction?
> This is the posting:
>
>
> This will generate a script for the current database that will tell you
> all
> of the varchar or nvarchar columns, and what percentage of them are
> "unused" -- either NULL or empty strings.
> SELECT 'SELECT '''+TABLE_NAME+''','''+COLUMN_NAME+''','
''
> +DATA_TYPE+'('+RTRIM(CHARACTER_MAXIMUM_L
ENGTH)
> +')'',''empty'',COUNT(*) FROM '+TABLE_SCHEMA+'.'+TABLE_NAME
> +' WITH (NOLOCK) WHERE
> LTRIM(RTRIM(COALESCE('+COLUMN_NAME+','''
')))=''''
> UNION
> SELECT '''+TABLE_NAME+''','''+COLUMN_NAME+''','
''
> +DATA_TYPE+'('+RTRIM(CHARACTER_MAXIMUM_L
ENGTH)
> +')'',''total'',COUNT(*) FROM '+TABLE_SCHEMA+'.'+TABLE_NAME
> +' WITH (NOLOCK)'
> FROM INFORMATION_SCHEMA.COLUMNS
> WHERE DATA_TYPE LIKE '%varchar'
>
> You can run this in Query Analyzer, and it will generate a script in
> the
> bottom pane. Copy to a new query window and let 'er rip. Should take
> a
> while on larger databases. I use WITH (NOLOCK) for this kind of task
> to
> avoid concurrency and blocking issues; if you need an
> up-to-the-millisecond
> guaranteed read on the whole table, you may opt to leave that out (but
> I do
> not recommend it on a production server).
>
> Thank you in advance.
>|||I'd suggest running the trace (after it has been tuned to only trace relevan
t
events and return relevant data) for a month (or whatever the actual
turn-over time is).
After that you need to analyze the trace data, identifying individual unused
objects.
The following activities should not be done in live production.
Unused objects should first be renamed before actually being dropped -
rename one at a time to get a clear picture of their actual usage.
After the system has been running successfully for your specific turn-over
time with the renamed objects, you could consider dropping them - in a test
environment first, of course.
This is the bottom-to-top approach.
There is, however, another approach to this - analyze your actual business
requirements. If you think you have unused columns, maybe the entire data
model is wrong for you, and needs to be redesigned. This approach may even
yield better results.
ML
http://milambda.blogspot.com/

No comments:

Post a Comment