The original blog encoded the language in the URL path: English posts lived at http://blog.company.com/2017/... (no language code), while every other language had a prefix, for example http://blog.company.com/ru/2017/... for Russian ("ru").
English items are therefore filtered separately, by deleting every item whose link does not have the year directly after the host name:
{ xmlstarlet ed -d '//item[not(contains(link,"blog.avast.com/20"))]' all.xml; echo ""; } > all-en.xml
The other languages are handled in a loop, which also strips the language prefix from each link:
for L in cs de es fr it ja pl pt-br ru tr uk; do
  F="all-$L.xml"
  # keep only the items whose link contains the /$L/ language prefix
  { xmlstarlet ed -d '//item[not(contains(link,"blog.avast.com/'"$L"'/"))]' all.xml; echo ""; } > "$F"
  # strip the language prefix from the links
  sed -r -i "s:(<link.*blog.avast.com/)$L/:\1:g" "$F"
done
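To see what the sed step in the loop does, the same substitution can be tried on a single line. This is only a sketch: the sample URL is made up for illustration, and -E is used as the portable spelling of -r.

```shell
# Hypothetical sample: one <link> line as it might appear in all-ru.xml.
L=ru
sample='<link>http://blog.avast.com/ru/2017/01/example-post/</link>'

# Same substitution as in the loop, applied to stdin instead of a file.
rewritten=$(printf '%s\n' "$sample" | sed -E "s:(<link.*blog.avast.com/)$L/:\1:g")

printf '%s\n' "$rewritten"
```

The language prefix disappears from the link while the rest of the line stays unchanged.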
Redirects have to be set up in the HubSpot settings so that the old URLs remain accessible and point to the new locations of the articles. This helps preserve the ranking in search engines and keeps links from other sites working.
# perl -ne without a file argument reads standard input:
# feed each command the matching list of "old URL,new URL" lines
# English, without a loop
perl -ne 'if(/^http:\/\/blog\.avast\.com(.*),http:\/\/avast\.hs-sites\.com(.*)$/ && $1 ne $2) { print "$1,$2\n"}' > redirects-languages.txt
# other languages: put the language prefix back in front of the old path
for L in de it fr ru pl pt-br cs es; do
  perl -ne 'if(/^http:\/\/blog\.avast\.com(.*),http:\/\/avast\.hs-sites\.com(.*)$/ && "/'$L'$1" ne $2) { print "/'$L'$1,$2\n"}' >> redirects-languages.txt
done
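The filter can be checked on a small sample before running it on the full export. The sketch below swaps Perl for awk (a substitution on my part, not the original method) and uses a made-up two-line mapping for Russian; the post names and the /ru/ paths on the HubSpot side are assumptions.

```shell
# Hypothetical mapping: "old URL,new URL" pairs for Russian posts, with the
# language prefix already stripped from the old URL by the earlier sed step.
mapping='http://blog.avast.com/2017/01/post-a/,http://avast.hs-sites.com/ru/2017/01/post-a/
http://blog.avast.com/2017/02/post-b/,http://avast.hs-sites.com/ru/post-b'

L=ru
redirects=$(printf '%s\n' "$mapping" | awk -F, -v lang="$L" '
  {
    src = $1; dst = $2
    # Keep only the path part; skip lines that lack the expected hosts.
    if (!sub(/^http:\/\/blog\.avast\.com/, "", src)) next
    if (!sub(/^http:\/\/avast\.hs-sites\.com/, "", dst)) next
    src = "/" lang src                  # restore the language prefix
    if (src != dst) print src "," dst   # identical paths need no redirect
  }')

printf '%s\n' "$redirects"
```

Only the second pair produces a redirect, because the first post keeps the same path on the new site.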
The exported content may also contain CDATA markers inside script tags. They can be removed in Vim with two substitutions:
:%s#\(<script.*\)// <!\[CDATA\[#\1#
:%s#// \]\]\]\]><!\[CDATA\[></script>#</script>#
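For a non-interactive run, the same two substitutions can be expressed with sed. This is a sketch rather than part of the original workflow, and the sample script block is invented for illustration.

```shell
# Hypothetical sample: an inline script wrapped in (escaped) CDATA markers.
sample='<script type="text/javascript">// <![CDATA[
var x = 1;
// ]]]]><![CDATA[></script>'

# First expression drops the opening marker, second one the closing marker.
cleaned=$(printf '%s\n' "$sample" | sed \
  -e 's#\(<script.*\)// <!\[CDATA\[#\1#' \
  -e 's#// ]]]]><!\[CDATA\[></script>#</script>#')

printf '%s\n' "$cleaned"
```

The closing pattern contains `]]]]><![CDATA[>` because that is how a literal `]]>` is escaped when it appears inside an enclosing CDATA section.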
Before uploading the new redirects (which can be done in bulk), we deleted the old ones. The deletion can be automated with an iMacros script:
VERSION BUILD=8940826 RECORDER=FX
TAB T=1
URL GOTO=https://app.hubspot.com/content/486579/settings/url-mappings
TAG POS=1 TYPE=SPAN ATTR=CLASS:dropdown-target<SP>settings-icon&&TXT:
TAG POS=11 TYPE=A ATTR=TXT:Delete
TAG POS=1 TYPE=A ATTR=ID:hs-fancybox-ok
TAB T=1
URL GOTO=https://app.hubspot.com/content/486579/settings/url-mappings
TAG POS=1 TYPE=SPAN ATTR=CLASS:dropdown-target
TAG POS=11 TYPE=A ATTR=TXT:Delete
TAG POS=1 TYPE=A ATTR=ID:hs-fancybox-ok
With these commands, it was possible to migrate thousands of blog posts from WordPress into HubSpot.